Security-First Infrastructure Management

Every action audited. Every automation bounded. Your infrastructure, under control.



AI Safety Controls

  • Command blocklist prevents destructive operations — rm -rf, DROP TABLE, format never execute
  • Graduated autonomy: routine fixes auto-execute, high-risk changes require your approval
  • Every AI decision logged with reasoning, risk classification, and outcome
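As an illustration only, the blocklist and graduated-autonomy rules above could be combined into a single gate like the one below. The patterns, the `decide` function, and the risk labels are hypothetical names chosen for this sketch; they are not taken from the OpsKern codebase.

```python
import re
from dataclasses import dataclass

# Hypothetical blocklist covering the three examples named above.
# A real blocklist would need many more patterns and careful testing.
BLOCKLIST = [
    re.compile(r"\brm\s+-(?:rf|fr)\b", re.I),
    re.compile(r"\bdrop\s+table\b", re.I),
    re.compile(r"^\s*format\b", re.I),  # only when 'format' is the command itself
]


@dataclass(frozen=True)
class Decision:
    action: str  # "block", "auto_execute", or "needs_approval"
    reason: str  # logged alongside the decision


def decide(command: str, risk: str) -> Decision:
    """Check the blocklist first, then apply the autonomy tier."""
    for pattern in BLOCKLIST:
        if pattern.search(command):
            return Decision("block", f"matched blocklist pattern {pattern.pattern!r}")
    if risk == "low":
        return Decision("auto_execute", "routine fix")
    return Decision("needs_approval", f"risk tier {risk!r} requires human approval")
```

Checking the blocklist before the risk tier means a blocked command is rejected even if it was classified as low risk.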

Secrets Protection

  • Pre-commit scanning rejects credentials and API keys before they reach Git
  • No secrets in code: credentials are stored in Ansible Vault (AES-256) and isolated per environment
  • Client data never leaves your infrastructure
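To give a sense of what pre-commit scanning involves, here is a minimal sketch of a line-by-line secret scanner. The pattern set and the `find_secrets` name are assumptions made for this example; production scanners use far larger rule sets and entropy checks.

```python
import re

# Illustrative patterns only (assumed for this sketch).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api_key|secret|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A pre-commit hook would run a check like this over the staged changes and exit non-zero when `find_secrets` returns anything, which stops the commit.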

Full Audit Trail

  • Every remediation logged: timestamp, host, action, result
  • Mean Time to Resolution tracked per incident type
  • Exportable incident history for compliance reporting

Automated Compliance

  • CIS benchmark scanning across your fleet
  • Continuous vulnerability assessment via Trivy CVE scanning
  • Configuration drift detection: unauthorized changes caught in real time

SLA Monitoring

  • Real-time uptime tracking with sub-minute granularity
  • 30-day rolling SLA dashboard, visible to you
  • Backup freshness monitoring with automated alerts at 07:00 UTC daily
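The arithmetic behind a 30-day rolling SLA figure and a backup freshness check can be sketched as follows. The function names and the 24-hour staleness threshold are assumptions for illustration; the source does not state the actual threshold.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)


def rolling_uptime_percent(outages, now):
    """Uptime over the last 30 days, given (start, end) outage intervals."""
    window_start = now - WINDOW
    down = timedelta()
    for start, end in outages:
        # Clip each outage to the window so older downtime does not count.
        clipped_start = max(start, window_start)
        clipped_end = min(end, now)
        if clipped_end > clipped_start:
            down += clipped_end - clipped_start
    return 100.0 * (1 - down / WINDOW)


def backup_is_stale(last_backup, now, max_age=timedelta(hours=24)):
    """True when the newest backup is older than max_age (assumed 24 h)."""
    return now - last_backup > max_age
```

For scale: 30 days is 2,592,000 seconds, so a 99.9% figure allows 2,592 seconds (43 minutes 12 seconds) of downtime in the window.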

Self-Healing Pipeline

  • Detect, classify, remediate, verify, notify: under 60 seconds end to end for known fixes
  • Cooldown enforcement prevents remediation storms
  • Failed fixes escalate to AI investigation, then human approval
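Cooldown enforcement can be pictured as a small gate that refuses to rerun the same fix on the same host too soon. The sketch below is one possible shape, not the actual implementation; `CooldownGate`, its method name, and the per-(host, playbook) keying are assumptions.

```python
import time


class CooldownGate:
    """Allow a given (host, playbook) pair to run at most once per cooldown."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_run = {}

    def try_acquire(self, host, playbook):
        key = (host, playbook)
        now = self._clock()
        last = self._last_run.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down: skip, do not retry in a loop
        self._last_run[key] = now
        return True
```

A caller would run the playbook only when `try_acquire` returns `True`, and escalate instead when the same alert keeps firing during the cooldown.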

The remediation loop

1. Detect: Prometheus catches the anomaly in under 15 seconds
2. Classify: Operations agent assigns risk tier and selects playbook
3. Remediate: Ansible executes the fix: bounded, logged, reversible
4. Verify: Post-remediation health check confirms resolution
5. Notify: You get the result: what broke, what ran, what changed
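The five steps can be sketched as a single function with pluggable stages. This is a simplified illustration under stated assumptions: the `Alert` and `AuditRecord` types, the callback signatures, and the "resolved"/"escalated" result labels are invented for the example, and detection (step 1) is assumed to have already produced the alert.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Alert:
    host: str
    name: str


@dataclass
class AuditRecord:
    timestamp: str
    host: str
    action: str
    result: str


def run_remediation(alert, classify, remediate, verify, notify,
                    now=lambda: datetime.now(timezone.utc)):
    """Run classify -> remediate -> verify -> notify for one detected alert."""
    playbook = classify(alert)  # step 2: returns a playbook name or None
    if playbook is None:
        action, result = "none", "escalated"  # no known fix: hand off
    else:
        remediate(alert.host, playbook)  # step 3
        # Step 4: a failed health check escalates instead of retrying.
        result = "resolved" if verify(alert.host) else "escalated"
        action = playbook
    record = AuditRecord(now().isoformat(), alert.host, action, result)
    notify(record)  # step 5: the same record doubles as the audit entry
    return record
```

Passing the stages in as callables keeps the loop itself free of any Prometheus or Ansible specifics, which also makes it straightforward to test with stubs.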

Most MSPs react to tickets. OpsKern prevents them.

Traditional MSP

  • Hours to detect an outage
  • Manual ticket creation
  • Human investigates, then fixes
  • You find out after the fact
  • Incident report in days

OpsKern

  • Under 15 seconds to detect
  • Auto-classified, auto-dispatched
  • Known fixes execute in under 60 seconds
  • You get notified as it resolves
  • Audit log available immediately

What we have. What we're building toward.

We don’t have SOC 2 yet. We’re a small operation building toward it. Here’s what we do have today:

Network security — All management traffic runs over Tailscale’s WireGuard mesh. No management ports on the public internet. Zero open inbound ports for our access.

Host hardening — SSH key-only authentication, fail2ban, automatic security patches, least-privilege containers. Every managed host, every tier.

Encrypted backups — AES-256 encryption via Restic before data leaves the host. SFTP for transit. Verified daily.

Infrastructure as code — Every configuration lives in Git. No manual changes to production. Full audit trail of every change. If a host dies, we rebuild it from the repo.

Access isolation — Per-client SSH keys. Per-client environments. Credentials in Ansible Vault. Your infrastructure is never shared with another client.

The self-healing pipeline is continuously tested against real infrastructure scenarios across 6 severity tiers. The Ansible collection that powers all of it is open source — inspect it yourself at github.com/opskern/ops-kernel-stack.