Security & Trust

Trust Pillars

AI Safety Controls

Command blocklist prevents destructive operations: commands such as rm -rf, DROP TABLE, and format never execute
Graduated autonomy: routine fixes auto-execute, high-risk changes require your approval
Every AI decision logged with reasoning, risk classification, and outcome
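
To make the controls above concrete, here is a minimal sketch of how a command blocklist and graduated autonomy could fit together. It is an illustration, not OpsKern's actual implementation: the patterns, the `decide` function, and the risk labels are all hypothetical, and a real blocklist would cover far more variants.

```python
import re

# Hypothetical patterns for illustration only; not the production blocklist.
BLOCKED_PATTERNS = [
    # rm with both -r and -f in the same flag group (e.g. -rf, -fr, -Rf)
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+", re.IGNORECASE),
    # SQL table drops
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    # disk format invoked as the command itself
    re.compile(r"^\s*format\b", re.IGNORECASE),
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any blocklist pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


def decide(command: str, risk: str) -> str:
    """Map a proposed command and its risk label to an execution decision."""
    if is_blocked(command):
        return "blocked"          # never executes, regardless of risk label
    if risk == "high":
        return "needs_approval"   # a human signs off first
    return "auto_execute"         # routine fix


print(decide("systemctl restart nginx", "low"))
```

The blocklist check runs before the risk check, so a destructive command is refused even if it was mislabeled as low risk.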

Secrets Protection

Pre-commit scanning rejects credentials and API keys before they reach Git
No secrets in code: credentials live in Ansible Vault (AES-256), isolated per environment
Client data never leaves your infrastructure
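
As a rough idea of what pre-commit scanning involves, the sketch below checks text for a few credential shapes. The pattern set and the `scan_text` helper are assumptions made for this example; dedicated scanners use much larger rule sets.

```python
import re

# Illustrative patterns only; a real scanner would carry many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|password|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every line that looks like a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = 'region = "eu-west-1"\napi_key = "abcd1234efgh5678"\n'
print(scan_text(sample))
```

In a Git hook, the text to scan would typically be the staged changes (for example, the output of `git diff --cached`), and any finding would cause the hook to exit non-zero so the commit is rejected.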

Full Audit Trail

Every remediation logged: timestamp, host, action, result
Mean Time to Resolution tracked per incident type
Exportable incident history for compliance reporting
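
One way to picture the audit trail is as a list of structured records from which Mean Time to Resolution falls out directly. The sketch below assumes a hypothetical `RemediationRecord` shape; the four logged fields come from the list above, while `incident_type` and `resolved_at` are added here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RemediationRecord:
    timestamp: datetime                     # when the incident was detected
    host: str
    action: str
    result: str                             # e.g. "resolved" or "failed"
    incident_type: str                      # hypothetical grouping field
    resolved_at: Optional[datetime] = None  # hypothetical; None if unresolved


def mttr_by_type(records) -> dict:
    """Average time from detection to resolution, grouped by incident type."""
    durations = defaultdict(list)
    for r in records:
        if r.result == "resolved" and r.resolved_at is not None:
            durations[r.incident_type].append(r.resolved_at - r.timestamp)
    return {t: sum(d, timedelta()) / len(d) for t, d in durations.items()}


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    RemediationRecord(t0, "web-01", "restart nginx", "resolved",
                      "service_down", t0 + timedelta(seconds=40)),
    RemediationRecord(t0, "web-02", "restart nginx", "resolved",
                      "service_down", t0 + timedelta(seconds=60)),
    RemediationRecord(t0, "db-01", "expand volume", "failed", "disk_full"),
]
print(mttr_by_type(records))
```

Unresolved incidents are left out of the average in this sketch; whether to count them is a reporting choice.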

Automated Compliance

CIS benchmark scanning across your fleet
Continuous vulnerability assessment via Trivy CVE scanning
Configuration drift detection: unauthorized changes caught in real time
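
Drift detection can be approached in several ways. The sketch below shows one simple technique, comparing SHA-256 fingerprints of current files against a known-good baseline. This is an assumed approach chosen for illustration, not a description of how OpsKern detects drift; the `fingerprint` and `detect_drift` helpers and the example paths are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare current file contents against baseline fingerprints.

    baseline maps path -> expected digest; current maps path -> raw bytes.
    Returns path -> "missing", "modified", or "unexpected".
    """
    drift = {}
    for path, expected in baseline.items():
        if path not in current:
            drift[path] = "missing"
        elif fingerprint(current[path]) != expected:
            drift[path] = "modified"
    for path in current.keys() - baseline.keys():
        drift[path] = "unexpected"
    return drift


baseline = {"/etc/ssh/sshd_config": fingerprint(b"PasswordAuthentication no\n")}
current = {
    "/etc/ssh/sshd_config": b"PasswordAuthentication yes\n",
    "/etc/cron.d/extra": b"* * * * * root true\n",
}
print(detect_drift(baseline, current))
```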

SLA Monitoring

Real-time uptime tracking with sub-minute granularity
30-day rolling SLA dashboard, visible to you
Backup freshness monitoring with automated alerts at 07:00 UTC daily
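
For readers who want the arithmetic behind a rolling SLA figure, here is a minimal sketch. It assumes outages are available as (start, end) pairs and that a backup counts as fresh if it is under 24 hours old; both the data shape and the 24-hour threshold are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def rolling_uptime_percent(outages, now, window=timedelta(days=30)):
    """Uptime percentage over the trailing window, given (start, end) outages."""
    window_start = now - window
    downtime = timedelta()
    for start, end in outages:
        # Count only the part of each outage that falls inside the window.
        overlap_start = max(start, window_start)
        overlap_end = min(end, now)
        if overlap_end > overlap_start:
            downtime += overlap_end - overlap_start
    return 100.0 * (1 - downtime / window)


def backup_is_fresh(last_backup, now, max_age=timedelta(hours=24)):
    """True if the most recent backup is no older than max_age (assumed 24 h)."""
    return now - last_backup <= max_age


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
outage_start = now - timedelta(days=10)
outages = [(outage_start, outage_start + timedelta(minutes=432))]
print(round(rolling_uptime_percent(outages, now), 3))
```

A 30-day window contains 43,200 minutes, so 432 minutes of downtime corresponds to 1% of the window.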

Self-Healing Pipeline

Detect, classify, remediate, verify, notify: under 60 seconds end to end for known fixes
Cooldown enforcement prevents remediation storms
Failed fixes escalate to AI investigation, then human approval
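
Cooldown enforcement is easiest to understand as a gate in front of each remediation. The sketch below is a hypothetical `CooldownGate` that refuses to repeat the same action on the same host until a cooldown has elapsed; the 300-second value is an arbitrary example, not a documented setting.

```python
import time


class CooldownGate:
    """Allow a (host, action) pair at most once per cooldown period."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock          # injectable so the gate can be tested
        self._last_run = {}

    def allow(self, host, action):
        key = (host, action)
        now = self._clock()
        last = self._last_run.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False             # still cooling down; skip or escalate
        self._last_run[key] = now
        return True


gate = CooldownGate(cooldown_seconds=300)
print(gate.allow("web-01", "restart_nginx"))  # first attempt is allowed
print(gate.allow("web-01", "restart_nginx"))  # immediate retry is refused
```

Keying on the host and action together means a restart loop on one host does not block an unrelated fix elsewhere.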

How It Works

The remediation loop

Detect: Prometheus catches the anomaly in under 15 seconds

Classify: Operations agent assigns risk tier and selects playbook

Remediate: Ansible executes the fix, which is bounded, logged, and reversible

Verify: Post-remediation health check confirms resolution

Notify: You get the result (what broke, what ran, what changed)
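
The loop above can be sketched as a small orchestration function. This is a simplified illustration under stated assumptions: detection is treated as already done upstream, each step is passed in as a plain callable, and the `Incident` type, tier names, and outcome strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    host: str
    alert: str
    log: list = field(default_factory=list)  # what ran and what changed


def run_loop(incident, classify, remediate, verify, notify):
    """Classify, remediate, verify, and notify for an already detected incident."""
    tier, playbook = classify(incident)
    incident.log.append(f"classified tier={tier} playbook={playbook}")

    if tier == "high":
        outcome = "awaiting_approval"        # high-risk changes wait for a human
    else:
        remediate(incident, playbook)
        incident.log.append(f"ran {playbook}")
        # A failed health check escalates instead of retrying blindly.
        outcome = "resolved" if verify(incident) else "escalated"

    incident.log.append(f"outcome={outcome}")
    notify(incident, outcome)
    return outcome


sent = []
outcome = run_loop(
    Incident("web-01", "nginx_down"),
    classify=lambda i: ("low", "restart_nginx.yml"),
    remediate=lambda i, playbook: None,      # stand-in for running a playbook
    verify=lambda i: True,                   # stand-in for a health check
    notify=lambda i, o: sent.append(o),
)
print(outcome)
```

Passing the steps in as callables keeps the sketch independent of any particular monitoring or automation tool.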

The Difference

Most MSPs react to tickets. OpsKern prevents them.

Traditional MSP

Hours to detect an outage
Manual ticket creation
Human investigates, then fixes
You find out after the fact
Incident report in days

OpsKern

15 seconds to detect
Auto-classified, auto-dispatched
Known fixes execute in under 60 seconds
You get notified as it resolves
Audit log available immediately

Transparency

What we have. What we're building toward.

We don’t have SOC 2 yet. We’re a small operation building toward it. Here’s what we do have today:

Network security — All management traffic runs over Tailscale’s WireGuard mesh. No management ports on the public internet. Zero open inbound ports for our access.

Host hardening — SSH key-only authentication, fail2ban, automatic security patches, least-privilege containers. Every managed host, every tier.

Encrypted backups — AES-256 encryption via Restic before data leaves the host. SFTP for transit. Verified daily.

Infrastructure as code — Every configuration lives in Git. No manual changes to production. Full audit trail of every change. If a host dies, we rebuild it from the repo.

Access isolation — Per-client SSH keys. Per-client environments. Credentials in Ansible Vault. Your infrastructure is never shared with another client.

The self-healing pipeline is continuously tested against real infrastructure scenarios across 6 severity tiers. The Ansible collection that powers all of it is open source — inspect it yourself at github.com/opskern/ops-kernel-stack.

Security-First Infrastructure Management