Getting Started with OpsKern

You have a Linux host. You have Ansible. In about 30 minutes, you will have a self-healing homelab — monitoring deployed, alert rules active, and your first automated remediation running.

Prerequisites

You need the following on your control node (the machine that runs Ansible):

Linux host — Ubuntu 22.04+, Debian 12+, or Fedora 38+ (any systemd-based distro works)
Ansible — version 2.14 or newer
Python 3.10+ — for the operations agent
SSH access — key-based authentication to your managed hosts
Git — to clone the repo

On your managed hosts (the servers Ansible will configure):

Linux — same distro requirements as above
SSH server — running and reachable from the control node
sudo access — for the Ansible user

Optional but recommended:

Tailscale — for secure, zero-config VPN between hosts
A second host — to see fleet-wide monitoring in action (a VM or LXC container works fine)
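The control-node requirements above can be checked with a short script before you clone anything. This is a minimal sketch, not part of the OpsKern repo: the helper names version_ge and report are made up for illustration, and the version comparison assumes GNU sort (for its -V option), which the listed distros provide.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option sorts version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print one status line for a tool.
# $1 = command name, $2 = minimum version (empty to skip), $3 = detected version
report() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "MISSING  $1"
    elif [ -n "$2" ] && ! version_ge "$3" "$2"; then
        echo "TOO OLD  $1 $3 (need $2+)"
    else
        echo "OK       $1 $3"
    fi
}

# Extract the first dotted version number from each tool's version banner.
ansible_version=$(ansible --version 2>/dev/null | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 || true)
python_version=$(python3 --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 || true)

report ansible 2.14 "$ansible_version"
report python3 3.10 "$python_version"
report git "" ""
report ssh "" ""
```

Run it on the control node. Any line marked MISSING or TOO OLD points at something to install or upgrade before continuing.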

Step 1: Clone the repo

git clone https://github.com/opskern/ops-kernel-stack.git
cd ops-kernel-stack

Step 2: Configure your inventory

Copy the example inventory and add your hosts:

cp inventory/example.yml inventory/hosts.yml

Edit inventory/hosts.yml with your hostnames and IP addresses. At minimum, you need one host under the monitoring group.
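For orientation, a one-host inventory in Ansible's standard YAML format might look like the sketch below. The host name mon01, the user ops, and the 192.0.2.10 address are placeholders, and the layout of the repo's inventory/example.yml may differ, so treat the copied example file as authoritative. The snippet writes to a scratch directory so your real inventory/hosts.yml is not touched.

```shell
# Write a sample inventory to a scratch directory, not to inventory/hosts.yml.
scratch=$(mktemp -d)
sample="$scratch/hosts.yml"

cat > "$sample" <<'EOF'
# Hypothetical host; 192.0.2.0/24 is a documentation-only address range.
all:
  children:
    monitoring:
      hosts:
        mon01:
          ansible_host: 192.0.2.10
          ansible_user: ops
EOF

# If Ansible is installed, print the group tree to confirm the file parses.
if command -v ansible-inventory >/dev/null 2>&1; then
    ansible-inventory -i "$sample" --graph || echo "inventory did not parse" >&2
fi
```

The same ansible-inventory --graph check works against your edited inventory/hosts.yml and is a quick way to confirm that at least one host sits under the monitoring group.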

Step 3: Deploy monitoring

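This single command deploys Prometheus, Grafana, Loki, and node_exporter across your fleet. The -l monitoring flag limits the run to hosts in the monitoring group, so every host you want covered must be in that group: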

ansible-playbook playbooks/site.yml -l monitoring

After this completes, open Grafana at http://<monitoring-host>:3000. Default credentials are in the README.
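Besides opening Grafana in a browser, you can probe each component from the command line. The sketch below is an assumption-heavy example: only Grafana's port 3000 comes from this guide, while the other ports and paths are the upstream defaults for Prometheus, Loki, and node_exporter, and the playbook may configure them differently. The function names and the MONITORING_HOST variable are made up for illustration.

```shell
# Build the health-check URL for one component on a given host.
# Ports and paths are upstream defaults, not values confirmed by this repo.
# $1 = component name, $2 = host name or IP
health_url() {
    case "$1" in
        grafana)       echo "http://$2:3000/api/health" ;;
        prometheus)    echo "http://$2:9090/-/healthy" ;;
        loki)          echo "http://$2:3100/ready" ;;
        node_exporter) echo "http://$2:9100/metrics" ;;
        *)             echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Probe every component on one host and print up/down for each.
check_stack() {
    for component in grafana prometheus loki node_exporter; do
        if curl -fsS -o /dev/null --max-time 5 "$(health_url "$component" "$1")"; then
            echo "up    $component"
        else
            echo "down  $component"
        fi
    done
}

# Only touch the network when a host was supplied.
if [ -n "${MONITORING_HOST:-}" ]; then
    check_stack "$MONITORING_HOST"
fi
```

Set MONITORING_HOST to your monitoring host before running it. A component reported as down may simply be listening on a non-default port.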

Step 4: Deploy the alert rules

The collection ships with 108 alert rules. To deploy them:

ansible-playbook playbooks/alerting.yml

Alertmanager will start routing alerts based on the default configuration. Edit group_vars/all/alertmanager.yml to point notifications at your preferred channel (email, Slack, ntfy, etc.).
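Once notifications point at your channel, one way to confirm the route works is to push a synthetic alert through Alertmanager's v2 HTTP API. This is a sketch under assumptions: the alert name OpsKernTestAlert, the severity label, and the ALERTMANAGER_URL variable are invented for the example, port 9093 is Alertmanager's upstream default, and whether the default routing delivers an alert with these labels depends on the shipped configuration.

```shell
# Build a JSON payload containing one synthetic alert.
# $1 = alert name, $2 = severity label value
alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' "$1" "$2"
}

# POST the alert to Alertmanager's v2 API.
# $1 = base URL, for example http://mon01:9093 (hypothetical host, default port)
send_test_alert() {
    curl -fsS -X POST -H 'Content-Type: application/json' \
        -d "$(alert_payload OpsKernTestAlert info)" "$1/api/v2/alerts"
}

# Only send when a target URL was supplied.
if [ -n "${ALERTMANAGER_URL:-}" ]; then
    send_test_alert "$ALERTMANAGER_URL" && echo "test alert sent"
fi
```

Because the payload sets no end time, Alertmanager resolves the alert on its own after its configured resolve timeout.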

Step 5: Run your first remediation

The simplest remediation to test is disk cleanup. Trigger it manually to see the pipeline in action:

ansible-playbook playbooks/remediate-disk-cleanup.yml -l <your-host>
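Since a cleanup playbook deletes files, you may prefer a dry run first. The wrapper below is a sketch: run_remediation, TARGET_HOST, and ANSIBLE_PLAYBOOK are hypothetical names, while --check and --diff are standard ansible-playbook flags. Tasks built on the command or shell modules are usually skipped in check mode, so the dry run can under-report what the real run will do.

```shell
PLAYBOOK="playbooks/remediate-disk-cleanup.yml"

# Run the cleanup playbook against one host.
# $1 = host or group, $2 = "dry" for check mode, anything else for a real run
# ANSIBLE_PLAYBOOK can override the binary, which is handy for testing.
run_remediation() {
    local host="$1" mode="$2"
    set -- "$PLAYBOOK" -l "$host"
    if [ "$mode" = "dry" ]; then
        set -- "$@" --check --diff
    fi
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" "$@"
}

# Dry run first, then ask before applying changes.
if [ -n "${TARGET_HOST:-}" ]; then
    run_remediation "$TARGET_HOST" dry
    printf 'Apply for real? [y/N] '
    read -r answer
    if [ "$answer" = "y" ]; then
        run_remediation "$TARGET_HOST" real
    fi
fi
```

Run it from the repository root with TARGET_HOST set to the host you would otherwise pass to -l.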

To enable automated remediation (Alertmanager triggers playbooks via the operations agent), follow the operations agent setup in the repo README.

What’s next

Ansible Homelab Cheat Sheet — 20 commands and patterns on one page
The book — 12 chapters covering the full stack, from Proxmox provisioning to the operations agent
ops-kernel-stack on GitHub — the full source, issues, and discussions
Blog — deep dives on specific patterns and architecture decisions

Questions? Email hello@opskern.io or open an issue on GitHub.