Getting Started with OpsKern

You have a Linux host. You have Ansible. In about 30 minutes, you will have a self-healing homelab — monitoring deployed, alert rules active, and your first automated remediation running.


Prerequisites

You need the following on your control node (the machine that runs Ansible):

  • Linux host — Ubuntu 22.04+, Debian 12+, or Fedora 38+ (any systemd-based distro works)
  • Ansible — version 2.14 or newer
  • Python 3.10+ — for the operations agent
  • SSH access — key-based authentication to your managed hosts
  • Git — to clone the repo
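
A quick way to confirm the control-node requirements is a small preflight script. This is an illustrative sketch, not something shipped in the repo. It assumes the Ansible minimum refers to the ansible-core version that ansible --version reports, and it uses GNU sort -V (available on the listed distros) to compare versions.

```shell
#!/bin/sh
# Control-node preflight check (sketch). Minimums come from the list above;
# treating "Ansible 2.14" as the ansible-core version is an assumption.

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

problems=0

# check_tool NAME: report whether NAME is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK    $1"
    else
        echo "FAIL  $1 not found"
        problems=$((problems + 1))
    fi
}

# check_version NAME FOUND MINIMUM: compare a detected version with the minimum.
check_version() {
    if [ -n "$2" ] && version_ge "$2" "$3"; then
        echo "OK    $1 $2 (>= $3)"
    else
        echo "FAIL  $1 ${2:-not found} (need >= $3)"
        problems=$((problems + 1))
    fi
}

check_tool git
check_tool ssh

# "ansible --version" starts with e.g. "ansible [core 2.15.3]"; keep the first x.y[.z].
ansible_ver=$(ansible --version 2>/dev/null | head -n 1 |
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1) || ansible_ver=
check_version ansible "$ansible_ver" 2.14

python_ver=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null) ||
    python_ver=
check_version python3 "$python_ver" 3.10

echo "$problems problem(s) found"
```

Saved as, say, preflight.sh (a hypothetical name) and run with sh preflight.sh, every line should read OK before you continue.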

On your managed hosts (the servers Ansible will configure):

  • Linux — same distro requirements as above
  • SSH server — running and reachable from the control node
  • sudo access — for the Ansible user
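
To confirm the managed-host requirements from the control node, you can test key-based SSH and sudo directly. In this sketch, check_managed_host is a hypothetical helper name, and it treats non-interactive sudo (sudo -n) as success. If your sudo setup requires a password, Ansible can still work when run with --ask-become-pass (-K).

```shell
# check_managed_host TARGET [SSH_OPTION...]: sketch of a managed-host check.
# Returns 0 if key-based SSH and passwordless sudo work, 1 if SSH fails, 2 if only sudo fails.
check_managed_host() {
    local target="$1"
    shift
    # BatchMode=yes turns off password prompts, so only key-based logins succeed.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$@" "$target" true 2>/dev/null; then
        echo "FAIL  $target: key-based SSH login failed"
        return 1
    fi
    # sudo -n fails instead of prompting for a password.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$@" "$target" sudo -n true 2>/dev/null; then
        echo "WARN  $target: SSH works, but sudo needs a password or is not allowed"
        return 2
    fi
    echo "OK    $target"
}
```

For example, check_managed_host admin@192.168.1.10 (placeholder user and address) prints OK or names the step that failed. Extra ssh options, such as -p 2222, can follow the target.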

Optional but recommended:

  • Tailscale — for secure, zero-config VPN between hosts
  • A second host — to see fleet-wide monitoring in action (a VM or LXC container works fine)

Step 1: Clone the repo

git clone https://github.com/opskern/ops-kernel-stack.git
cd ops-kernel-stack

Step 2: Configure your inventory

Copy the example inventory and add your hosts:

cp inventory/example.yml inventory/hosts.yml

Edit inventory/hosts.yml with your hostnames and IP addresses. At minimum, you need one host under the monitoring group.
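
A minimal inventory/hosts.yml in Ansible's YAML inventory format might look like the one this snippet writes. The host name mon01, the address 192.168.1.10, and ansible_user: admin are placeholders, and the layout may differ from what inventory/example.yml ships. Only the monitoring group name comes from this guide. Note that the heredoc overwrites the file you copied above.

```shell
# Sketch: write a minimal YAML inventory. mon01, 192.168.1.10 and admin are placeholders;
# only the "monitoring" group name comes from this guide. Overwrites any existing file.
mkdir -p inventory
cat > inventory/hosts.yml <<'EOF'
all:
  vars:
    ansible_user: admin            # SSH user with sudo rights on the managed hosts
  children:
    monitoring:
      hosts:
        mon01:
          ansible_host: 192.168.1.10   # address Ansible connects to
EOF
```

With Ansible installed, ansible-inventory -i inventory/hosts.yml --graph shows how the groups were parsed, and ansible monitoring -i inventory/hosts.yml -m ping checks that every host in the group is reachable.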

Step 3: Deploy monitoring

This single command deploys Prometheus, Grafana, Loki, and node_exporter. The -l monitoring option limits the run to hosts in the monitoring group:

ansible-playbook playbooks/site.yml -l monitoring

After this completes, open Grafana at http://<monitoring-host>:3000. Default credentials are in the README.
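
A short smoke test can confirm that each component answers over HTTP. Only Grafana's port 3000 comes from this guide. The Prometheus (9090), Loki (3100), and node_exporter (9100) ports are upstream defaults and are assumptions about how OpsKern configures them, as is the MONITORING_HOST variable name.

```shell
# Post-deploy smoke test (sketch). Port 3000 for Grafana comes from this guide; the
# other ports are upstream defaults and are assumptions about this deployment.
MONITORING_HOST="${MONITORING_HOST:-localhost}"

# check_endpoint NAME URL: succeed if URL answers within 5 s with a status below 400.
check_endpoint() {
    if curl -fs --max-time 5 -o /dev/null "$2"; then
        echo "UP    $1  $2"
    else
        echo "DOWN  $1  $2"
        return 1
    fi
}

failed=0
check_endpoint grafana       "http://$MONITORING_HOST:3000/api/health" || failed=$((failed + 1))
check_endpoint prometheus    "http://$MONITORING_HOST:9090/-/ready"    || failed=$((failed + 1))
check_endpoint loki          "http://$MONITORING_HOST:3100/ready"      || failed=$((failed + 1))
check_endpoint node_exporter "http://$MONITORING_HOST:9100/metrics"    || failed=$((failed + 1))
echo "$failed endpoint(s) not responding"
```

Run it from the control node with MONITORING_HOST set to your monitoring host's address, for example MONITORING_HOST=192.168.1.10 sh smoke-test.sh (the file name is hypothetical).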

Step 4: Deploy the alert rules

The collection ships with 108 alert rules. To deploy them:

ansible-playbook playbooks/alerting.yml
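
Once the playbook finishes, you can check how many alerting rules Prometheus actually loaded by querying its /api/v1/rules HTTP API endpoint. This sketch assumes Prometheus listens on the default port 9090 and uses python3 (already required on the control node) to parse the JSON. The PROMETHEUS_URL variable name is illustrative.

```shell
# Sketch: count alerting rules loaded in Prometheus. Port 9090 is the upstream
# default and an assumption here; PROMETHEUS_URL is an illustrative variable name.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# count_alerting_rules: read a /api/v1/rules JSON response on stdin, print the
# number of rules whose type is "alerting" (recording rules are skipped).
count_alerting_rules() {
    python3 -c '
import json, sys
groups = json.load(sys.stdin)["data"]["groups"]
print(sum(rule["type"] == "alerting" for g in groups for rule in g["rules"]))
'
}

if rules_json=$(curl -fs --max-time 5 "$PROMETHEUS_URL/api/v1/rules"); then
    echo "$(printf '%s' "$rules_json" | count_alerting_rules) alerting rule(s) loaded"
else
    echo "could not reach $PROMETHEUS_URL"
fi
```

If every shipped rule loaded, the count should match the figure given above.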

Alertmanager will start routing alerts based on the default configuration. Edit group_vars/all/alertmanager.yml to point notifications at your preferred channel (email, Slack, ntfy, etc.).
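
To test your notification channel without waiting for a real problem, you can post a synthetic alert to Alertmanager's v2 API (POST /api/v2/alerts). This sketch assumes Alertmanager listens on its default port 9093. The helper names, the NotificationTest alert name, and the severity label are made up, and whether the alert reaches your channel depends on how the routes in your configuration match labels.

```shell
# Sketch: fire a synthetic alert through Alertmanager's v2 API. Port 9093 is the
# upstream default; the helper names and label values are made up for testing.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# test_alert_payload [ALERTNAME]: print a one-alert JSON array. Only "labels" is
# required by the API; keep ALERTNAME to plain characters since it is not escaped.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"info"},"annotations":{"summary":"Manual notification test"}}]\n' \
        "${1:-NotificationTest}"
}

# send_test_alert [ALERTNAME]: POST the payload; Alertmanager then routes it like a real alert.
send_test_alert() {
    test_alert_payload "$@" |
        curl -fs --max-time 5 -H 'Content-Type: application/json' \
            --data-binary @- "$ALERTMANAGER_URL/api/v2/alerts"
}
```

Once your channel settings are in place, calling send_test_alert should produce a notification in that channel, subject to your routing rules.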

Step 5: Run your first remediation

The simplest remediation to test is disk cleanup. Trigger it manually to see the pipeline in action:

ansible-playbook playbooks/remediate-disk-cleanup.yml -l <your-host>
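
To see what the cleanup actually freed, compare filesystem usage on the target host before and after the run. The helper below (a hypothetical name) reads the Capacity column from POSIX-format df output. Run it on the managed host itself, for example over SSH.

```shell
# usage_pct [PATH]: print how full (in percent) the filesystem holding PATH is.
# df -P guarantees one line per filesystem; column 5 is "Capacity", e.g. "42%".
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

echo "/ is $(usage_pct /)% full"
```

From the control node, ansible <your-host> -m command -a 'df -P /' reports the same figures without logging in.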

To run remediation automatically, with Alertmanager triggering playbooks through the operations agent, follow the operations agent setup instructions in the repo README.


What’s next

Questions? Email hello@opskern.io or open an issue on GitHub.