Treating OpenClaw Like a Junior Sysadmin

I wanted to kick the tires on OpenClaw, but I did not want to install it directly on my primary workstation.

That's not a knock on OpenClaw specifically. It is just the normal posture I try to keep with new automation tools, especially ones that are designed to sit close to my workflow, read context, talk to services, and take actions on my behalf. Before I hand something the keys to my actual daily-driver environment, I want to see how it behaves in a smaller, more boring box.

So the question became: how do I evaluate an AI companion without turning it into a trusted endpoint?

The answer I settled on was to treat it like a junior sysadmin.

Give it an identity. Give it a "workstation". Give it limited access to logs, metrics, chat, and a couple of source-control systems. Let it observe. Let it help troubleshoot. Do not give it my shell, my browser profile, my SSH agent, or a wide-open view of the homelab.

The First Attempt

The first pass was a small VM on guiltyspark, one of my internal servers. That made sense at the time: an isolated guest, a disposable disk, a narrow network path, and a firewall policy that only allowed what I explicitly needed. The VM was built around a Debian guest with OpenClaw bootstrapped inside it, and the network policy was intentionally grumpy about internal access. Matrix was allowed. The rest of RFC1918 was mostly not.

# This was the original guiltyspark VM shape.
# The bridge could talk to itself, and Matrix was allowed.
# Everything else in RFC1918 got rejected before the final allow.

${iptables} -I FORWARD 1 \
  -i ${openclaw.network.bridge} \
  -o ${openclaw.network.bridge} \
  -j ACCEPT

${iptables} -I FORWARD 2 \
  -i ${openclaw.network.bridge} \
  -d ${openclaw.matrix.ip}/32 \
  -p tcp -m multiport --dports 80,443 \
  -j ACCEPT

${iptables} -I FORWARD 3 -i ${openclaw.network.bridge} -d 10.0.0.0/8 -j REJECT
${iptables} -I FORWARD 4 -i ${openclaw.network.bridge} -d 172.16.0.0/12 -j REJECT
${iptables} -I FORWARD 5 -i ${openclaw.network.bridge} -d 192.168.0.0/16 -j REJECT

# After the explicit internal rejects, normal outbound traffic was okay.
${iptables} -I FORWARD 6 -i ${openclaw.network.bridge} -j ACCEPT

                   allowed: tcp/80,443
+---------------+ ------------------------> +-------------------+
| openclaw VM   |                           | matrix.badger.lan |
| Debian guest  |                           | Synapse           |
+---------------+                           +-------------------+
        |
        | blocked: 10/8, 172.16/12, 192.168/16
        v
+---------------------------------------------------------------+
| the rest of the internal network                              |
+---------------------------------------------------------------+
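The ordering of those rules is doing real work. iptables walks the FORWARD chain top-down and the first matching ACCEPT or REJECT wins, so the Matrix exception has to sit above the RFC1918 rejects. The bash sketch below models that first-match decision for a destination and port. It is a paper check of the policy, not how iptables works internally. `MATRIX_IP` is a hypothetical stand-in for `${openclaw.matrix.ip}`, and the bridge-local rule 1 is left out.

```bash
# Paper model of the guiltyspark FORWARD chain: rules are checked top-down
# and the first ACCEPT/REJECT that matches wins.
# MATRIX_IP is a hypothetical stand-in for ${openclaw.matrix.ip}.
MATRIX_IP="10.0.0.50"

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# Succeed if IPv4 address $1 falls inside CIDR block $2.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Verdict for traffic from the bridge to dst:port.
verdict() {
  local dst=$1 port=$2 net
  # Rule 2: Matrix on 80/443 is allowed before any reject is reached.
  if in_cidr "$dst" "$MATRIX_IP/32" && [[ $port == 80 || $port == 443 ]]; then
    echo ACCEPT
    return
  fi
  # Rules 3-5: everything else in RFC1918 is rejected.
  for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    if in_cidr "$dst" "$net"; then
      echo REJECT
      return
    fi
  done
  # Rule 6: remaining (non-RFC1918) traffic is accepted.
  echo ACCEPT
}
```

With this hypothetical address, `verdict 10.0.0.50 443` prints ACCEPT and `verdict 10.0.0.50 22` prints REJECT. The two ordering mistakes a model like this catches are putting the Matrix allow below the 10/8 reject (Matrix becomes unreachable) and putting the catch-all accept above the rejects (the whole LAN opens up).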

That worked well as a security model, but not as an evaluation environment. The VM (1 core, 4 GB RAM) felt cramped, and I wanted more horsepower without moving the experiment onto my primary workstation.

I had a spare laptop sitting around, so I installed NixOS on it and, being a Halo nerd, named it cortana. Good enough. It had actual hardware, it was not my main machine, and it was now part of the NixOS fleet. So Cortana became the OpenClaw host.

Cortana Becomes the Intern Desk

The first step was, obviously, infrastructure work: Cortana got a real DNS name, cortana.badger.lan, and a static management address from the internal pool.

# common/system/common.nix
# Keep hostnames and management addresses in one shared inventory.

hostnames.cortana = "cortana.badger.lan";

hosts.cortana = {
  mgmt = "10.170.0.117"; # get it? 117? ok...
};

# modules/infrastructure/coredns.nix
# CoreDNS then renders the LAN A record from that inventory.

cortana IN A ${infraCommon.hosts.cortana.mgmt}

# hosts/cortana/networking.nix
# The host gets the same address statically, instead of relying on DHCP luck.

addresses = "${common.hosts.cortana.mgmt}/24";
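Routing both the A record and the static address through one inventory means they cannot drift apart. To illustrate that pattern outside Nix, here is a bash sketch with a hypothetical `MGMT` map standing in for `hosts.<name>.mgmt`. The guiltyspark address is made up. The sketch renders the same two strings and adds one check that a shared inventory makes easy: no two hosts claiming the same management address.

```bash
# Hypothetical inventory mirroring hosts.<name>.mgmt in common.nix.
declare -A MGMT=(
  [cortana]="10.170.0.117"
  [guiltyspark]="10.170.0.10"   # hypothetical address
)

# Render CoreDNS-style A records, sorted by name for stable diffs.
render_a_records() {
  local name
  for name in $(printf '%s\n' "${!MGMT[@]}" | sort); do
    printf '%s IN A %s\n' "$name" "${MGMT[$name]}"
  done
}

# Render the static address string for one host (prefix length defaults to 24).
render_address() {
  printf '%s/%s\n' "${MGMT[$1]}" "${2:-24}"
}

# Fail if two hosts claim the same management address.
check_unique_addresses() {
  local dupes
  dupes=$(printf '%s\n' "${MGMT[@]}" | sort | uniq -d)
  if [[ -n $dupes ]]; then
    echo "duplicate mgmt address: $dupes" >&2
    return 1
  fi
}
```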

Secrets are handled through SOPS. The rendered OpenClaw configuration lands in /home/cortana/.openclaw/openclaw.json with the Matrix credentials and gateway token injected by the system configuration. No room IDs, passwords, or tokens are baked into the repo.

# hosts/cortana/openclaw.nix
# These secret names exist in Git, but the values live encrypted in SOPS.

sops.secrets = {
  "openclaw/matrix-user-id" = { };
  "openclaw/matrix-password" = { };
  "openclaw/matrix-room-id" = { };
  "openclaw/gateway-token" = { };
};

# Render the runtime config as the cortana user.
# The actual token/password placeholders are replaced by sops-nix at activation time.

sops.templates."openclaw.json" = {
  path = "/home/cortana/.openclaw/openclaw.json";
  owner = "cortana";
  group = "users";
  mode = "0400";
};
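Mode 0400 and owner cortana are the whole access story for that file, so it is cheap to verify them after a deploy. Below is a minimal bash sketch, assuming GNU coreutils `stat`. `check_secret_file` is a hypothetical helper and not something sops-nix provides. The `-L` flag matters if the configured path turns out to be a symlink, because without it `stat` would report the link rather than the rendered file.

```bash
# Hypothetical post-deploy check for the rendered OpenClaw config.
# Usage: check_secret_file PATH [MODE] [OWNER]
check_secret_file() {
  local path=$1 want_mode=${2:-400} want_owner=${3:-cortana}
  local mode owner
  if [[ ! -f $path ]]; then
    echo "missing: $path" >&2
    return 1
  fi
  # GNU stat: -L follows a symlink to the real file, %a is the octal mode
  # (no leading zero), %U is the owner's user name.
  read -r mode owner < <(stat -L -c '%a %U' "$path")
  if [[ $mode != "$want_mode" ]]; then
    echo "bad mode on $path: $mode (want $want_mode)" >&2
    return 1
  fi
  if [[ $owner != "$want_owner" ]]; then
    echo "bad owner on $path: $owner (want $want_owner)" >&2
    return 1
  fi
}
```

On Cortana this would be run as `check_secret_file /home/cortana/.openclaw/openclaw.json 400 cortana`.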

The OpenClaw gateway does not run as a user systemd service, because getting it there was one of the early pain points. On NixOS, the installer expected a more conventional Linux environment, and from there we ran into user-bus and package-manager assumptions. The quickstart wanted Node. The Node installer wanted a package manager it recognized. The gateway service wanted systemd user-bus behavior that was not present in the way I was invoking it. Then rootless container namespace setup had its own newuidmap complaints after a reboot.

None of those problems were individually shocking. They were just enough friction to make the native NixOS path feel like the wrong thing to evaluate first.

That friction is the main reason I chose to deploy inside Distrobox. Cortana is still NixOS, but OpenClaw runs inside a Fedora Distrobox named openclaw (because what else would I call it?). The NixOS module owns the outer lifecycle: it creates the box if needed, installs the normal Fedora-side dependencies, runs the OpenClaw installer there, and starts the gateway as a system service on the host.

# Host: NixOS
# Container userland: Fedora via Distrobox
# OpenClaw sees the conventional Linux userland it expects; NixOS still owns the service lifecycle.

boxName = "openclaw";
boxImage = "quay.io/fedora/fedora:latest";

if ! podman container exists ${boxName}; then
  distrobox-create \
    --yes \
    --name ${boxName} \
    --image ${boxImage} \
    --additional-packages "nodejs npm make gcc gcc-c++ cmake python3 chromium git curl tar gzip xz which procps-ng diffutils findutils"
fi

distrobox-enter --name ${boxName} -- bash -lc '
  set -euo pipefail

  # Keep npm global installs under the cortana home directory.
  mkdir -p ${npmPrefix}
  npm config set prefix ${npmPrefix}
  export PATH="${npmPrefix}/bin:/usr/local/bin:/usr/bin:/bin:$PATH"

  # Let OpenClaw install itself in the Fedora userland.
  if ! openclaw --version >/dev/null 2>&1; then
    curl -fsSL https://openclaw.ai/install.sh | bash
  fi
'

+-------------------------------------------------------------+
| cortana.badger.lan                                          |
| NixOS                                                       |
|                                                             |
|  systemd: openclaw-gateway.service                          |
|      |                                                      |
|      v                                                      |
|  distrobox-enter openclaw                                   |
|      |                                                      |
|      v                                                      |
|  Fedora userland: node, npm, chromium, openclaw             |
|                                                             |
|  gateway bind: 127.0.0.1:18789                              |
+-------------------------------------------------------------+

# The host service is intentionally simple.
# NixOS owns start/stop/restart. OpenClaw runs as cortana.

systemd.services.openclaw-gateway = {
  description = "OpenClaw Gateway (Distrobox)";
  wantedBy = [ "multi-user.target" ];
  wants = [ "network-online.target" ];
  after = [ "network-online.target" ];

  serviceConfig = {
    User = "cortana";
    Group = "users";
    WorkingDirectory = "/home/cortana";
    Type = "simple";
    ExecStartPre = openclawBootstrap;
    ExecStart = openclawGatewayStart;
    ExecStop = openclawGatewayStop;
    Restart = "always";
    RestartSec = "10s";
  };
};
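The diagram shows the gateway bound to 127.0.0.1:18789, and that bind is what keeps the dashboard off the LAN. A cheap way to keep that claim honest is to check the listening sockets once the unit is up. Here is a minimal bash sketch that reads `ss -ltnH` output (`-H` drops the header line in iproute2's `ss`). It fails if anything on the port is bound to a non-loopback address, or if nothing is listening at all. `only_loopback_listeners` is a hypothetical name.

```bash
# Hypothetical check: every TCP listener on the gateway port must be loopback.
# Reads `ss -ltnH` output on stdin (State Recv-Q Send-Q Local Peer ...).
only_loopback_listeners() {
  local port=$1 found=0
  local state recvq sendq local_addr rest
  while read -r state recvq sendq local_addr rest; do
    [[ $local_addr == *":$port" ]] || continue
    found=1
    case $local_addr in
      127.*:"$port" | "[::1]:$port") ;;   # loopback: fine
      *)
        echo "non-loopback listener: $local_addr" >&2
        return 1
        ;;
    esac
  done
  # Also fail when nothing is listening on the port at all.
  (( found ))
}
```

Usage on Cortana would be `ss -ltnH | only_loopback_listeners 18789`.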

The gateway binds to loopback on port 18789. If I want the dashboard, I SSH tunnel to Cortana and open it locally (for now; eventually it may join the rest of my internal services on Homepage). OpenClaw also gets a host-side wrapper, so from Cortana I can run openclaw ... and have it enter the Distrobox with the right PATH and internal CA trust.

# Local workstation
# Nothing exposed on the LAN; the dashboard is reached through SSH.

ssh -N -L 18789:127.0.0.1:18789 cortana@cortana.badger.lan

# Then open this locally:
# http://localhost:18789/
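The host-side wrapper mentioned above is not shown in full, but its shape is roughly: enter the box, fix PATH, add the internal CA, then run openclaw with the original arguments. Below is a hedged bash sketch of that idea. `BOX` matches the module's `boxName`, while `NPM_PREFIX` and `CA_BUNDLE` are assumed paths rather than the real module's values. `NODE_EXTRA_CA_CERTS` is Node's standard variable for trusting extra CAs on top of its bundled roots. `OPENCLAW_DRY_RUN=1` is a hypothetical knob that prints the command instead of running it.

```bash
# Hypothetical sketch of the host-side `openclaw` wrapper on Cortana.
# NPM_PREFIX and CA_BUNDLE are assumed paths, not the module's real values.
BOX="openclaw"
NPM_PREFIX="/home/cortana/.npm-global"
CA_BUNDLE="/home/cortana/.openclaw/internal-ca.pem"

openclaw_cmd() {
  # env sets PATH before looking up `openclaw`; NODE_EXTRA_CA_CERTS makes
  # Node trust the internal CA (e.g. for https://matrix.badger.lan).
  local -a cmd=(
    distrobox-enter --name "$BOX" --
    env "PATH=$NPM_PREFIX/bin:/usr/local/bin:/usr/bin:/bin"
    "NODE_EXTRA_CA_CERTS=$CA_BUNDLE"
    openclaw "$@"
  )
  if [[ ${OPENCLAW_DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[@]}"   # show the argv, one element per line
  else
    exec "${cmd[@]}"            # replace this shell with the boxed openclaw
  fi
}
```

A wrapper script body would then be just `openclaw_cmd "$@"`, so `openclaw health --json` on the host runs the same command inside the box.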

Identity, Not My Identity

The important bit for me was that OpenClaw should not be me.

I created separate identities for it: a GitHub profile and a Forgejo profile for my internal git service. Those accounts are intentionally scoped as service-style users. They can be invited to the things I want them to see, and they can be removed or rotated without touching my personal credentials.

# Conceptually, this is the access model I wanted:

human:badgerops
  - primary workstation
  - normal SSH agent
  - broad repo/admin access
  - browser sessions and personal tokens

assistant:openclaw
  - host: cortana
  - Unix user: cortana
  - GitHub user: <wouldn't you like to know...>
  - Forgejo user: cortana-bot
  - Matrix user: cortana
  - access: selected rooms, selected repos, selected logs/metrics

That distinction matters. If this is supposed to act like a junior sysadmin, it should have an account like a junior sysadmin. Not my browser cookies. Not my workstation SSH agent. Not my GitHub, Forgejo, or ${service} token just because that is easiest. A little extra work now means a little less pain for future me. Maybe.

Right now the goal is not to let it autonomously administer everything. The goal is to let it participate in the troubleshooting loop with enough context to be useful.

Matrix As The Control Plane

The chat side is Matrix, because that is already where my internal alerting and operational chat lives.

OpenClaw is configured against my internally hosted Synapse instance at matrix.badger.lan. The Matrix plugin is enabled, encrypted rooms are enabled, and direct messages use pairing. Group access is allowlisted, not open. That means the bot does not get to roam through arbitrary rooms just because it exists on the Matrix homeserver.

# Rendered shape of the Matrix channel config.
# Values shown as placeholders are injected from SOPS.
# dangerouslyAllowPrivateNetwork is on because the homeserver is _on_ a private network.

"channels": {
  "matrix": {
    "enabled": true,
    "homeserver": "https://matrix.badger.lan",
    "network": {
      "dangerouslyAllowPrivateNetwork": true
    },
    "userId": "${openclaw/matrix-user-id}",
    "password": "${openclaw/matrix-password}",
    "deviceName": "OpenClaw Gateway",
    "encryption": true,
    "dm": {
      "policy": "pairing"
    },
    "groupPolicy": "allowlist",
    "groups": {
      "${openclaw/matrix-room-id}": {
        "enabled": true
      }
    },
    "autoJoin": "allowlist",
    "autoJoinAllowlist": [
      "${openclaw/matrix-room-id}"
    ]
  }
}

For the first real room, I added it to the internal infra room. That is also where the Alertmanager-to-Matrix relay can post alerts, so OpenClaw can see the same operational noise I would normally react to: service health, host issues, OpenClaw's own gateway state, and the other homelab monitoring signals.

+----------------+       alerts         +--------------------------+
| Prometheus     | ------------------>  | Alertmanager             |
| alert rules    |                      |                          |
+----------------+                      +------------+-------------+
                                                   |
                                                   | webhook -> Matrix relay
                                                   v
                                        +--------------------------+
                                        | #infra Matrix room       |
                                        | humans + OpenClaw        |
                                        +--------------------------+

There was a small gotcha here: pairing requests can time out. I spent a while staring at the CLI saying there were no pending Matrix pairings while the chat UI still showed one. Eventually I realized the request had expired. Start over, pair again, move on.

Observability For The Assistant

If I am going to run a helper that is supposed to help with operations, I need to be able to observe the helper too.

The Cortana module exports OpenClaw health into the Node Exporter textfile collector once per minute. It runs openclaw health --json as the cortana user and converts the output into Prometheus metrics. That gives me scrape success, gateway health, event loop delay, event loop utilization, agent sessions, and Matrix channel state.

# Timer: refresh the OpenClaw textfile metrics every minute.

systemd.timers.openclaw-node-exporter-textfile = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnBootSec = "2m";
    OnUnitActiveSec = "1m";
    Unit = "openclaw-node-exporter-textfile.service";
  };
};

# Collector: ask OpenClaw for health JSON as the cortana user.

health_json="$(${pkgs.util-linux}/bin/runuser -u cortana -- ${openclawWrapper}/bin/openclaw health --json 2>/dev/null)"

# Example output written to node_exporter's textfile collector.
# Prometheus scrapes this like any other node metric.

openclaw_scrape_success 1
openclaw_gateway_health_ok 1
openclaw_event_loop_delay_p99_milliseconds 12
openclaw_event_loop_utilization 0.02
openclaw_channel_connected{channel="matrix",account="default"} 1
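Two details matter when producing that file. First, node_exporter reads whatever is in the textfile directory at scrape time, so the file should be written under a temporary name and renamed into place. Second, the alert rules below compare `openclaw_scrape_timestamp_seconds` against `time()`, so the collector has to emit that metric too. Below is a minimal bash sketch of the write step. It assumes the values have already been pulled out of the health JSON; the JSON field names are not shown here, so parsing is left out. `write_openclaw_metrics` and its directory argument are hypothetical.

```bash
# Hypothetical textfile-collector write step. Values are passed in already
# parsed from `openclaw health --json` (parsing is out of scope here).
# Usage: write_openclaw_metrics DIR SUCCESS [GATEWAY_OK] [MATRIX_CONNECTED]
write_openclaw_metrics() {
  local dir=$1 success=$2 gateway_ok=${3:-0} matrix_connected=${4:-0}
  local tmp
  # Temp file in the same directory, so the final mv is an atomic rename.
  # Its name does not end in .prom, so node_exporter ignores it meanwhile.
  tmp=$(mktemp "$dir/.openclaw.prom.XXXXXX") || return 1
  {
    echo "# TYPE openclaw_scrape_success gauge"
    echo "openclaw_scrape_success $success"
    # Refreshed on every run, so the stale alert tracks the collector itself.
    echo "openclaw_scrape_timestamp_seconds $(date +%s)"
    if [[ $success == 1 ]]; then
      echo "openclaw_gateway_health_ok $gateway_ok"
      echo "openclaw_channel_connected{channel=\"matrix\",account=\"default\"} $matrix_connected"
    fi
  } > "$tmp"
  chmod 0644 "$tmp"   # mktemp creates 0600; node_exporter must be able to read it
  mv -f "$tmp" "$dir/openclaw.prom"
}
```

A caller would pass success=1 plus the parsed values when `health_json` is non-empty, and success=0 otherwise. Omitting the gateway and channel lines on failure, instead of writing 0, is a choice made in this sketch. It leaves the failed-scrape alert as the one that fires, rather than several alerts for one problem.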

Prometheus scrapes Cortana's node exporter with role=openclaw. Alert rules watch for stale OpenClaw metrics, failed health scrapes, an unhealthy gateway, and disconnected channels. There is also a Grafana dashboard for the gateway, because eventually every little homelab experiment needs a dashboard. Apparently this is the law.

# Prometheus target. Cortana is just another node_exporter scrape,
# but it gets role=openclaw so dashboards and alerts can filter cleanly.

- targets: ['${globalCommon.hosts.cortana.mgmt}:9100']
  labels:
    host: cortana
    role: openclaw

# A few of the OpenClaw-specific alerts.
# These are less about paging me immediately and more about proving the assistant itself is observable.

- alert: OpenClawMetricsStale
  expr: (time() - openclaw_scrape_timestamp_seconds{job="node-exporter",host="cortana"}) > 300
  for: 5m
  labels:
    severity: warning
    category: openclaw

- alert: OpenClawGatewayUnhealthy
  expr: openclaw_gateway_health_ok{job="node-exporter",host="cortana"} == 0
  for: 5m
  labels:
    severity: critical
    category: openclaw

- alert: OpenClawChannelDisconnected
  expr: openclaw_channel_connected{job="node-exporter",host="cortana"} == 0
  for: 5m
  labels:
    severity: warning
    category: openclaw
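Stacking a 300-second threshold on top of `for: 5m` has a consequence: OpenClawMetricsStale fires roughly ten minutes after the last good write, not five. A small bash simulation makes the arithmetic concrete. It evaluates the condition every `interval` seconds after a last write at t=0 and reports when the alert would move from pending to firing. It ignores scrape lag and evaluation jitter, and the 60-second default interval is an assumption, not the actual evaluation interval here. `first_fire_time` is a hypothetical name.

```bash
# Simulate OpenClawMetricsStale after a last successful write at t=0.
# Args: threshold seconds, for-duration seconds, evaluation interval seconds.
first_fire_time() {
  local threshold=${1:-300} for_s=${2:-300} interval=${3:-60}
  local t pending_since=""
  for (( t = 0; t <= 3600; t += interval )); do
    if (( t > threshold )); then
      # Condition true: start (or continue) the pending period.
      [[ -z $pending_since ]] && pending_since=$t
      if (( t - pending_since >= for_s )); then
        echo "$t"   # pending long enough: the alert fires here
        return 0
      fi
    else
      pending_since=""   # condition false: pending state resets
    fi
  done
  return 1
}
```

With the defaults, `first_fire_time` prints 660. The condition first holds at t=360 (the first evaluation strictly past 300 seconds) and must stay true until t=660. This is also why the timestamp lives as a value in the file rather than being inferred from missing metrics: node_exporter keeps serving the last file it found, so a dead collector looks like old data, not absent data. node_exporter also exposes `node_textfile_mtime_seconds` per file, which can serve the same purpose.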

Logs go through Vector into Loki. OpenClaw writes JSON-ish logs under the cortana user's home directory and a temporary runtime path, Vector normalizes the useful fields, and Loki labels them with host, service, unit, source, and level. One minor tradeoff: Vector runs as root on Cortana because it also collects other host logs. I am not thrilled by that, but it is contained to this host and may be worth revisiting later.

# Vector reads both the temporary OpenClaw logs and the durable user logs.

sources.openclaw_files = {
  type = "file";
  include = [
    "/tmp/openclaw/openclaw-*.log"
    "/home/cortana/.openclaw/logs/*.jsonl"
  ];
  read_from = "end";
};

# Map the normalized fields onto Loki labels.
# This makes queries like {service="openclaw",host="cortana"} work cleanly.
# inputs (the transform that sets host/service/unit/level) are not shown here.

sinks.openclaw_loki = {
  type = "loki";
  endpoint = "http://loki.badger.lan";
  encoding.codec = "json"; # the loki sink requires an encoding
  labels = {
    host = "{{ host }}";
    service = "{{ service }}";
    source = "{{ source_type }}";
    unit = "{{ unit }}";
    level = "{{ level }}";
  };
};
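Once those labels exist, the same selector works outside Grafana too. Loki's HTTP API serves log queries at `/loki/api/v1/query_range`, with the LogQL selector passed as a URL-encoded `query` parameter. Below is a small bash sketch that builds such a URL. `urlencode` and `loki_query_url` are hypothetical helpers, and the base URL is the endpoint from the sink config above. In practice `curl -G --data-urlencode 'query=...'` does the encoding for you; the helper just makes it visible.

```bash
# Percent-encode a string for a URL query parameter. RFC 3986 unreserved
# characters pass through; everything else becomes %XX (byte-wise in C locale).
urlencode() {
  local LC_ALL=C
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a query_range URL for a LogQL selector such as {host="cortana"}.
loki_query_url() {
  local base=$1 selector=$2 limit=${3:-100}
  printf '%s/loki/api/v1/query_range?query=%s&limit=%s\n' \
    "$base" "$(urlencode "$selector")" "$limit"
}
```

For example, `curl -s "$(loki_query_url http://loki.badger.lan '{service="openclaw",host="cortana"}' 50)"` asks for the 50 most recent OpenClaw log lines from Cortana.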

What Worked

The model feels right so far.

OpenClaw is not sitting on my workstation. It does not inherit my local trust. It has its own host, its own account, its own chat identity, and its own source-control identities. It can see selected operational context, especially Matrix alerts and the logs/metrics I choose to expose.

Distrobox ended up being the right compromise for this stage. I still get a declarative NixOS host managing the lifecycle, DNS, users, SOPS secrets, systemd service, Prometheus integration, Vector integration, and Grafana provisioning. OpenClaw gets a normal Fedora-ish userland where its installer and Node assumptions are much less weird.

I’ve also been using Cortana as a lightweight engineering assistant around my GitHub and Forgejo work. The first pattern so far is automated clawpatch review: she picks an active repo, runs a read-only code review pass, turns the findings into a Markdown report, commits that report into a dedicated archive repo, and sends me the summary in Matrix.

cron schedule
  -> choose an active BadgerOps repo (commits in the last year)
  -> run clawpatch review
  -> save Markdown report
  -> commit report to forgejo:cortanabot/clawpatch-reports
  -> send summary to Matrix
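The selection step is the only part of that pipeline with real logic in it. Below is a minimal bash sketch of the "commits in the last year" filter. It assumes the cron job can produce a list of repo names with their last-commit Unix timestamps; for a local clone, `git -C <dir> log -1 --format=%ct` prints exactly that. `active_repos` is a hypothetical helper, and the 365-day window mirrors the text.

```bash
# Hypothetical repo-selection filter for the clawpatch cron job.
# stdin: lines of "<repo> <last-commit-unix-time>"; prints repos whose last
# commit falls inside the window (default: 365 days).
active_repos() {
  local now=$1 window=${2:-$((365 * 86400))}
  local repo ts
  while read -r repo ts; do
    # Skip blank or malformed lines instead of guessing.
    [[ -n $repo && $ts =~ ^[0-9]+$ ]] || continue
    if (( now - ts <= window )); then
      printf '%s\n' "$repo"
    fi
  done
}

# Example wiring (hypothetical layout of local clones under ~/src):
#   for d in ~/src/*/; do
#     printf '%s %s\n' "$(basename "$d")" "$(git -C "$d" log -1 --format=%ct)"
#   done | active_repos "$(date +%s)" | shuf -n 1
```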

The important part is that this is review-only automation. It does not push branches, open PRs, or publish changes unless I explicitly ask. That keeps the loop useful without letting automation mutate production code behind my back.

What Still Needs Cleanup

The Matrix room configuration started as a single room, and the current Nix template still renders a single SOPS-backed room ID into groups and autoJoinAllowlist. I have a multi-room secret shape ready, but the renderer still needs to consume that JSON before I can honestly say the room allowlist in the deployed config is multi-room.

# Current: one SOPS-backed room ID rendered into the allowlist.

"groups": {
  "${openclaw/matrix-room-id}": {
    "enabled": true
  }
}

# Next: render a SOPS-backed JSON object/list for multiple rooms.
# The secret exists, but the Nix template still needs to consume it.

openclaw/matrix-room-ids-json
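sops-nix fills in placeholders as opaque strings at activation time, so the Nix evaluator never sees the room list and cannot loop over it. One option is a small activation-time step that turns a list of room IDs into the two JSON fragments the config needs. Below is a hedged bash sketch of that idea. It assumes one room ID per line, whereas the real secret is a JSON object/list, so a real renderer would parse JSON instead. `render_matrix_rooms` and the example IDs are hypothetical. It refuses anything that does not start with `!` (the Matrix room ID sigil) or that contains characters which would break the hand-built JSON.

```bash
# Hypothetical renderer: room IDs on stdin (one per line) -> groups object
# plus autoJoinAllowlist array, so both lists always come from one source.
render_matrix_rooms() {
  local -a rooms=()
  local id
  while IFS= read -r id; do
    [[ -z $id ]] && continue
    # Room IDs start with '!'; quotes or backslashes would break the JSON below.
    if [[ $id != '!'* || $id == *'"'* || $id == *'\'* ]]; then
      echo "refusing suspicious room id: $id" >&2
      return 1
    fi
    rooms+=("$id")
  done
  (( ${#rooms[@]} > 0 )) || { echo "no rooms given" >&2; return 1; }

  local groups="" allow="" sep=""
  for id in "${rooms[@]}"; do
    groups+="$sep\"$id\": { \"enabled\": true }"
    allow+="$sep\"$id\""
    sep=", "
  done
  printf '{ "groups": { %s }, "autoJoinAllowlist": [ %s ] }\n' "$groups" "$allow"
}
```

Generating groups and autoJoinAllowlist from the same input is the point of the design: the two lists cannot disagree about which rooms the bot may join.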

I also want to tighten the source-control permissions further once I know which workflows are useful. The right shape is probably a tiny set of repos, branch-only write permissions where possible, and no credential that would be painful to rotate.

The next useful test is not whether OpenClaw can answer trivia in chat. I do not care much about that. The useful test is whether it can look at an alert, inspect the narrow set of logs and metrics I exposed, find the likely failing unit or regression, and propose a fix that I can review like any other pull request. The same goes for codebase changes as I continue to work on projects.

The workflow I want to prove:


1. Alertmanager posts an alert into Matrix
2. OpenClaw sees the alert in an allowlisted room
3. OpenClaw checks only the logs/metrics it has access to
4. OpenClaw proposes a diagnosis and patch
5. Human reviews the change like any other PR

The Takeaway

I think this is the pattern I want for AI companion tools in my own infrastructure: do not install them into the most trusted place first. Give them a small desk, a badge with their own name on it, limited read access, and a very boring job.

In this case, that desk is Cortana. The badge is a Matrix account plus separate GitHub and Forgejo users. The boring job is watching alerts, reading selected logs and metrics, and helping me reason through operational issues.

That feels a lot better than installing a brand-new assistant directly onto my workstation and hoping the defaults line up with my threat model.

We'll see where things end up next!

All for now,

-BadgerOps