ralph(1)

An AI Inventory Agent for manufacturers, built as a sovereign Hermes agent. The agent can ask for anything; only the broker can act.

Name

ralph: an AI Inventory Agent for manufacturers. It plugs into the company's inventory system, sales/orders database and invoicing, spots a shortage before it stops the line, and reorders and pays the supplier itself. It all runs on the company's own hardware, and by architecture it cannot misuse the company card: it never holds it.

Under the hood it's a sovereign Hermes agent in the full sense — an autonomous loop that:

reasons locally on NVIDIA Nemotron 3 Super 120B by default, can switch to Nous's Hermes 3 (8B), the model that also powers its self-improvement, and moves to NVIDIA Nemotron 3 Ultra for the hardest calls;
self-improves — turns each run's mistakes into policy changes it applies (and queues the rest);
pays for what it runs on — with Stripe Skills the broker covers materials and, in the demo, the services a shop needs (logistics SaaS, freight labels, cloud compute);
runs safely inside the NemoClaw sandbox, with a broker wall it cannot cross.

The foundation, used whole: a Hermes agent · reasoning on NVIDIA Nemotron (3 Super 120B local + 3 Ultra cloud), Hermes 3 selectable · running safely in NemoClaw · paying through Stripe Skills. Built directly on the contest's primitives, not on stand-ins for them.

Synopsis

# the wall (start first) · only holder of the Stripe key
broker.py
# the read-only operator view at http://<host>:8090
dashboard.py
# the brain: earn → spend, reasoning on a local model
agent.py
# prove the wall holds — three rogue requests, all blocked
demo_refuse.py

Description

Every "AI agent that spends money" demo has the same two problems: you hand a language model your credit card, and you hand your data to someone else's cloud. ralph refuses both.

The reasoning runs on a local model on a box in the building, so prices, customers and margins never leave it.
The agent never holds a payment credential. A separate broker process is the only thing that can move money, and only against a hard rulebook.
Cloud is opt-in, per decision: switch to a bigger NVIDIA model for one hard call, then back to private.

The one property that matters: the agent and broker are separate OS processes, and the only channel between them runs through a locked-down sandbox. Jailbreaking the agent gets you nothing — it has no key and no network path to Stripe.

Architecture

          ┌─────────────────────────────────────────────────┐
          │  ralph host   (on-prem DGX Spark · GB10)        │
          │                                                 │
 orders ─►│   agent.py             broker.py                │
          │   "the brain"          "the wall"               │
          │   • reasons local      • ONLY holds the         │
          │   • holds NO key         Stripe key             │──► Stripe
          │   • can only ASK       • enforces the rulebook  │    (test mode)
          │        │                     ▲                  │    real PaymentIntents
          │        │ request             │ receipt          │
          │        ▼                     │                  │
          │   ┌──────────────────────────────────┐          │
          │   │  NemoClaw / OpenShell sandbox    │          │
          │   │  no internet · no interpreters   │          │
          │   └──────────────────────────────────┘          │
          │                                                 │
          │   hermes.py ─► turns issues into policy changes │
          │   dashboard.py ─► read-only live view :8090     │
          └─────────────────────────────────────────────────┘

File	Role
agent.py	The brain. Reads orders + inventory, decides what to sell and restock. Holds no key — submits requests, waits for receipts.
broker.py	The wall. The only process with the Stripe key. Validates every request against the rulebook, pays, writes receipts, logs issues.
sandbox_channel.py	The agent↔broker boundary. Drops requests / reads receipts inside the NemoClaw container via `docker exec`.
actions.py	Cash ledger + the real Stripe calls. Refuses to start on a live key.
hermes.py	The continuous-improvement loop — turns the run's issues into policy changes it applies (reorder bump + learned pre-filter), and queues the rest.
audit.py	The SHA-256 hash chain. Every consequential action, independently verifiable.
dashboard.py	Read-only operator view on :8090. The only write it accepts is the brain selector — never money.
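
The agent↔broker boundary in sandbox_channel.py is a drop-a-request, wait-for-a-receipt protocol. A minimal sketch of that shape, assuming a plain local directory stands in for the NemoClaw container (the real hop goes through `docker exec`, which this leaves out) and assuming hypothetical file names `<id>.request.json` and `<id>.receipt.json`:

```python
from __future__ import annotations

import json
import time
import uuid
from pathlib import Path

# HYPOTHETICAL: a plain directory stands in for the sandbox's shared path.
# In ralph the hop goes through `docker exec` into the NemoClaw container.


def _atomic_write(path: Path, obj: dict) -> None:
    # Write to a temp name, then rename, so a reader never sees half a file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(obj))
    tmp.rename(path)


def drop_request(channel: Path, payload: dict) -> str:
    """Agent side: leave a request and return its id. No key, no network."""
    req_id = uuid.uuid4().hex
    _atomic_write(channel / f"{req_id}.request.json", {"id": req_id, **payload})
    return req_id


def wait_for_receipt(channel: Path, req_id: str, timeout: float = 5.0,
                     poll: float = 0.05) -> dict | None:
    """Agent side: poll for the broker's receipt; None if it never arrives."""
    path = channel / f"{req_id}.receipt.json"
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return json.loads(path.read_text())
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def serve_once(channel: Path, decide) -> int:
    """Broker side: answer each pending request with a receipt; return the count."""
    handled = 0
    for req_path in sorted(channel.glob("*.request.json")):
        request = json.loads(req_path.read_text())
        receipt = {"id": request["id"], **decide(request)}
        _atomic_write(channel / f"{request['id']}.receipt.json", receipt)
        req_path.unlink()   # the request is consumed once answered
        handled += 1
    return handled
```

Here `decide` is whatever the broker runs on a request (in ralph, the rulebook plus the Stripe call). Writing under a temporary name and renaming means the broker never parses a half-written request; on POSIX a rename within one directory is atomic.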

The broker wall

The broker is the single most important component: the only process that holds the Stripe key and the only one with a path to Stripe. The agent cannot pay. It can only drop a request into the sandbox and wait.

When a request arrives, the broker checks it against a hard rulebook before a cent moves:

Known catalog — the SKU / product must exist. Off-catalog asks die here.
Broker-authoritative price — the agent does not get to name the price; the broker does.
Quantity cap — a per-order ceiling (500 units) stops absurd orders.
Budget guard — it never auto-pays what it can't afford: any spend above cash on hand is refused, and an oversized but legitimate order is exactly the case for a human decision (see Human in the loop).
Actually-low check — restocks only go through for materials genuinely below their reorder point.
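
The rulebook reads naturally as an ordered series of guards where the first failure wins. A minimal sketch, assuming hypothetical request and catalog shapes (`sku`, `qty`, prices in integer cents, a `Material` record) that are not ralph's documented formats; only the 500-unit cap comes from the text above:

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_QTY_PER_ORDER = 500   # the per-order ceiling stated above


@dataclass(frozen=True)
class Material:            # hypothetical catalog record
    unit_price_cents: int  # the broker's price; the agent never sets it
    on_hand: int
    reorder_point: int


def check_restock(request: dict, catalog: dict[str, Material],
                  cash_cents: int) -> tuple[bool, str]:
    """Run the rulebook in order and return (approved, reason); the first failure wins."""
    sku, qty = request.get("sku"), request.get("qty")
    # Known catalog: off-catalog asks die here.
    if not isinstance(sku, str) or sku not in catalog:
        return False, f"unknown SKU {sku!r}, not on catalog"
    item = catalog[sku]
    # Broker-authoritative price: any price field in the request is never read.
    unit_price = item.unit_price_cents
    # Quantity cap: a positive whole number, at most the ceiling.
    if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
        return False, "quantity must be a positive whole number"
    if qty > MAX_QTY_PER_ORDER:
        return False, f"above the {MAX_QTY_PER_ORDER}/order cap"
    # Budget guard: never pay more than cash on hand.
    total = qty * unit_price
    if total > cash_cents:
        return False, "exceeds cash on hand"
    # Actually-low check: restock only what is below its reorder point.
    if item.on_hand >= item.reorder_point:
        return False, "not below its reorder point"
    return True, f"approved: {qty} x {sku} = ${total / 100:,.2f}"
```

With a sheet priced at a hypothetical $92.00 and $20,000 of cash, `{"sku": "SHEET-AL", "qty": 500}` clears the cap but fails the budget guard ($46,000 > $20,000), the same shape as the panic-buy refusal in demo_refuse.py.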

Pass all checks → a real Stripe test PaymentIntent and a receipt. Fail any → a refusal with its reason, written to the audit trail. Zero dollars move on a failed check.
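
On the pass path the broker creates a PaymentIntent. Since the broker talks to Stripe with urllib, here is a sketch of building that call with the standard library, assuming form-encoded parameters to Stripe's `POST /v1/payment_intents`. The `ralph_request_id` metadata key and reusing the request id as Stripe's `Idempotency-Key` header are illustrative choices, not documented ralph behavior:

```python
from __future__ import annotations

import urllib.parse
import urllib.request

STRIPE_PAYMENT_INTENTS = "https://api.stripe.com/v1/payment_intents"


def build_payment_intent(secret_key: str, amount_cents: int, currency: str,
                         request_id: str, description: str) -> urllib.request.Request:
    """Build (but do not send) a Stripe PaymentIntent create call."""
    # Same rule actions.py enforces: test mode only.
    if not secret_key.startswith("sk_test_"):
        raise RuntimeError("refusing: not a Stripe test-mode secret key")
    body = urllib.parse.urlencode({
        "amount": amount_cents,                     # smallest currency unit (cents)
        "currency": currency,
        "description": description,
        "metadata[ralph_request_id]": request_id,   # hypothetical metadata key
    }).encode("utf-8")
    return urllib.request.Request(
        STRIPE_PAYMENT_INTENTS,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {secret_key}",
            "Content-Type": "application/x-www-form-urlencoded",
            # A retry with the same request id cannot create a second PaymentIntent.
            "Idempotency-Key": request_id,
        },
    )
```

Sending it is `urllib.request.urlopen(req)`; the JSON response carries the PaymentIntent `id` that a receipt would record. The idempotency key matters for a broker that retries after a timeout: Stripe returns the original result instead of creating a second one.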

Brains · private ⇄ cloud

Agent vs. model — the one thing to be precise about. ralph (the Hermes agent) is the harness: the autonomous loop, the self-improvement, the skills, the broker wall. The toggle below does not change that — it only swaps the model the agent reasons on. Same employee; you choose the mind.

ralph reasons privately by default — nothing leaves the building. One switch on the dashboard lets it move a genuinely hard call to NVIDIA's biggest model, then back. Privacy or horsepower, per decision.

Toggle	Model it reasons on	Where	When
local (default)	Nemotron 3 Super 120B	on the Spark	ralph's default brain: routine calls, data never leaves the box
hermes (native)	Hermes 3 (8B)	on the Spark	Nous's model — selectable, and the mind its self-improvement loop reasons on
cloud	Nemotron 3 Ultra	NVIDIA API	the hardest calls — only when you select it and a key is present

Mechanics: the toggle POSTs to /api/brain, which writes brain.json. agent.py reads it at decision time and routes to local Ollama or NVIDIA's OpenAI-compatible endpoint. The toggle selects a model name only — it can never touch money or a key. Cloud degrades gracefully: selected without a key, it stays local and says so.
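
The routing decision fits in one function. A sketch assuming brain.json looks like `{"brain": "cloud"}`, using Ollama's OpenAI-compatible base URL and NVIDIA's hosted API base URL; the model identifiers are placeholders, not confirmed model names:

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping
from pathlib import Path

# Base URLs: Ollama's OpenAI-compatible endpoint and NVIDIA's hosted API.
# Model identifiers are PLACEHOLDERS (hypothetical), not confirmed model names.
BRAINS = {
    "local":  ("http://localhost:11434/v1", "PLACEHOLDER-nemotron-3-super-120b"),
    "hermes": ("http://localhost:11434/v1", "PLACEHOLDER-hermes-3-8b"),
    "cloud":  ("https://integrate.api.nvidia.com/v1", "PLACEHOLDER-nemotron-3-ultra"),
}


def resolve_brain(brain_file: Path,
                  env: Mapping[str, str] = os.environ) -> tuple[str, str, str]:
    """Return (base_url, model, note) for whichever brain brain.json selects."""
    try:
        choice = json.loads(brain_file.read_text()).get("brain", "local")
    except (OSError, ValueError, AttributeError):
        choice = "local"          # missing or malformed file: stay private
    if not isinstance(choice, str) or choice not in BRAINS:
        choice = "local"          # unknown value: stay private
    if choice == "cloud" and not env.get("NVIDIA_API_KEY"):
        # Graceful degradation: cloud selected without a key, so stay local and say so.
        base_url, model = BRAINS["local"]
        return base_url, model, "cloud selected but NVIDIA_API_KEY is not set; staying local"
    base_url, model = BRAINS[choice]
    return base_url, model, f"reasoning on the {choice} brain"
```

Every failure path lands on the local brain, so a missing, malformed or unexpected brain.json can only make ralph more private, never less.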

Stripe Skills — it pays for the services it runs on

An autonomous business doesn't only buy raw materials; it pays for the services it runs on. Beyond materials, the broker can pay for services through Stripe Skills. In the demo these are a logistics SaaS, a per-shipment freight label, and the cloud compute used when ralph switches to Nemotron 3 Ultra: each a real Stripe test-mode charge, each behind the same broker wall. (The live agent loop drives sales and restocks; the service payments are exercised by the film.)

The closed loop across all three sponsors: ralph reasons on NVIDIA's model, runs safely inside NemoClaw, and pays through Stripe for the NVIDIA inference it used (in the demo, a test-mode charge shown in the film). The agent can request a service; only the broker can pay.

Safety & refusals

Four independent layers stand between a misbehaving model and a real charge:

No key in the agent. The agent process never holds the Stripe credential.
The broker wall. A separate sole money-mover enforces the rulebook above.
Sandbox boundary. Every request crosses the NemoClaw container — no internet, no interpreters — so the agent has no payment code path to Stripe.
Test-mode only. actions.py refuses to start on a live sk_live_… key. No real card is ever charged.

demo_refuse.py proves it on camera — three rogue requests injected straight into the sandbox:

BLOCKED  off-catalog       buy 10x TITANIUM-INGOT       → unknown SKU, not on catalog
BLOCKED  hard budget       panic-buy 500x sheet (~$46k) → exceeds cash on hand
BLOCKED  sane-quantity     order 99,999 pails fluid     → above the 500/order cap

Human in the loop

Safety isn't only blocking bad actions — it's having a path for the ambiguous ones. A legitimate but oversized spend (e.g. a $23,000 bulk reorder that exceeds cash on hand) is blocked by the broker rather than silently failing. On top of that, the dashboard adds a Human Approvals surface (approvals.json, POST /api/approve) where a person clicks Approve or Deny and the decision is audited — the control surface that makes an autonomous spender deployable. (In the live loop the broker simply blocks the overspend; the approvals queue is demonstrated in the film.)
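
The Approve/Deny click reduces to a small state change plus an audit event. A sketch, assuming approvals.json holds a list of records with hypothetical `id` and `status` fields, where only a `pending` record can be decided:

```python
from __future__ import annotations

import json
from pathlib import Path

DECISIONS = {"approve": "approved", "deny": "denied"}


def record_decision(approvals_file: Path, approval_id: str, decision: str,
                    operator: str) -> dict:
    """Apply one Approve/Deny click and return the event for the audit chain."""
    if decision not in DECISIONS:
        raise ValueError(f"decision must be one of {sorted(DECISIONS)}")
    items = json.loads(approvals_file.read_text())   # assumed: a list of records
    for item in items:
        if item["id"] != approval_id:
            continue
        # A decided record stays decided: no silent flip from denied to approved.
        if item["status"] != "pending":
            raise ValueError(f"{approval_id} is already {item['status']}")
        item["status"] = DECISIONS[decision]
        item["decided_by"] = operator
        approvals_file.write_text(json.dumps(items, indent=2))
        return {"type": "human_decision", "id": approval_id,
                "decision": item["status"], "by": operator}
    raise KeyError(approval_id)
```

In this sketch an approval only changes a record and yields an event to append to the audit chain; any payment that follows would still have to go through the broker.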

Audit trail

Every consequential action (a sale, a purchase, a service payment, a refusal, a human approve/deny) is appended to a SHA-256 hash chain (audit.py). Each entry's hash covers its own data and the previous entry's hash:

entry[n].hash = sha256( entry[n].data + entry[n-1].hash )

So the log can be independently recomputed and nothing can be altered or reordered without detection: change one row and every row after it fails verification. The dashboard shows a live ✓ verified badge over the head hash. This is the compliance record an enterprise needs before letting any agent move money.
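
The chain formula translates directly into a few lines of hashlib. A sketch assuming canonical JSON for `entry[n].data`, a hypothetical all-zero genesis hash, and an explicit `prev` field on each entry (audit.py's actual storage format isn't shown here):

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64   # assumed "previous hash" for the very first entry


def entry_hash(data: dict, prev_hash: str) -> str:
    """entry[n].hash = sha256(entry[n].data + entry[n-1].hash)"""
    # Canonical JSON (sorted keys, fixed separators) so equal data hashes equally.
    payload = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((payload + prev_hash).encode("utf-8")).hexdigest()


def append(chain: list[dict], data: dict) -> dict:
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"data": data, "prev": prev, "hash": entry_hash(data, prev)}
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> int | None:
    """Return the index of the first entry that fails, or None if all check out."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["data"], prev):
            return i
        prev = entry["hash"]
    return None
```

`verify` catches an edited, reordered or deleted row. Someone able to rewrite the whole file could recompute every hash from scratch, so comparing the head hash against a copy kept outside the log is what closes that gap.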

Fleet / scale

ralph isn't one agent — it's a pattern. The same brain + broker + sandbox that runs one shop floor runs many, each sovereign on its own box, rolling up into one automated company a single operator watches (fleet.py). The dashboard's Fleet panel shows the per-site net and the aggregate.

Honesty: the Hamilton plant is the live end-to-end demo; the other sites illustrate the same pattern deployed elsewhere. The multi-site rollup is shown, but not all six sites run live in the demo. See Real vs. demonstrated.

Command reference

# reset to a clean rehearsal state
$PY -c "import inventory,products,sandbox_channel as c,hermes; \
        inventory.reset(); products.reset(); c.reset(); hermes.reset()"
rm -f state.json brain.json learnings.json issues.json

# the live processes (separate terminals)
$PY -u broker.py        # the wall (start first)
$PY -u dashboard.py     # read-only dashboard, http://<host>:8090
$PY -u agent.py         # the earn → spend run
$PY -u demo_refuse.py   # the three REFUSE beats
$PY -u hermes.py        # turn issues into applied upgrades

# regenerate the demo assets (headless, no desktop)
./make_film.sh  <out>    # self-narrating captioned film
./make_broll.sh <out>    # clean dashboard B-roll

Everything is stdlib — the dashboard is Python http.server; the agent/broker use urllib. No heavy framework. Small, auditable, runs anywhere on the box.
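
As an illustration of how little the stdlib needs, a sketch of the dashboard's brain-selector endpoint on `http.server`: `POST /api/brain` accepts only a known model name and writes brain.json, and anything else gets a 400. The JSON body shape, the `serve` helper and the brain.json location are assumptions:

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

ALLOWED_BRAINS = {"local", "hermes", "cloud"}   # the three dashboard toggles


class DashboardHandler(BaseHTTPRequestHandler):
    brain_file = Path("brain.json")   # assumed: the same file agent.py reads

    def _send_json(self, status: int, obj: dict) -> None:
        body = json.dumps(obj).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _current_brain(self) -> str:
        try:
            choice = json.loads(self.brain_file.read_text()).get("brain", "local")
        except (OSError, ValueError, AttributeError):
            return "local"
        return choice if isinstance(choice, str) and choice in ALLOWED_BRAINS else "local"

    def do_GET(self) -> None:
        if self.path == "/api/brain":
            self._send_json(200, {"brain": self._current_brain()})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        # The only write this handler accepts: a model name. No money, no keys.
        if self.path != "/api/brain":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length") or 0)
        try:
            choice = json.loads(self.rfile.read(length) or b"{}").get("brain")
        except (ValueError, AttributeError):
            choice = None
        if not isinstance(choice, str) or choice not in ALLOWED_BRAINS:
            self._send_json(400, {"error": f"brain must be one of {sorted(ALLOWED_BRAINS)}"})
            return
        self.brain_file.write_text(json.dumps({"brain": choice}))
        self._send_json(200, {"brain": choice})


def serve(port: int = 8090) -> None:
    ThreadingHTTPServer(("0.0.0.0", port), DashboardHandler).serve_forever()
```

Calling `serve()` binds port 8090 on all interfaces. The handler has no route that touches money or a key, and the allowlist means a crafted POST can only ever pick one of three model names.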

Real vs. demonstrated

Straight with the judges about what runs live versus what's paced for camera:

Real: the broker/agent/sandbox split; the Stripe charges (test mode) for sales and materials; the budget + rulebook refusals; the SHA-256 audit chain and its verification; the human approve/deny endpoint; Hermes actually changing reorder policy + installing its learned pre-filter; the local-model decisions.
Paced for camera / demonstrated: the demo film re-enacts the real numbers and Stripe references on a timed cadence so it doesn't wait on model latency. The service payments (logistics SaaS, freight, compute) are real Stripe charges the broker can make, but they're exercised by the film, not requested by the live agent loop — same for the human-approval queue. The genuine autonomous loop is broker.py + agent.py + demo_refuse.py + hermes.py, runnable live.
Illustrative: the fleet view shows the live Hamilton plant plus five sister sites representing the same pattern deployed across locations.
Opt-in: the cloud brain (and its compute charge) requires an NVIDIA_API_KEY; without it ralph stays fully local.