ralph(1)
An AI Inventory Agent for manufacturers, built as a sovereign Hermes agent. The agent can ask for anything; only the broker can act.
Name
ralph — an AI Inventory Agent for manufacturers. It plugs into the company's inventory system, sales/orders database and invoicing, spots a shortage before it stops the line, and reorders and pays the supplier itself — entirely on the company's own hardware, and by architecture it cannot misuse the company card: it never holds it.
Under the hood it's a sovereign Hermes agent in the full sense — an autonomous loop that:
- reasons on NVIDIA Nemotron-3 Super 120B locally by default, with Nous's Hermes 3 (8B) selectable and powering its self-improvement, and can switch to NVIDIA Nemotron 3 Ultra in the cloud for the hardest calls;
- self-improves — turns each run's mistakes into policy changes it applies (and queues the rest);
- pays for what it runs on — with Stripe Skills the broker covers materials and, in the demo, the services a shop needs (logistics SaaS, freight labels, cloud compute);
- runs safely inside the NemoClaw sandbox, with a broker wall it cannot cross.
Synopsis
    # the wall (start first) · only holder of the Stripe key
    broker.py
    # the read-only operator view at http://<host>:8090
    dashboard.py
    # the brain: earn → spend, reasoning on a local model
    agent.py
    # prove the wall holds: three rogue requests, all blocked
    demo_refuse.py
Description
Every "AI agent that spends money" demo has the same two problems: the language model holds your credit card, and your business data goes to someone else's cloud. ralph refuses both.
- The reasoning runs on a local model on a box in the building. Prices, customers and margins never leave.
- The agent never holds a payment credential. A separate broker process is the only thing that can move money, and only against a hard rulebook.
- Cloud is opt-in, per decision: switch to a bigger NVIDIA model for one hard call, then back to private.
Architecture
             ┌─────────────────────────────────────────────────┐
             │  ralph host (on-prem DGX Spark · GB10)          │
             │                                                 │
    orders ─►│  agent.py               broker.py               │
             │  "the brain"            "the wall"              │
             │  • reasons local        • ONLY holds the        │
             │  • holds NO key           Stripe key            │──► Stripe
             │  • can only ASK         • enforces the rulebook │    (test mode)
             │      │           ▲                              │    real PaymentIntents
             │      │ request   │ receipt                      │
             │      ▼           │                              │
             │  ┌──────────────────────────────────┐           │
             │  │ NemoClaw / OpenShell sandbox     │           │
             │  │ no internet · no interpreters    │           │
             │  └──────────────────────────────────┘           │
             │                                                 │
             │  hermes.py ─► learns from issues, retunes policy│
             │  dashboard.py ─► read-only live view :8090      │
             └─────────────────────────────────────────────────┘

| File | Role |
|---|---|
| agent.py | The brain. Reads orders + inventory, decides what to sell and restock. Holds no key — submits requests, waits for receipts. |
| broker.py | The wall. The only process with the Stripe key. Validates every request against the rulebook, pays, writes receipts, logs issues. |
| sandbox_channel.py | The agent↔broker boundary. Drops requests / reads receipts inside the NemoClaw container via docker exec. |
| actions.py | Cash ledger + the real Stripe calls. Refuses to start on a live key. |
| hermes.py | The continuous-improvement loop — turns the run's issues into policy changes it applies (reorder bump + learned pre-filter), and queues the rest. |
| audit.py | The SHA-256 hash chain. Every consequential action, independently verifiable. |
| dashboard.py | Read-only operator view on :8090. The only write it accepts is the brain selector — never money. |
The broker wall
The broker is the single most important component: the only process that holds the Stripe key and the only one with a path to Stripe. The agent cannot pay. It can only drop a request into the sandbox and wait.
When a request arrives, the broker checks it against a hard rulebook before a cent moves:
- Known catalog — the SKU / product must exist. Off-catalog asks die here.
- Broker-authoritative price — the agent does not get to name the price; the broker does.
- Quantity cap — a per-order ceiling (500 units) stops absurd orders.
- Budget guard — it never auto-pays what it can't afford: it refuses to overspend, and an oversized-but-legitimate order routes to a human (below) instead.
- Actually-low check — restocks only go through for materials genuinely below their reorder point.
Pass all checks → a real Stripe test PaymentIntent and a receipt. Fail any → a refusal with its reason, written to the audit trail. Zero dollars move on a failed check.
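
The rulebook above can be sketched as a single validation function. This is a minimal illustration, not the code in broker.py: the `CATALOG` contents, the `Verdict` type and the `check_restock` name are assumptions made for the example, and the price and stock figures are invented. Only the five checks and the 500-unit cap come from the description above.

```python
from dataclasses import dataclass

# Illustrative catalog: the SKU, price and stock figures are made up for this sketch.
CATALOG = {"sheet": {"price_cents": 9200, "on_hand": 40, "reorder_point": 100}}
MAX_QTY = 500  # the per-order ceiling named in the rulebook


@dataclass
class Verdict:
    ok: bool
    reason: str
    amount_cents: int = 0


def check_restock(sku: str, qty: int, cash_cents: int) -> Verdict:
    """Run the rulebook checks in order; the first failure decides."""
    item = CATALOG.get(sku)
    if item is None:                               # known catalog
        return Verdict(False, "unknown SKU, not on catalog")
    if qty < 1:
        return Verdict(False, "quantity must be positive")
    if qty > MAX_QTY:                              # quantity cap
        return Verdict(False, f"above the {MAX_QTY}/order cap")
    amount = item["price_cents"] * qty             # broker-authoritative price
    if amount > cash_cents:                        # budget guard
        return Verdict(False, "exceeds cash on hand")
    if item["on_hand"] >= item["reorder_point"]:   # actually-low check
        return Verdict(False, "not below reorder point")
    return Verdict(True, "approved", amount)
```

The request carries only a SKU and a quantity, so the agent has no field in which to name a price. In this sketch only a verdict with `ok=True` would go on to create a PaymentIntent; every other verdict carries the refusal reason for the audit trail.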
Brains · private ⇄ cloud
ralph reasons privately by default — nothing leaves the building. One switch on the dashboard lets it move a genuinely hard call to NVIDIA's biggest model, then back. Privacy or horsepower, per decision.
| Toggle | Model it reasons on | Where | When |
|---|---|---|---|
| local (default) | Nemotron-3 Super 120B | on the Spark | ralph's default brain — routine calls, data never leaves the box |
| hermes (native) | Hermes 3 (8B) | on the Spark | Nous's model — selectable, and the mind its self-improvement loop reasons on |
| cloud | Nemotron 3 Ultra | NVIDIA API | the hardest calls — only when you select it and a key is present |
Mechanics: the toggle POSTs to /api/brain, which writes brain.json. agent.py reads it at decision time and routes to local Ollama or NVIDIA's OpenAI-compatible endpoint. The toggle selects a model name only — it can never touch money or a key. Cloud degrades gracefully: selected without a key, it stays local and says so.
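
The read side of that toggle, including the graceful fallback, might look like the sketch below. The `"brain"` key inside brain.json and the `pick_brain` helper are assumptions for illustration; the README names the file, the three toggle values and the `NVIDIA_API_KEY` variable, but not the file's schema.

```python
import json
import os
from pathlib import Path

# Toggle values from the table above; the JSON key name is assumed.
BRAINS = ("local", "hermes", "cloud")


def pick_brain(path="brain.json", env=None):
    """Return (brain, note), falling back to local whenever anything is off."""
    env = os.environ if env is None else env
    try:
        choice = json.loads(Path(path).read_text()).get("brain", "local")
    except (OSError, ValueError, AttributeError):
        return "local", "no readable brain.json, using local"
    if choice not in BRAINS:
        return "local", f"unknown brain {choice!r}, using local"
    if choice == "cloud" and not env.get("NVIDIA_API_KEY"):
        return "local", "cloud selected without NVIDIA_API_KEY, staying local"
    return choice, "ok"
```

The function returns only a name and a note. Nothing in it can reach a credential or a payment path, which matches the claim that the toggle selects a model and nothing else.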
Stripe Skills — it pays for the services it runs on
An autonomous business doesn't only buy raw materials; it pays for the services it runs on. Beyond materials, the broker can pay for services through Stripe Skills — in the demo, a logistics SaaS, a per-shipment freight label, and the cloud compute used when it switches to Nemotron 3 Ultra — each a real Stripe charge, each behind the same broker wall. (The live agent loop drives sales and restocks; the service payments are exercised by the film.)
Safety & refusals
Four independent layers stand between a misbehaving model and a real charge:
- No key in the agent. The agent process never holds the Stripe credential.
- The broker wall. A separate sole money-mover enforces the rulebook above.
- Sandbox boundary. Every request crosses the NemoClaw container — no internet, no interpreters — so the agent has no payment code path to Stripe.
- Test-mode only. actions.py refuses to start on a live sk_live_… key. No real card is ever charged.
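
A startup guard of the kind the test-mode layer describes can be very small. The sketch below is an assumption about how such a check could be written, not the code in actions.py: it uses an allow-list (only keys with Stripe's `sk_test_` prefix pass), which is stricter than rejecting `sk_live_` alone, and the `require_test_key` name is hypothetical.

```python
def require_test_key(secret_key: str) -> None:
    """Raise unless the key carries Stripe's test-mode secret-key prefix."""
    if not secret_key.startswith("sk_test_"):
        raise ValueError("refusing to start: not a Stripe test-mode key")
```

Calling this before any Stripe client is configured means a live key stops the process before a single request is made.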
demo_refuse.py proves it on camera — three rogue requests injected straight into the sandbox:
    BLOCKED off-catalog buy 10x TITANIUM-INGOT → unknown SKU, not on catalog
    BLOCKED hard budget panic-buy 500x sheet (~$46k) → exceeds cash on hand
    BLOCKED sane-quantity order 99,999 pails fluid → above the 500/order cap
Human in the loop
Safety isn't only blocking bad actions — it's having a path for the ambiguous ones. A legitimate but oversized spend (e.g. a $23,000 bulk reorder that exceeds cash on hand) is blocked by the broker rather than silently failing. On top of that, the dashboard adds a Human Approvals surface (approvals.json, POST /api/approve) where a person clicks Approve or Deny and the decision is audited — the control surface that makes an autonomous spender deployable. (In the live loop the broker simply blocks the overspend; the approvals queue is demonstrated in the film.)
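
As a rough sketch of what sits behind an approve/deny click, the function below resolves one pending entry in an approvals file. The layout of approvals.json (a list of entries with `id` and `status` fields) and the `decide` helper are assumptions for illustration; the README names the file and the endpoint but not their format.

```python
import json
from pathlib import Path


def decide(path, request_id, decision, operator):
    """Mark one pending approval as approved or denied and return the entry."""
    if decision not in ("approve", "deny"):
        raise ValueError("decision must be 'approve' or 'deny'")
    queue = json.loads(Path(path).read_text())
    for entry in queue:
        if entry["id"] == request_id and entry["status"] == "pending":
            entry["status"] = "approved" if decision == "approve" else "denied"
            entry["decided_by"] = operator  # who clicked, kept for the audit record
            Path(path).write_text(json.dumps(queue, indent=2))
            return entry
    raise KeyError(f"no pending approval with id {request_id!r}")
```

An entry can be decided only once: a second call for the same id finds nothing pending and raises, so a stale click cannot flip an earlier decision.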
Audit trail
Every consequential action the broker takes — a sale, a purchase, a service payment, a refusal, a human approve/deny — is appended to a SHA-256 hash chain (audit.py). Each entry's hash covers the previous entry's hash:
entry[n].hash = sha256( entry[n].data + entry[n-1].hash )
So the log can be independently recomputed and nothing can be altered or reordered without detection: change one row and every row after it fails verification. The dashboard shows a live ✓ verified badge over the head hash. This is the compliance record an enterprise needs before letting any agent move money.
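
The chaining rule can be illustrated with a short sketch. It is not the code in audit.py: the JSON serialisation of `data`, the all-zero previous hash for the first entry and the function names are assumptions chosen for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first entry


def entry_hash(data: dict, prev_hash: str) -> str:
    """sha256 over the entry's data followed by the previous entry's hash."""
    payload = json.dumps(data, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, data: dict) -> dict:
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"data": data, "hash": entry_hash(data, prev)}
    chain.append(entry)
    return entry


def verify(chain: list) -> int:
    """Return the index of the first entry that fails, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["hash"] != entry_hash(entry["data"], prev):
            return i
        prev = entry["hash"]
    return -1
```

Editing a row's data makes that row fail. Recomputing its hash to hide the edit makes the next row fail instead, because that row's hash was computed over the old value. Swapping two rows fails for the same reason.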
Fleet / scale
ralph isn't one agent — it's a pattern. The same brain + broker + sandbox that runs one shop floor runs many, each sovereign on its own box, rolling up into one automated company a single operator watches (fleet.py). The dashboard's Fleet panel shows the per-site net and the aggregate.
Command reference
    # reset to a clean rehearsal state
    $PY -c "import inventory,products,sandbox_channel as c,hermes; \
            inventory.reset(); products.reset(); c.reset(); hermes.reset()"
    rm -f state.json brain.json learnings.json issues.json

    # the live processes (separate terminals)
    $PY -u broker.py        # the wall (start first)
    $PY -u dashboard.py     # read-only dashboard, http://<host>:8090
    $PY -u agent.py         # the earn → spend run
    $PY -u demo_refuse.py   # the three REFUSE beats
    $PY -u hermes.py        # turn issues into applied upgrades

    # regenerate the demo assets (headless, no desktop)
    ./make_film.sh <out>    # self-narrating captioned film
    ./make_broll.sh <out>   # clean dashboard B-roll
Everything is stdlib — the dashboard is Python http.server; the agent/broker use urllib. No heavy framework. Small, auditable, runs anywhere on the box.
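
To give a feel for the stdlib-only approach, here is a hedged sketch of a dashboard handler whose single write path is the brain selector. The `{"brain": ...}` payload shape, the `parse_brain` helper and the response codes are assumptions; only `http.server`, the /api/brain route, brain.json and port 8090 come from this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ALLOWED = {"local", "hermes", "cloud"}
BRAIN_FILE = Path("brain.json")


def parse_brain(body: bytes):
    """Return the requested brain name, or None unless the payload is exactly {"brain": <name>}."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if not isinstance(payload, dict) or set(payload) != {"brain"}:
        return None
    name = payload["brain"]
    return name if isinstance(name, str) and name in ALLOWED else None


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = max(int(self.headers.get("Content-Length", "0")), 0)
        except ValueError:
            length = 0
        body = self.rfile.read(length)
        name = parse_brain(body) if self.path == "/api/brain" else None
        if name is None:
            self.send_error(400, "only a brain name is accepted here")
            return
        BRAIN_FILE.write_text(json.dumps({"brain": name}))
        self.send_response(204)
        self.end_headers()

# To serve: HTTPServer(("127.0.0.1", 8090), Handler).serve_forever()
```

Rejecting any payload with extra keys keeps the write surface as narrow as the description claims: a body that also carries an amount or a key is refused outright, not partially honoured.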
Real vs. demonstrated
To be straight with the judges about what runs live versus what's paced for camera:
- Real: the broker/agent/sandbox split; the Stripe charges (test mode) for sales and materials; the budget + rulebook refusals; the SHA-256 audit chain and its verification; the human approve/deny endpoint; Hermes actually changing reorder policy + installing its learned pre-filter; the local-model decisions.
- Paced for camera / demonstrated: the demo film re-enacts the real numbers and Stripe references on a timed cadence so it doesn't wait on model latency. The service payments (logistics SaaS, freight, compute) are real Stripe charges the broker can make, but they're exercised by the film, not requested by the live agent loop; the same goes for the human-approval queue. The genuine autonomous loop is broker.py + agent.py + demo_refuse.py + hermes.py, runnable live.
- Illustrative: the fleet view shows the live Hamilton plant plus five sister sites representing the same pattern deployed across locations.
- Opt-in: the cloud brain (and its compute charge) requires an NVIDIA_API_KEY; without it ralph stays fully local.