The Forge
Feed it provider API keys. Get back a ranked catalog of the datasets actually available to you, plus a slate of strategy candidates built on top of them.
Overview
Most quants spend more time hunting for usable data than they spend on alpha. Provider APIs have inconsistent coverage, undocumented gotchas, and asymmetric quality across asset classes. The Forge encodes that exploration as a pipeline of cooperating LLM agents, each with a narrow job.
The pipeline runs once per scan and produces three artifacts:
- A catalog of datasets discovered across your active providers, each scored 0–100 on six quality dimensions and assigned a verdict.
- A list of research plans — hypotheses the Planner agent proposed by combining your useful datasets.
- A set of strategy candidates, each one a runnable backtest with metrics.
None of this auto-deploys. The whole point is to surface options for you to review.
Agents
Four cooperating agents, each with a narrow job.
| Agent | What it does |
|---|---|
| Discovery | Classifies your API keys against the bundled registry (14 providers). Enumerates each provider's capabilities into the catalog. |
| Probe | Calls each provider for a small sample and scores it on the six quality dimensions. The only agent that talks to external APIs. |
| Planner | Reads the catalog and proposes research plans — falsifiable hypotheses that combine 1–3 useful datasets. |
| Synthesis | Codes one research plan as a runnable strategy, backtests it, and parks the result for your review. The only agent that writes executable code. |
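The handoffs in the table read as one linear scan. Below is a minimal sketch of that orchestration, assuming a generic `run_agent(name, payload)` dispatcher; all names here are illustrative, not the actual Forge interfaces.

```python
def run_scan(api_keys, registry, run_agent):
    """Hypothetical scan loop chaining the four agents in order."""
    # Discovery: classify keys against the registry, enumerate capabilities.
    catalog = run_agent("discovery", {"keys": api_keys, "registry": registry})
    for entry in catalog:
        # Probe: the only agent that talks to external provider APIs.
        # Assumed to return the quality scores and a verdict for the entry.
        entry.update(run_agent("probe", entry))
    # Planner only sees datasets that passed the verdict gate.
    useful = [e for e in catalog if e.get("verdict") == "useful"]
    plans = run_agent("planner", {"datasets": useful})
    # Synthesis: one runnable, backtested candidate per plan.
    candidates = [run_agent("synthesis", plan) for plan in plans]
    return catalog, plans, candidates
```

The dispatcher shape is an assumption; the point is the strict ordering and the narrow surface each agent sees.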
Catalog
The catalog is a per-install table of every dataset the Forge knows about. Each row is a (provider, capability, asset_class) triple with quality scores, a verdict, and links to provenance.
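One way to picture a catalog row, sketched as a Python dataclass. The (provider, capability, asset_class) triple, the scores, the verdict, and the provenance links come from the description above; the exact field names and types are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogRow:
    """Illustrative shape of one catalog row; not the real schema."""
    provider: str          # e.g. "polygon"
    capability: str        # e.g. "aggregates"
    asset_class: str       # e.g. "crypto"
    scores: dict = field(default_factory=dict)      # six dimensions, 0-100 each
    verdict: str = "irrelevant"                      # useful/marginal/poor/irrelevant
    provenance: list = field(default_factory=list)   # links back to probe runs
```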
The six quality dimensions
Each dimension is scored 0–100 by the Probe agent.
| Dimension | Question it answers |
|---|---|
| coverage | How much of the universe (symbols, time range) is actually present? |
| freshness | How recent is the most recent point? How long is the publish lag? |
| completeness | What fraction of expected fields are non-null? |
| uniqueness | Does this dataset duplicate something else in the catalog? |
| stability | Does historical data get revised? Is the schema stable across releases? |
| trading_relevance | Is this data plausibly tradable (vs. illustrative or after-the-fact)? |
The aggregate score
The six dimensions are combined as a geometric mean, not an arithmetic one. The architectural point: a zero in any single dimension is punished far harder than averaging would punish it. A Polygon endpoint that's 90/100 on five dimensions but 0/100 on coverage is not "average" — the geometric mean collapses to roughly 43 where the arithmetic mean would be 75, and the verdict reflects that.
```python
# From auracle/forge/catalog.py
import math

# Floor the zeroes at 1 so log() is defined.
floors = [max(d, 1) for d in dims]
score = math.exp(sum(math.log(d) for d in floors) / len(floors))
# All six at 90 -> 90. One 0 (floored to 1) among five 90s -> ~43,
# versus an arithmetic mean of 75.
```
Verdict thresholds
The aggregate score maps to a verdict by these thresholds:
| Score | Verdict | What it means |
|---|---|---|
| ≥ 70 | useful | Pass straight into the Planner's input pool. |
| ≥ 40 | marginal | Planner may use it with a caveat; backfill or supplement first. |
| ≥ 15 | poor | Excluded from Planner. Visible in the catalog so you know it exists. |
| < 15 | irrelevant | Effectively hidden. Catalog entry kept for audit only. |
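The threshold ladder above can be sketched as a single function; this is illustrative, not the actual implementation.

```python
def verdict(score: float) -> str:
    """Map an aggregate 0-100 score to a catalog verdict
    using the thresholds from the table above."""
    if score >= 70:
        return "useful"
    if score >= 40:
        return "marginal"
    if score >= 15:
        return "poor"
    return "irrelevant"
```

So a dataset scoring in the low 40s lands in the Planner's input pool only with a caveat, and anything under 15 is hidden from day-to-day use.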
Research
A research plan is a structured hypothesis. The Planner emits them as JSON in `forge_research_plans.plan_json`; you can also write them by hand.
```json
{
  "hypothesis": "OI divergence on Coinalyze precedes 4h mean reversion in BTC-USD",
  "inputs": [
    { "dataset_id": 12, "alias": "oi" },
    { "dataset_id": 7, "alias": "bars" }
  ],
  "feature_spec": {
    "oi_change_24h": "oi.delta(24h) / oi.rolling(7d).mean()",
    "ret_4h": "bars.close.pct_change(4h)"
  },
  "label_spec": { "horizon": "4h", "kind": "log_return" },
  "evaluation": {
    "walk_forward": { "train": "12mo", "test": "1mo", "step": "1mo" },
    "costs": { "bps": 5 },
    "metrics": ["sharpe", "max_dd", "hit_rate"]
  }
}
```
Plans can be queued, paused, retried, or rejected. The Planner won't re-emit a plan it already proposed in the past 30 days (content-hash dedupe).
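The 30-day dedupe can be implemented by hashing a canonical serialization of the plan. This sketch assumes the content hash is SHA-256 over key-sorted JSON; the real hashing scheme is not documented here.

```python
import hashlib
import json

def plan_hash(plan: dict) -> str:
    # Canonicalize: sorted keys, no insignificant whitespace, so two
    # semantically identical plans always hash identically.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_duplicate(plan: dict, recent_hashes: set) -> bool:
    # recent_hashes: hashes of plans emitted in the past 30 days.
    return plan_hash(plan) in recent_hashes
```

Key order in the source dict doesn't matter, which is the property that makes content-hash dedupe robust against cosmetic re-orderings.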
Candidates
A strategy candidate is the output of running Synthesis on one plan. It lives in `forge_strategy_candidates` with:

- A pointer to the source plan and the inputs used.
- The generated Python file path under `strategies/_forge/`.
- The backtest result blob (equity curve, trades, per-period metrics).
- A score against the plan's metric targets, plus a Planner-summarized verdict.

In the web UI, candidates appear in a table sorted by Sharpe. Clicking one opens the candidate review surface — equity curve, trade list, generated code, and a Promote button. Promotion copies the file out of `_forge/` into your active strategies directory; only then can it be deployed.
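Promotion is, at its core, a reviewed file copy out of the sandbox. A sketch with hypothetical paths and function names, making no claim to match the real implementation:

```python
import shutil
from pathlib import Path

def promote(candidate_path: str, strategies_dir: str) -> Path:
    """Copy a reviewed candidate out of the _forge/ sandbox into the
    active strategies directory. Paths and behavior are illustrative."""
    src = Path(candidate_path)
    dest = Path(strategies_dir) / src.name
    if dest.exists():
        # Never silently clobber a live strategy file.
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")
    shutil.copy2(src, dest)  # copy2 preserves timestamps for auditability
    return dest
```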
Limits & audit
The Forge spends money (LLM tokens) and writes code (synthesized strategies). Both are bounded.
Cost caps
- Per-scan: hard cap of $5 USD of LLM spend by default. Configurable in `auracle forge config`.
- Per-day: $25 USD aggregate across all agents.
- Per-dataset probe: $0.50 + 2 minutes of API time.
Hitting any cap pauses the scan; the partial catalog is preserved.
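A pause-on-cap guard like this can be sketched as a running tally checked on every charge. The class and method names are hypothetical; only the $5 per-scan and $25 per-day defaults come from the list above.

```python
class BudgetExceeded(Exception):
    """Raised when a spend cap is hit; the caller pauses the scan."""

class CostTracker:
    def __init__(self, scan_cap: float = 5.0, day_cap: float = 25.0):
        self.scan_cap = scan_cap
        self.day_cap = day_cap
        self.scan_spend = 0.0
        self.day_spend = 0.0

    def charge(self, usd: float) -> None:
        # Record the spend first so the partial tally survives the pause.
        self.scan_spend += usd
        self.day_spend += usd
        if self.scan_spend > self.scan_cap or self.day_spend > self.day_cap:
            raise BudgetExceeded(
                f"scan ${self.scan_spend:.2f} / day ${self.day_spend:.2f}"
            )
```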
Audit trail
`forge_agent_runs` records every agent invocation: which agent, which dataset, input fingerprint, token usage, wall-clock time, output fingerprint, and the `install_uuid`. Replay a scan by re-running the same input fingerprints; the output should hash-match (modulo non-determinism flagged at the LLM call boundary).
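Replay hinges on stable fingerprints. One plausible scheme, an assumption rather than the documented one, is a SHA-256 digest over canonical JSON of the recorded inputs and outputs:

```python
import hashlib
import json

def fingerprint(obj) -> str:
    # Stable digest over canonical JSON; hypothetical stand-in for the
    # input/output fingerprints recorded in forge_agent_runs.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def replay_matches(run_record: dict, agent_fn) -> bool:
    """Re-run an agent on its recorded input and check that the output
    fingerprint hash-matches (assuming the call is deterministic)."""
    output = agent_fn(run_record["input"])
    return fingerprint(output) == run_record["output_fingerprint"]
```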
Promotion gate
Generated strategies are sandboxed and never auto-deployed. You review the code, the backtest, and the metrics before Promote moves the file into your active strategies directory; only then can it be deployed live.
Costs
Default budgets per agent (configurable per install).
| Agent | Per-call cap | Per-scan cap | Model |
|---|---|---|---|
| Discovery | 4k input / 1k output | $0.10 | Claude Haiku |
| Probe | 2k input / 500 output | $0.50 / dataset | Claude Haiku |
| Planner | 16k input / 4k output | $1.50 | Claude Sonnet |
| Synthesis | 32k input / 8k output | $2.50 / candidate | Claude Sonnet |
Cost is logged in `forge_agent_runs.cost_usd`. Aggregate with a direct SQL query (`SELECT SUM(cost_usd) FROM forge_agent_runs WHERE created_at > now() - interval '30 days'`) until the `auracle forge costs` CLI ships in v1.1.