The personal research operator in the principal's voice.
Saigar is a single-user agentic research system: six agents, twenty-four stages, two pipelines, and one principal. It runs the literature, debates the position, drafts the artefact, and never breaches the principal's voice. This is what Sagar is building.
Saigar is what happens when you take an autonomous research pipeline derived from AutoResearchClaw, give it a typed harness, layer six specialist agents into it, build a learning loop that improves across runs, render the outputs in a broadsheet aesthetic, and constrain everything to one principal's voice and one principal's editorial sign-off.
It produces two artefact types in v1. The Brief is a bi-weekly payments competitive intelligence document, rendered as an interactive HTML presentation with animated mechanism diagrams. The Paper is a long-form research article published to sagarbharambe.com with full citation auditability. Both flow through the same 24-stage pipeline, the same six agents, the same gate structure.
The principal stays in the loop at three gates: literature screen approval, analytical approach approval, publish approval. Everything else is autonomous, recorded, and reviewable.
Saigar does not autopublish. The principal reviews and signs the artefact before it becomes editorial output. Saigar handles literature, debate, draft. The principal handles judgment.
The autonomy is real (the system can pivot a hypothesis, refine an analysis, regenerate a violating paragraph, all without intervention). The supervision is real too. Both, in the same system.
What this showcase covers
This site is Saigar in nine pages. Each one explains a layer; together they answer "what did Sagar build, how does it work, and what comes next."
Who Saigar is for and the problem it solves. The shape of one principal's research life and where the system inserts itself.
A live mock of the operating surface: today's lead, runs in flight, gates awaiting decision. The product as the principal will use it.
The 24-stage canonical flow visualised. Brief and Paper as instantiations. Where each agent enters, where each gate appears.
Six specialist agents, each with a domain, a state machine, and a failure taxonomy. Plus the cross-run loop the Reflector closes.
Five processes on one VPS. Postgres-as-queue. Anthropic API direct. The deployment posture and the six reasoning checkpoints.
How Saigar improves across runs. The Reflector at stage 24, the signal lifecycle, the principal-supervised /learning view.
The reference artefact. The agentic commerce identity-layer read. With the mechanism diagram animating in newspaper or slides.
v1 to v2 to v3. From two pipelines to six. From single-pipeline runs to cross-pipeline orchestration. From principal-only to a small editorial team.
Saigar is currently running a Brief on agentic commerce identity infrastructure. The Researcher retained 22 of 30 candidate sources. The Synthesizer's hypothesis at stage 8 took the structural framing: agent identity attestation, not the rails layer, is the contested surface. Gate B was approved at 09:43 this morning. The Writer is currently drafting slide 4 (the mechanism diagram).
One reader. One operator. One signature on every artefact.
Saigar is built around a single person and the shape of his research life. That constraint is the most important architectural choice in the system.
Sagar Bharambe is a payments practitioner based in Rijswijk, the Netherlands. His professional terrain is European retail payments: scheme dynamics, regulatory shifts at the AFM and DNB, the open banking spillovers from PSD3 and the Instant Payments Regulation, the slow churn at the long tail of acquirers and PSPs, and what the next layer of European stablecoin and tokenisation efforts means for any of it.
The work is reading-heavy. The output is opinion-heavy. The volume of regulator filings, scheme press releases, banking association statements, market data, EBA consultation papers, AFM guidance letters, ECB working papers, and ISO 20022 progression notes is large enough that staying current is a job in itself. The synthesis required to turn it into useful editorial output, the kind that can land on sagarbharambe.com or in front of a counterparty, is another job.
Saigar is not a research assistant. It is the operator who runs the research, in the principal's voice, on the principal's schedule, with the principal's standards.
The shape of the problem
The conventional answer to "I read too much, I write too little" is a reading queue and a discipline of writing. The conventional answer fails for two reasons. The first is that the reading is unbounded; whatever discipline you bring, the literature outpaces you. The second is that synthesis is the hard part, not the reading; once you have decided what you think, the writing is shorter and easier.
The unconventional answer, and the one Saigar takes, is to delegate the literature scanning, the source ingestion, the multi-perspective debate that turns evidence into a position, and the first draft. The principal does what only the principal can do: judge whether the position is right, edit the draft into the principal's voice, sign the artefact, publish.
What Saigar carries
Three deliverables in v1:
The Brief. A bi-weekly payments competitive intelligence document. Eight to nine slides. Cover, TLDR, market state with a chart, mechanism diagram (animated SVG comparing flow mechanics across payment paths), reads (corporate or regulatory interpretations), implications by audience, sources, closing. Rendered as a single self-contained HTML file with two reading modes (newspaper scroll and slides). Internal cross-references navigate with a back chip. The Brief is private to the principal; it is operational intelligence, not editorial output.
The Paper. A long-form research article, ~1500 to 2500 words, published to sagarbharambe.com. Full 24-stage pipeline including the experiment subsystem (stages 10 to 13) for any quantitative analysis. Markdown with frontmatter, hyperlinked citations, references auto-generated.
The audit trail. Every artefact traces to a run. Every run traces to a trigger. Every LLM invocation records its system prompt hash, request, response, cost. The principal can ask "why does this Brief say what it says" and follow the chain back. This is not a feature; it is the foundational commitment of the system.
The constraints that shape the architecture
Saigar is built for one person. Not "starts with one person, scales to teams later." One person, ever. This eliminates entire categories of architectural complexity: no role-based access control, no multi-tenancy plumbing, no organisation-level abstractions. The data model has a singleton principal table. The auth boundary is one passkey. The infrastructure budget reflects this; €30/month for hosting plus €60-170/month in LLM cost is workable for personal use, untenable for ten users, irrelevant for a hundred.
Saigar is hosted on Hetzner Falkenstein. Anthropic API calls go to the EU endpoint where available. Object storage stays in the same region. This matches the regulatory environment the principal works in (payments, EU jurisdiction) and the principal's own data residency preferences.
The principal has prose constraints that are non-negotiable. No em-dashes. No "not X but Y" framing. A list of banned phrases ("doing the heavy lifting," "the real question is," "what most people miss"). These are enforced at three checkpoints: the Writer's system prompt, the post-generation validator, the Critic. The v1 eval target is zero voice violations leaking past all three checkpoints. This is the most idiosyncratic thing about Saigar, and the most important.
There is no separate admin role, no operator role, no support tier. The principal sees the metrics dashboard, the run logs, the failed run queue, the configuration surfaces. Saigar is a tool for one person who builds it and uses it; the principal-operator distinction collapses. This makes the UI simpler and the failure modes more honest.
What "in the principal's voice" actually means
Voice is not a property of words. It is a property of habitual choices: which framings the principal prefers, which sources the principal weights, which interpretive moves the principal accepts and rejects. Saigar makes those choices visible, persistent, and supervised.
The voice constraint table is the visible layer. The learning loop is the deeper layer. After every run, the Reflector reads what the principal accepted, edited, rejected, and produces signals that future runs consume. After ten runs, Saigar knows that the principal prefers contrarian framings on payments adoption topics; after thirty runs, Saigar knows which corporate filings to weight up and which regulatory press releases to weight down. The principal supervises this learning; signals are reviewable, rejectable, weakenable on the /learning view.
The result, projected forward, is a system that does not just take instruction; it inherits taste.
What the principal sees, when the principal looks.
The Desk is Saigar's operating surface. Five destinations, four approval flows, one principal. This is a live mock of the Today view with a Brief 47 cycle in flight.
A Brief is awaiting publish.
Brief 47 on agent identity infrastructure is at gate C. The Critic flagged 2 minor findings (unverified URL on slide 5, weak relevance on one citation). Citation pass rate: 96%. Voice violations: 0. The Reflector has not yet run; it will fire on publish approval.
Gate C · Publish approval
Auto-advance: never (gate C is affirmative-only)
Bi-weekly · Wednesday
Edition 47 awaiting publish. Edition 48 schedules next Wednesday.
Live
On-demand · idle
Last published: 'Web Bot Auth, the substrate everyone is quietly building on' (28 Apr 2026)
Idle
Continuous · v2
Not yet active. Ships in v2.
Inactive
Recent activity
The Desk does not push notifications. The principal opens it on their own schedule. Live runs propagate through Postgres LISTEN/NOTIFY to server-sent events; the page updates incrementally without refresh. Failed runs surface "needs your attention" cards. Cost-cap warnings surface inline.
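The relay from Postgres to the browser reduces to a small formatting step: each NOTIFY payload becomes one server-sent event frame pushed down the open connection. A minimal sketch; `sse_frame` and the event name are illustrative, not Saigar's actual API.

```python
import json

def sse_frame(event: str, payload: dict) -> str:
    """Format one server-sent event frame from a notification payload.

    SSE frames are plain text: an `event:` line, a `data:` line
    carrying the JSON payload, and a blank-line terminator.
    """
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"

# A hypothetical run-progress notification relayed to the Desk:
frame = sse_frame("run.stage_complete", {"run_id": "brief-47", "ord": 17})
```

The browser's `EventSource` dispatches on the `event:` name, so the page can update the matching card incrementally without a refresh.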
Twenty-four stages. Two pipelines. One canonical flow.
Saigar's central abstraction is the Pipeline. Adding a new artefact type means writing YAML, not modifying code. The 24 canonical stages cover every step from problem decomposition to cross-run reflection.
The 24-stage canonical flow
Stages execute in order. Some are skippable; some open gates that pause for principal review. The Brief uses 18 stages (it skips the experiment subsystem); the Paper uses all 24. The Reflector at stage 24 closes the cross-run learning loop.
The Brief pipeline
Bi-weekly payments competitive intelligence. Default cadence: Wednesday 09:00 Europe/Amsterdam. Topic Ranker proposes three candidates 48 hours before; the principal accepts, overrides, or skips the cycle.
Brief skips stages 9 through 13 (the experiment subsystem). Brief is synthesis-driven, not empirical; the Synthesizer's stage 8 debate produces the hypothesis, the Writer drafts directly from there. Stage 14 result analysis still runs because the Synthesizer needs to produce the qualifications and surprises that go into the Implications slide.
Output is a single self-contained HTML file with two reading modes: newspaper (continuous scroll, expandable sections) and slides (one-per-viewport). Internal cross-references navigate with a back chip. The MechanismDiagram component is the distinctive visual element; it animates packets along SVG paths at relative settlement speeds.
name: brief
display_name: EU Payments Bi-weekly Brief
version: 1.0.0
cadence:
  scheduled:
    cron: "0 9 * * 3"
    timezone: Europe/Amsterdam
source_profile:
  privileged_types:
    - regulator_filing
    - scheme_disclosure
  jurisdictions: [NL, EU]
  recency: 90d
  min_total_sources: 12
  max_total_sources: 30
output:
  type: brief
  format: html_dual_mode
  exportable_to: [pdf]
stages:
  - { ord: 1,  name: subject_resolve }
  - { ord: 2,  name: problem_decompose,  agent: researcher }
  - { ord: 3,  name: search_strategy,    agent: researcher }
  - { ord: 4,  name: literature_collect, agent: researcher }
  - { ord: 5,  name: literature_screen,  gate: A }
  - { ord: 6,  name: knowledge_extract,  agent: researcher }
  - { ord: 7,  name: synthesis,          agent: synthesizer }
  - { ord: 8,  name: hypothesis_gen,     agent: synthesizer }
  - { ord: 9,  name: experiment_design,  gate: B, skip_if: brief }
  - { ord: 14, name: result_analysis,    agent: synthesizer }
  - { ord: 15, name: research_decision,  agent: synthesizer }
  - { ord: 16, name: outline,            agent: writer }
  - { ord: 17, name: draft,              agent: writer }
  - { ord: 18, name: peer_review,        agent: critic }
  - { ord: 19, name: revision,           agent: writer }
  - { ord: 20, name: principal_review,   gate: C }
  - { ord: 22, name: export }
  - { ord: 23, name: citation_verify,    agent: critic }
  - { ord: 24, name: reflection,         agent: reflector }
cost:
  cap_usd: 25.00
  warning_threshold_usd: 18.00
The Paper pipeline
On-demand long-form research articles. The principal types a topic, the Paper runs end-to-end (60-90 minutes typical wall clock), publishes to sagarbharambe.com on gate C approval.
Paper uses all 24 stages including the experiment subsystem (stages 10 to 13). For quantitative analysis, the Synthesizer designs the analytical approach at stage 9, generates code at stage 10, plans resources at stage 11, runs the experiment in a sandboxed runner at stage 12, and iteratively refines based on the analysis at stage 13. Stage 14 produces the result analysis with a multi-perspective debate.
Output is Markdown with frontmatter, ready for sagarbharambe.com. Hyperlinked citations resolve to a references section. Internal cross-references work in standard Markdown anchor syntax. The publishing path is principal-supplied (REST API, git push, custom CMS).
name: paper
output:
  type: article
  format: markdown
  publish_to: sagarbharambe.com
# All 24 stages declared, including:
stages:
  # ... (1-8 same as brief) ...
  - { ord: 9,  name: experiment_design,  gate: B }
  - { ord: 10, name: code_generation,    agent: synthesizer }
  - { ord: 11, name: resource_planning,  agent: synthesizer }
  - { ord: 12, name: experiment_run,     agent: synthesizer }
  - { ord: 13, name: iterative_refine,   agent: synthesizer }
  - { ord: 14, name: result_analysis,    agent: synthesizer }
  - { ord: 15, name: research_decision,  agent: synthesizer }
  - { ord: 16, name: outline,            agent: writer }
  - { ord: 17, name: draft,              agent: writer }
  - { ord: 18, name: peer_review,        agent: critic }
  - { ord: 19, name: revision,           agent: writer }
  - { ord: 20, name: principal_review,   gate: C }
  - { ord: 21, name: external_publish }
  - { ord: 22, name: export }
  - { ord: 23, name: citation_verify }
  - { ord: 24, name: reflection,         agent: reflector }
cost:
  cap_usd: 25.00
  warning_threshold_usd: 20.00
Pipeline as primitive
Adding a new pipeline is a configuration change, not a code change. v2 candidates already drafted: regulatory monitoring (continuous), investment thesis stress-test (on-demand with adversarial debate), quarterly market wrap (scheduled monthly), deal/M&A memos (on-demand with comparable transactions), strategic competitor assessments (scheduled quarterly), org design choices (on-demand).
Each declares its cadence, sources, voice profile, gates, and output. The Orchestrator dispatches against any pipeline through the same code path. New specialist agents register via plugin. The build cost of a new pipeline is roughly 1-2 weeks: author the YAML, register specialist agents if novel, register the artefact renderer if novel, run an end-to-end test against a fixture topic.
This is the architectural commitment that makes Saigar future-proof: the abstraction over pipeline definition rather than the proliferation of pipeline-specific code.
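Under stated assumptions (a plain dict stands in for the parsed pipeline YAML, and `skip_if` holds a pipeline name), the "dispatch against any pipeline through the same code path" claim reduces to a walk over declared stages:

```python
# A dict stands in for the parsed pipeline YAML; the real
# definitions live in files like brief.yaml. Only a slice of the
# stage list is shown.
BRIEF = {
    "name": "brief",
    "stages": [
        {"ord": 8,  "name": "hypothesis_gen", "agent": "synthesizer"},
        {"ord": 9,  "name": "experiment_design", "gate": "B", "skip_if": "brief"},
        {"ord": 14, "name": "result_analysis", "agent": "synthesizer"},
    ],
}

def executable_stages(pipeline: dict) -> list[str]:
    """Walk declared stages in ord order, honouring declarative skip_if."""
    name = pipeline["name"]
    return [
        s["name"]
        for s in sorted(pipeline["stages"], key=lambda s: s["ord"])
        if s.get("skip_if") != name
    ]

# experiment_design drops out for the Brief:
# executable_stages(BRIEF) → ["hypothesis_gen", "result_analysis"]
```

A new pipeline only changes the dict (the YAML); the walk itself never does.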
Six specialists. No generalist.
Each agent owns specific stages, a state machine, and a failure taxonomy. The Orchestrator coordinates; the others execute. Click any agent for the deep panel.
Orchestrator
The harness. Owns the run state machine, dispatches to agents, manages gates, enforces cost. Six LLM-mediated reasoning checkpoints with deterministic fallback. The most architecturally important agent.
Researcher
Reads the world. Decomposes the subject into sub-questions, executes searches across OpenAlex/S2/arXiv/web/RSS, screens for relevance, extracts structured knowledge cards.
Synthesizer
The analytical core. Multi-agent debate at stage 8 (3 positions × 2 rounds + synthesis = 7 calls). Stage 15 PROCEED/REFINE/PIVOT verdicts. The experiment subsystem for Paper runs.
Critic
The agent that says no. Three reviewer personas at stage 18 (Evidence, Voice, Structure). Four-layer citation verification at stage 23. Voice constraint third checkpoint.
Writer
Authors the artefact. Article (Markdown) or Brief (SlidePlan). Voice constraint first checkpoint. Cross-reference authoring. The agent the principal sees most.
Reflector
The closing-the-loop agent. Stage 24 reads the run trace and produces structured learning_signal records that future runs consume. Principal-supervised on the /learning view.
How an agent invocation actually works
Every agent invocation is a structured exchange: typed task packet in, typed result out, recorded.
The Orchestrator builds a task packet from the run state, the pipeline definition, the principal's interest signals, and any active learning_signal rows that scope-match. The agent receives the packet, validates it against its Pydantic schema, and refuses fast on mismatch. The agent makes one or more LLM calls, validates each response against its output schema, and persists the result.
If the agent fails, the Orchestrator's recovery routing checkpoint reasons over the failure code, the failure history of this run, and active signals about this failure pattern. Recovery options range from retry-same-config to spawn-pivot-child to escalate-to-principal. Each option has a deterministic fallback if the LLM call fails.
Every step is recorded: the system prompt hash, the request, the response, the cost, the latency, the consumed signals. This is the audit trail that makes "why did Saigar say this" answerable.
# Typed task packet (example: stage 7 synthesis)
from typing import Literal
from uuid import UUID

from pydantic import BaseModel

# SubQuestion, VoiceProfile, LearningSignalRef, FindingCluster,
# Pattern, Contradiction, and KnowledgeGap are domain models
# defined elsewhere in the codebase.

class SynthesisTask(BaseModel):
    schema_version: Literal["1.0"]
    run_id: UUID
    knowledge_cards: list[UUID]
    sub_questions: list[SubQuestion]
    voice_profile: VoiceProfile
    learning_signals: list[LearningSignalRef] = []

# Typed output
class SynthesisOutput(BaseModel):
    schema_version: Literal["1.0"]
    clusters: list[FindingCluster]
    patterns: list[Pattern]
    contradictions: list[Contradiction]
    gaps: list[KnowledgeGap]
    confidence: float

# Schema mismatch is a SEVERE FAILURE.
# No silent degradation, no defaults, no
# autocorrection. Validation is strict.
Orchestrator
The Orchestrator runs every registered pipeline by walking its declared stages, dispatching tasks, opening gates, enforcing cost, and emitting events. It is the only component that knows what a pipeline is.
The Orchestrator is a hybrid. The run lifecycle is a deterministic state machine: pending → running → gated → running → complete. The judgement-heavy decisions are LLM-mediated at six named checkpoints with deterministic fallback. The combination is the real architectural innovation.
The six reasoning checkpoints
Each checkpoint produces a structured decision recorded to orchestrator_decision with the input context, the LLM response, and the deterministic fallback that would have run otherwise. Per-checkpoint disable flag, per-checkpoint cost ceiling (€0.26), per-checkpoint timeout (30s).
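The checkpoint mechanics above can be sketched as a wrapper: attempt the LLM decision, degrade to the deterministic fallback on any failure or when the principal has disabled the checkpoint. Timeout and cost-ceiling enforcement are elided; `run_checkpoint` and the decision shapes are illustrative, not Saigar's actual interface.

```python
def run_checkpoint(name, llm_decide, fallback, *, disabled=False):
    """One reasoning checkpoint with deterministic fallback.

    llm_decide and fallback are callables returning a decision dict.
    The returned record tags which path produced the decision, so
    the orchestrator_decision row can store both sides.
    """
    if disabled:
        # Kill switch: a disabled checkpoint always uses its fallback.
        return {"checkpoint": name, "source": "fallback", **fallback()}
    try:
        decision = llm_decide()
        return {"checkpoint": name, "source": "llm", **decision}
    except Exception:
        # Timeout, cost ceiling, or schema failure: degrade cleanly.
        return {"checkpoint": name, "source": "fallback", **fallback()}
```

The same wrapper serves all six checkpoints; only the two callables change per checkpoint.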
Reads the subject, the principal's interest signals, the corpus state, the pipeline definition. Produces: estimated cost band, recommended cost cap, model routing overrides per stage, source budget override, risk flags.
A subject the principal has covered three times before doesn't need full source breadth; the Researcher can run with min_total_sources=8 instead of 12. A controversial topic benefits from Opus on the synthesis stage. A quantitative subject benefits from a higher source budget. The deterministic fallback (use pipeline defaults) works fine; the LLM checkpoint makes the choice better when it works.
Fallback: use pipeline defaults across the board.
When a stage fails, instead of the hardcoded recipe the Orchestrator reasons over the failure code, the failure history of this run, and active signals about this failure pattern. Output: retry-same / retry-modified / skip / escalate / abort-with-partial.
A second timeout on stage 6 extraction with a 200kb source body gets a smarter response than "retry with same config." The Orchestrator can suggest "chunk the source into halves and extract each independently."
Fallback: the deterministic recovery recipes from the failure taxonomy.
After declarative skip_if evaluation, the Orchestrator reasons over whether the stage can be skipped or partially executed given prior outputs and corpus state. Conservative: confidence threshold for skip is 0.7 (higher than the default 0.55).
Stage 6 extraction can be partial when 60% of the candidate sources are already extracted from prior runs on the same subject. The Orchestrator can decide to re-extract only the new 40%, saving cost and time.
Fallback: respect declarative skip_if only.
When crossing the cost warning threshold, instead of binary continue-or-halt, the Orchestrator evaluates: continue, skip cosmetic stages (second revision, citation re-verify), halt with current draft, or request principal cap raise.
The deterministic fallback halts at hard cap regardless. The reasoning checkpoint can decide that skipping a second revision pass leaves the artefact 95% as good for 60% of the remaining cost. Or it can recommend escalating to the principal for a cap raise rather than abandoning.
Fallback: emit cost.warning at threshold, halt at hard cap.
Second-opinion on the Synthesizer's PROCEED/REFINE/PIVOT verdict. The Orchestrator can downgrade (PIVOT → REFINE → PROCEED) but cannot upgrade. Considers pivot count, remaining budget, principal preference for assertiveness.
The Synthesizer reasons within a stage; the Orchestrator reasons across the run. The Synthesizer might want to PIVOT on marginal evidence; the Orchestrator can see "we're at pivot count 1 with 60% of cost cap consumed" and override to REFINE. The downgrade-only constraint prevents the Orchestrator from manufacturing pivots the Synthesizer didn't see grounds for.
Fallback: affirm the Synthesizer's verdict.
Decides whether stage 24 reflection runs at all (some runs have thin signal worth learning from), what signal kinds to expect, which existing signals might be superseded. Routes priority for the Reflector.
A Brief on a familiar topic with no surprises, no Critic findings, no principal overrides has thin signal; running stage 24 produces low-quality reflection that pollutes the signal space. A Paper that pivoted twice and was edited at gate C has rich signal worth processing. The reflection routing checkpoint lets the Orchestrator give the Reflector the right context.
Fallback: always run stage 24 with default scope.
The state machine
Run states: pending, running, gated, paused, complete, failed, cancelled. Transitions are guarded by Postgres CHECK constraints; the database refuses invalid transitions. The Orchestrator must be idempotent across crashes; on restart, it finds the highest-ord stage with status complete-or-skipped and resumes from the next ord. State changes write to Postgres before the next step.
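The resume rule is a pure function over recorded stage rows. A sketch, assuming contiguous ords; a real resume would walk the pipeline's declared ord sequence rather than add one.

```python
def resume_ord(stage_rows: list[dict]) -> int:
    """Find where to resume after a crash: the ord after the highest
    stage recorded complete-or-skipped. A fresh run starts at 1."""
    done = [r["ord"] for r in stage_rows
            if r["status"] in ("complete", "skipped")]
    return (max(done) + 1) if done else 1

rows = [
    {"ord": 1, "status": "complete"},
    {"ord": 2, "status": "complete"},
    {"ord": 3, "status": "running"},   # interrupted mid-stage
]
# resume_ord(rows) → 3: the interrupted stage re-runs from a clean slate
```

Because state is written to Postgres before the next step, re-running the interrupted stage is safe: its partial work was never marked complete.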
The kill switch
If a checkpoint is producing bad decisions, the principal can disable it from the Desk's settings. A disabled checkpoint always uses its fallback. This is the safety valve: when reasoning calibration drifts, fall back to deterministic policy while investigating.
Researcher
The Researcher reads the world for the run. Multi-source retrieval, source ranking, screening, structured knowledge extraction.
Five stages, one mission: turn a subject into a knowledge graph the Synthesizer can reason over. The Researcher is also the read-side gateway for the system; nothing else fetches from external sources.
Stage 2 · Problem decompose
One Sonnet call. Reads the subject, the pipeline source profile, the principal's interest signals, the voice profile. Produces 3 to 7 sub-questions, 2 to 3 framing angles, an explicit out-of-scope list. The decomposition shapes everything downstream.
Stage 3 · Search strategy
One Sonnet call producing query specifications per channel. OpenAlex queries (keywords, year range, type filters), arXiv queries with category filters, web search queries (3-6 specific phrasings, not paraphrases), regulator feed selectors. Each query annotated with the sub-question(s) it targets.
Stage 4 · Literature collect
The mechanical heavy-lifting stage. Per channel: issue queries with rate-limit awareness, parse responses into normalised CandidateSource shape, dedup against existing source rows, insert new sources, fetch full bodies where available, chunk, embed (Anthropic embeddings primary, OpenAI text-embedding-3-small as fallback). Failed channels degrade gracefully; circuit breaker opens after 5 consecutive failures within 60 seconds, stays open for 5 minutes.
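The per-channel breaker described above (5 consecutive failures inside 60 seconds opens it; 5 minutes of cooldown) might look like the following sketch; the injectable clock exists only to make it testable.

```python
import time

class CircuitBreaker:
    """Per-channel breaker: opens after `threshold` failures inside
    `window` seconds, stays open for `cooldown` seconds."""

    def __init__(self, threshold=5, window=60.0, cooldown=300.0, clock=None):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.clock = clock or time.monotonic
        self.failures: list[float] = []
        self.opened_at: float | None = None

    def allow(self) -> bool:
        """May this channel be queried right now?"""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and try again.
            self.opened_at, self.failures = None, []
            return True
        return False

    def record_failure(self) -> None:
        now = self.clock()
        # Keep only failures inside the sliding window.
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now

    def record_success(self) -> None:
        self.failures = []
```

One instance per channel; a channel whose breaker is open simply degrades out of the collect stage instead of stalling the run.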
Stage 5 · Literature screen (Gate A)
Per candidate, a Haiku-tier call producing a relevance score and one-line rationale per sub-question. Below threshold or matching the out-of-scope list → rejected. Cost-conscious: ~€0.00-0.00 per source. If retained count drops below min_total_sources, emit source.insufficient and trigger search-strategy refinement (jump back to stage 3).
Gate A: principal sees retained vs rejected sets and approves before stage 6 proceeds. Auto-advance after 12 hours.
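A sketch of the screening decision, under the assumption that relevance scores and the out-of-scope list arrive keyed by source id; the function and event names are illustrative.

```python
def screen_sources(scores: dict[str, float], threshold: float,
                   out_of_scope: set[str], min_total: int):
    """Apply the relevance threshold and out-of-scope rejection;
    signal source.insufficient when too few sources survive, which
    triggers the jump back to stage 3."""
    retained = [
        sid for sid, score in scores.items()
        if score >= threshold and sid not in out_of_scope
    ]
    event = "source.insufficient" if len(retained) < min_total else None
    return retained, event
```

The retained/rejected split is exactly what the principal reviews at gate A before stage 6 proceeds.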
Stage 6 · Knowledge extract
Per retained source, a Sonnet call (Haiku for short sources). Produces a KnowledgeCard with discrete findings, entity mentions, quantitative claims, temporal anchors, methodology notes, observed biases, extraction confidence. The KnowledgeCard collection is the knowledge graph the Synthesizer reads.
Channels in v1
OpenAlex
Free, broad coverage. Primary academic channel.
Semantic Scholar
Better quality scoring, optional API key.
arXiv
Free, current preprints.
Web search
Tavily / Brave / Anthropic web_search; resolved in Phase 2.
RSS
Feed parsing for press releases, regulator publications.
Scrape
Structured HTML extraction with selectors declared in pipeline YAML.
Continuous mode (v2 hook)
For pipelines with cadence.continuous.enabled: true (none in v1, regulatory pipeline in v2), the Researcher polls registered source feeds. Each cycle creates a short run that polls, ingests new items, applies materiality scoring, and produces zero or more corpus_update artefacts. The continuous trigger code path exists in v1 as a no-op; activating it requires only a YAML registration.
Synthesizer
Where independent claims become a synthesised position. The Synthesizer's verdicts shape what the Writer eventually says.
The Synthesizer's load-bearing innovation is the multi-agent debate at stage 8. Three debater personas each argue from a different framing; round one stakes the initial positions and round two is a cross-examination pass; then a synthesiser integrates. Seven Sonnet calls for one hypothesis. The principal's prior multi-agent debate work, made into a first-class stage.
Stage 8 · Hypothesis with multi-agent debate
async def hypothesis_gen_with_debate(self, task):
    # Phase 1: initial positions in parallel (round 1)
    positions = await asyncio.gather(*[
        self.invoke_debater(task, position_index=i)
        for i in range(task.debate_config.positions)  # default 3
    ])
    # Phase 2: cross-examination. rounds counts the initial round,
    # so the default of 2 means one cross-exam pass here:
    # 3 + 3 + 1 synthesis = 7 calls.
    for round_idx in range(task.debate_config.rounds - 1):
        positions = await asyncio.gather(*[
            self.invoke_debater_with_context(
                task, position_index=i, prior_positions=positions
            )
            for i in range(task.debate_config.positions)
        ])
    # Phase 3: synthesis integrates the final positions
    consensus = await self.invoke_synthesiser(
        task, final_positions=positions
    )
    return Hypothesis(
        statement=consensus.statement,
        debate_positions=positions,
        consensus_rationale=consensus.rationale,
        confidence=consensus.confidence,
        falsifiability=consensus.falsifiability,
    )
Three debater personas as system prompts: Position 1 the conventional read (what consensus says), Position 2 the contrarian read (what dissenting evidence implies), Position 3 the structural read (what the underlying mechanics suggest). Round 2 lets each debater update positions in light of others' arguments. The synthesiser is a fourth call that integrates.
For pipelines with require_disconfirming_position true (v2 thesis stress-test), Position 2 is constrained to find the strongest disconfirming case. The debate is real: each debater receives the same KnowledgeCards but argues from its assigned framing.
Stages 10-13 · The experiment subsystem (Paper only)
For Paper pipelines that involve quantitative analysis or empirical claims, stages 10 to 13 mirror AutoResearchClaw's experiment subsystem. Code generation produces the analytical script. Resource planning sizes the run. Experiment run executes in a sandboxed Python runner with bounded time and memory. Iterative refine handles failures, partial results, and dimensional issues. The Synthesizer here is the strategic owner; the actual code execution happens in a separate runner process.
Brief skips this subsystem entirely (skip_if: brief in the pipeline YAML). Brief is synthesis-driven, not empirical.
Stage 15 · The PROCEED / REFINE / PIVOT decision
The decision the Orchestrator executes. Inputs: the hypothesis, the analysis, the remaining cost budget, the pivot count. Decision logic:
- PROCEED if hypothesis holds with high confidence and no critical surprises. Confidence threshold: 0.6.
- REFINE if results are partial, ambiguous, or methodologically weak. Jump to stage 13 with refinement instructions. Confidence threshold: 0.5.
- PIVOT if results refute the hypothesis or surface a meaningfully different finding. Spawn a child run with a new hypothesis. Capped at 2 pivots per chain. Confidence threshold: 0.65.
The Orchestrator's pivot evaluation checkpoint can downgrade the verdict (PIVOT → REFINE, REFINE → PROCEED) but cannot upgrade. This prevents manufactured pivots while preserving the Synthesizer's ability to flag genuine refutation.
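The downgrade-only constraint can be sketched as a small routine. The specific triggers below (pivot cap reached, more than 60% of the cost cap consumed) are illustrative heuristics drawn from the example above, not the shipped checkpoint logic.

```python
DOWNGRADE = {"PIVOT": "REFINE", "REFINE": "PROCEED", "PROCEED": "PROCEED"}

def apply_pivot_checkpoint(verdict: str, *, pivot_count: int,
                           budget_used: float, max_pivots: int = 2) -> str:
    """Orchestrator second opinion on the Synthesizer's verdict.

    Downgrade-only: the Orchestrator can soften a verdict but never
    manufacture a pivot the Synthesizer did not call for. The
    deterministic fallback is to affirm the verdict unchanged.
    """
    if verdict == "PIVOT" and pivot_count >= max_pivots:
        return DOWNGRADE[verdict]   # pivot chain exhausted
    if verdict == "PIVOT" and budget_used > 0.6:
        return DOWNGRADE[verdict]   # too little budget to restart
    return verdict                  # affirm
```

The one-way `DOWNGRADE` table is the whole safety property: no path maps a verdict to anything stronger.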
Critic
The agent that says no. Peer review, citation verification, voice constraint enforcement, materiality scoring. The Critic's function is to find what the Writer or Synthesizer missed.
Three reviewer personas at stage 18, four-layer citation verification at stage 23, third voice checkpoint when the Writer's first two fail. The Critic is the most conservative agent in Saigar; auto-admit happens only when the peer review score is high and citation verification is clean.
Stage 18 · Three-reviewer peer review
Three reviewer personas run in parallel:
Evidence reviewer
Cross-references every claim against the KnowledgeCards. Flags unsupported assertions, weak evidence chains, missing corroboration.
Voice reviewer
Scans for the principal's prose constraints: em-dashes, "not X but Y" framing, banned phrases, AI-slop patterns, Atelier register adherence.
Structure reviewer
Checks artefact coherence, the load-bearing of each section, whether the lead actually leads.
After three independent reviews, a synthesis call produces a unified PeerReview output: overall score 0-100, dimension scores, findings list with severity (blocker/major/minor/nit), recommendation (accept/revise/reject). Findings include location, excerpt, issue, and optional suggestion.
A revise recommendation triggers stage 19 (Writer revision); after revision, stage 18 runs again at most once. A second revise becomes reject, surfaced to the principal at gate C.
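The revise-once rule reduces to a small router; `review_action` is an illustrative name, not the codebase's.

```python
def review_action(recommendation: str, prior_revise_count: int) -> str:
    """Route a peer-review recommendation. One revision pass is
    allowed; a second revise verdict becomes reject, surfaced to
    the principal at gate C."""
    if recommendation == "revise":
        return "revise" if prior_revise_count == 0 else "reject"
    return recommendation  # accept and reject pass through unchanged
```

This bounds the stage-18/19 loop to a single round trip, which also bounds its cost.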
Stage 23 · Four-layer citation verification
Layer 1 · Primary identifier
mechanicalarXiv ID resolvable, DOI valid via CrossRef, regulator filing reference resolvable. No LLM call.
Layer 2 · Title and metadata match
mechanicalSemantic Scholar lookup verifies title and authors match the citation's metadata. No LLM call.
Layer 3 · Body presence check
Mechanical check: the cited content exists in the source body at the specified position. Text matching, no LLM call.
Layer 4 · Relevance assessment
Haiku call: an LLM call confirms the source actually supports the claim being made. One Haiku call per citation. Threshold: relevance ≥ 0.6.
A citation passes if it clears layers 1-3 (mechanically verified) AND layer 4 (relevance ≥ 0.6). Failures are classified as weak (verified but low relevance: removed and the paragraph regenerated) or fabricated (failed layer 1 or 2: removed entirely, surfaced to the principal). Cost: under €0.01 per citation, cached by (source_id, claim_hash) so repeat citations across runs reuse the verdict.
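The pass/fail decision and the cache key can be sketched as follows. The classification of a layer-3 failure is not specified in the text, so this sketch folds it into "weak"; that choice is an assumption.

```python
import hashlib

def cache_key(source_id: str, claim: str) -> tuple[str, str]:
    """Verdicts are cached by (source_id, claim_hash) so repeat
    citations across runs reuse the result."""
    return (source_id, hashlib.sha256(claim.encode()).hexdigest())

def citation_verdict(layer1: bool, layer2: bool, layer3: bool,
                     relevance: float) -> str:
    if not (layer1 and layer2):
        return "fabricated"      # failed identifier or metadata check
    if layer3 and relevance >= 0.6:
        return "pass"
    return "weak"                # verified source but low relevance
                                 # (layer-3-failure handling is an assumption)
```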
Voice constraint third checkpoint
When the Writer's post-generation validator detects a voice violation that survives one regeneration, the Critic runs a focused review on the offending paragraph. Three options: auto_fix (mechanical replacement, e.g. em-dash → comma), accept_with_flag (borderline, surfaced to principal), block (unrecoverable, gate C rejection with violation).
The v1 eval target: zero voice violations leak past all three checkpoints.
Writer
The agent the principal sees most. Every word that lands on the Desk's archive or sagarbharambe.com flows from the Writer. Voice is policy, not preference.
The Writer authors in two modes. Article mode produces Markdown for sagarbharambe.com publishing. Brief mode produces a SlidePlan that the renderer turns into dual-mode HTML. The Writer is the first checkpoint in the three-layer voice constraint enforcement.
Stage 16 · Outline / SlidePlan
For Article mode: one Sonnet call produces an ArticleOutline with 4-7 sections, each with purpose, findings to use, target word count.
For Brief mode: one Sonnet call produces a SlidePlan with 7-9 slides selected from the Brief Component Library catalogue (cover, TLDR, chart, mechanism, read, implications, sources, closing). The Writer maps findings to specific slides, identifies whether a mechanism diagram is appropriate, authors the cover headline and deck line.
Stage 17 · Section-by-section drafting
The draft is produced section-by-section, not in one call. Each section is one Sonnet invocation with the section's purpose, the findings to use, the voice constraints, the target word count, and the drafts of preceding sections (for coherence). For Brief mode, "section" is "slide". Cost: 6-12 Sonnet calls per Brief stage 17, ~€0.42-1.27.
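The loop shape might look like the following sketch, with `call_sonnet` standing in for the model client and the prompt structure purely illustrative:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Section:
    purpose: str
    findings: list
    target_word_count: int

async def draft_sections(sections, voice_constraints, call_sonnet):
    """One model call per section; each call sees the drafts before it."""
    drafts = []
    for section in sections:                   # "slide" in Brief mode
        prompt = {
            "purpose": section.purpose,
            "findings": section.findings,
            "voice_constraints": voice_constraints,
            "target_words": section.target_word_count,
            "preceding_drafts": list(drafts),  # for coherence
        }
        drafts.append(await call_sonnet(prompt))
    return drafts
```

The key design point is the accumulating `preceding_drafts` list: later sections are conditioned on earlier ones, which is why the draft cannot be parallelised the way stage 18's three reviewers can.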
Voice constraint authoring (first checkpoint)
The Writer's system prompt incorporates voice constraints as hard rules:
Voice constraints (these are policy, not suggestions):
- Do not use em-dashes (—) anywhere. Use commas, colons, parentheses, or sentence breaks.
- Do not use en-dashes (–) anywhere.
- Do not use the construction "not X, but Y" or "not just X but Y". State the position directly.
- Do not use these phrases: "this is the hard truth", "doing the heavy lifting", "the real question is", "what most people miss", "here's the thing", "this is where it gets interesting", "let that sink in".
- ... [full list from voice_constraint table]
After generation, a deterministic post-generation validator scans for violations. If found, the Writer regenerates the offending paragraph once with the violation embedded as a forbidden example. If the regeneration still violates, the Critic's third checkpoint takes over.
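A minimal sketch of the deterministic validator, assuming simple substring matching over a subset of the constraints quoted above (the real validator presumably also pattern-matches the "not X but Y" framing, which substring scanning cannot catch):

```python
BANNED_SUBSTRINGS = {
    "\u2014": "em-dash",
    "\u2013": "en-dash",
    "this is the hard truth": "banned phrase",
    "doing the heavy lifting": "banned phrase",
    "the real question is": "banned phrase",
    "here's the thing": "banned phrase",
    "let that sink in": "banned phrase",
}

def find_violations(paragraph: str) -> list[str]:
    """Deterministic scan, no model call. Returns one label per hit."""
    text = paragraph.lower()
    return [f"{label}: {needle!r}"
            for needle, label in BANNED_SUBSTRINGS.items()
            if needle in text]
```

An empty return list means the paragraph passes the first checkpoint; a non-empty list triggers the single regeneration with the violations embedded as forbidden examples.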
Cross-reference authoring (v0.3)
The Writer authors inline reference markers between sections of the artefact:
- [see Mechanism, slide 4] for a slide-level reference
- [see Chart: agent traffic share, slide 3] for a component-level reference
- [Source: Mastercard Oct 2025 disclosure] for a source reference
- [see "Mechanism analysis"](#mechanism-analysis) for an Article anchor
Cardinality target: 2-5 internal references per Brief, not "every other sentence is a link." Cross-link only when the reference adds navigational value. Broken cross-references are flagged as minor Critic findings at stage 18.
Expandable content rule
The Writer does not generate expanded_content reflexively. The system prompt instructs: only produce expanded content when there is substantively more to say that a serious reader would want. If the surface body says everything worth saying, leave expanded_content null. The render layer omits the "read more" affordance when the field is null. This prevents the lazy expansion failure mode where the Writer pads the brief with restated material.
Reflector
Saigar's cross-run improvement loop owner. After every artefact-producing run, the Reflector reads the trace and produces structured learning_signal records that future runs consume.
Without the Reflector, every Saigar run is a fresh start. With it, the system inherits taste over time: which framings the principal accepts, which sources behave consistently, which voice constraints get violated on which topic patterns. Principal-supervised throughout.
Six signal kinds in v1
topic_acceptance
Which topic patterns the principal accepts vs overrides at scheduled cycle proposals.
source_credibility
Per-publisher credibility patterns derived from citation verification outcomes and principal feedback.
debate_position_preference
Which debate positions the principal accepts in stage 8 outputs, scoped by topic pattern.
voice_violation_pattern
Which voice constraints get violated on which topic patterns, used to strengthen Writer enforcement.
critic_finding_pattern
Recurring finding categories that surface across runs (e.g., consistent unsupported claims about a topic).
cost_outcome
Cost-vs-quality patterns: when did high-cost stages produce better outcomes vs. when did frugal runs match.
Signal lifecycle
A signal is born from a Reflector stage 24 invocation, a recurring Critic finding pattern, or principal manual entry. It accumulates strength as more runs reinforce it (initial 0.4-0.6, capped at 1.0). It weakens when contradicting runs occur. It expires if not reinforced for 6 runs (default decay). It can be rejected, weakened, or strengthened by the principal at any time.
At task creation, each agent queries learning_signal for active rows where subject matches and scope matches the task context. Active signals condition the system prompt. Every agent_invocation records which signals it consumed in metadata.signals_consumed.
Drift mitigation: novelty injection
An opt-in flag in the principal's preferences causes each agent to ignore one randomly-selected matching signal per N runs (default off, principal-enabled per agent). This guards against Saigar collapsing toward narrow framings. The principal observes drift on the /learning view and enables novelty injection on specific agents as needed.
This was a deliberate addition. The risk of cross-run learning is over-fitting to past principal preferences in ways that narrow Saigar's range. The mitigation is supervised: novelty injection is opt-in, not default. The principal sees recent injections on the /learning view and can disable if injections are causing more harm than they prevent.
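Mechanically, novelty injection is a small filter at signal-consumption time. A sketch, assuming the "one ignored signal per N runs" behaviour described above (parameter names and the run-number trigger are illustrative):

```python
import random

def apply_novelty_injection(signals: list, run_number: int,
                            enabled: bool, n: int = 5, rng=None):
    """Every Nth run, drop one randomly selected matching signal for an
    agent where the principal enabled the toggle. The dropped signal is
    returned so it can be logged to the /learning view."""
    if not enabled or not signals or run_number % n != 0:
        return list(signals), None
    rng = rng or random
    idx = rng.randrange(len(signals))
    return signals[:idx] + signals[idx + 1:], signals[idx]
```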
Sample signal
{
  "signal_kind": "debate_position_preference",
  "subject": "synthesizer",
  "scope": {
    "pipeline": "brief",
    "topic_pattern": "agentic commerce / payment infrastructure"
  },
  "observation": "On agentic commerce topics across 4 runs, the principal accepted structural framings (identity-layer focus) over rails-layer consensus by a margin of 3-1. Edits at gate C consistently strengthened the structural reading.",
  "prescription": {
    "weight_adjustment": {
      "position_3_structural": 0.15,
      "position_1_consensus": -0.10
    }
  },
  "initial_strength": 0.55,
  "expires_after_runs": 6,
  "rationale": "Four runs is enough to suggest a pattern but not enough for high confidence. Set strength mid-range; reinforce on next confirming run."
}
Five processes. One VPS. Postgres-as-everything.
Saigar runs on Hetzner CX31 in Falkenstein. Caddy, Next.js, FastAPI, Brief renderer, Worker. Postgres for queue, events, storage. EU residency throughout.
System topology
The five processes
Reverse proxy with automatic TLS via Let's Encrypt. Routes to Next.js on /, FastAPI on /api/*, Brief renderer on internal port only.
App Router. Server components for static surfaces. Client components only for interactive (gate approvals, live progression). SSE for run-level event streaming.
REST + SSE. Pydantic 2 for validation. SQLAlchemy 2 async with asyncpg. Authentication via passkey (WebAuthn) or bearer token for CLI.
Internal-only service. Accepts SlidePlan JSON, validates against schema, renders to single self-contained HTML with bundled toggle and navigation JavaScript. Deterministic, cached by SlidePlan hash.
Long-running. Hosts Orchestrator + 6 agents + APScheduler + continuous trigger loop + event subscriber + nightly learning_signal sweeper. systemd-managed, auto-restart on crash.
Why no Redis, no Celery
v1 single-user load is trivial: a few runs per week. One process is simpler than a distributed worker pool.
The queue is implemented via Postgres SELECT FOR UPDATE SKIP LOCKED (the pgmq pattern). LISTEN/NOTIFY drives the event bus. The scheduler runs in-process. No Redis to operate, no Celery workers to scale, no broker to monitor.
The Orchestrator's interface to the queue is abstracted (a JobQueue interface). If v2 or v3 load demands it, the worker can be split into queue-consumer instances and the queue migrated to Redis as a configuration change. v1 doesn't need it.
# In-process queue via Postgres
async def claim_next_task(conn):
    async with conn.transaction():
        row = await conn.fetchrow("""
            SELECT id, run_id, stage_ord, payload
            FROM task
            WHERE status = 'pending'
            ORDER BY priority DESC, created_at ASC
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        """)
        if row:
            await conn.execute(
                "UPDATE task SET status = 'claimed' WHERE id = $1",
                row['id']
            )
        return row
# Event bus via LISTEN/NOTIFY
async def emit_event(conn, event):
    async with conn.transaction():
        await conn.execute(
            "INSERT INTO event ... VALUES ...",
            ...
        )
        await conn.execute(
            "INSERT INTO event_outbox ... VALUES ...",
            ...
        )
        # NOTIFY does not accept bind parameters; pg_notify() does,
        # avoiding string interpolation into SQL
        await conn.execute(
            "SELECT pg_notify($1, $2)", event.channel, str(event.id)
        )
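The JobQueue abstraction mentioned above might look like the following Protocol. The Postgres-backed functions would sit behind one implementation, and a Redis-backed queue could be swapped in as configuration; the method names here are illustrative, not Saigar's actual interface.

```python
from typing import Any, Optional, Protocol, runtime_checkable

@runtime_checkable
class JobQueue(Protocol):
    """What the Orchestrator programs against."""

    async def enqueue(self, run_id: str, stage_ord: int,
                      payload: dict[str, Any], priority: int = 0) -> str: ...

    async def claim_next(self) -> Optional[dict[str, Any]]: ...

    async def complete(self, task_id: str) -> None: ...

    async def fail(self, task_id: str, error: str) -> None: ...
```

Because the Orchestrator depends only on this surface, migrating the queue is a new implementation plus a configuration change, not a rewrite.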
Cost model
Per-run cost composition
| Component | Brief | Paper |
|---|---|---|
| Researcher | €1.27-3.40 | €2.55-6.80 |
| Synthesizer | €1.27-2.55 | €2.55-5.10 |
| Critic | €0.42-0.85 | €0.68-1.27 |
| Writer | €0.68-1.70 | €0.51-1.27 |
| Orchestrator (incl. 6 reasoning checkpoints) | €0.42-1.02 | €0.51-1.19 |
| Reflector (stage 24) | €0.34-0.68 | €0.42-0.85 |
| Total per run | €4.42-10.20 | €7.22-16.49 |
Disaster recovery posture
RTO 4 hours
From disaster to running Saigar. Provision new VPS, restore latest dump, replay WAL.
RPO 24 hours
Source data loss bound. Run history loss bound: 7 days. Published artefacts: near-zero (they live on sagarbharambe.com).
Quarterly tested
A quarterly test restore to a separate VPS verifies the procedures work. RTO/RPO are not heroic: for a single-user v1, downtime is annoying, not catastrophic.
The system inherits taste.
Saigar's cross-run improvement loop. The Reflector at stage 24 produces structured signals. Future runs consume them. The principal supervises throughout.
The three-phase lifecycle
Observation
During every run, signals are captured passively: principal feedback, gate decisions, citation outcomes, Critic findings, cost-vs-quality outcomes, topic acceptance vs override. No special instrumentation; these flow into existing tables.
Prescription
At stage 24, the Reflector reads the run trace, queries existing signals, and produces zero or more new learning_signal rows or updates to existing ones. Deduplicates: similar signals reinforce rather than duplicate.
Consumption
At task creation, each agent queries learning_signal for relevant active rows and incorporates them into the system prompt. Every agent_invocation records consumed signals for audit.
The /learning view
The principal's supervision surface for learning_signal rows. Every active signal is reviewable. The principal can accept, reject, strengthen, weaken, or expire any signal at any time.
Active learning signals
3 unreviewed · 18 active · 2 expiring this week
Structural framings on agentic commerce topics
On agentic commerce topics across 4 runs, the principal accepted structural framings (identity-layer focus) over rails-layer framings by a margin of 3-1.
Network agent-pay announcements overstate transaction volumes
Across 3 runs, network press releases on Agent Pay and Intelligent Commerce transaction counts have been corroborated at 60-80% of stated figures by independent reporting. Down-weight in screening.
'Doing the heavy lifting' violations on financial topics
3 violations across 2 runs. Strengthen this constraint's emphasis in Writer system prompt for financial pipelines.
Agentic commerce topics accepted; pure crypto topics overridden
The principal has accepted 4 of 4 agentic commerce topic proposals; overridden 3 of 3 pure crypto topic proposals in favor of payments-AI intersection topics.
The novelty injection toggle
Drift mitigation, opt-in per agent.
The risk of cross-run learning is over-fitting to past principal preferences in ways that narrow Saigar's range. After ten runs of contrarian framings on payments, will Saigar still produce a non-contrarian framing when the evidence genuinely calls for one?
The mitigation is the novelty injection toggle. When enabled per agent, the agent ignores one randomly-selected matching signal per N runs. The principal sees recent injections on the /learning view: "On run 47, the Synthesizer ignored signal X (debate position preference on payments topics)."
Default: off across all agents. The principal observes drift on /learning and enables novelty injection where useful. Most likely first candidate: the Topic Ranker, where drift toward past topic preferences is most visible.
The reference artefact.
Brief 47 on agentic commerce identity infrastructure. The visual reference for the spec set, the Phase 2 acceptance test for the build, the principal's own first published Brief.
What follows is a condensed, embedded preview. The full Brief is a single self-contained HTML file (~250 KB) with mode toggle, internal cross-references, animated mechanism diagram, expandable sections.
Bi-weekly · Edition 47 · 02 May 2026
The agentic commerce question, settled at the wrong layer.
Why platform-vs-network framing misses the actual contest, and what Web Bot Auth's quiet adoption tells you about where trust will live in 2026 and 2027.
TL;DR
Volume · Production
Five concrete agentic commerce deployments live by April 2026: Mastercard Agent Pay, Visa Intelligent Commerce, Amazon Buy for Me, Coinbase x402, and the ChatGPT retailer apps that replaced Instant Checkout in March. The pilot phase is over.
Mechanism · Latency
Three settlement paths exist in production. Browser-driven flows trip legacy fraud loops at ~3 seconds. Network-tokenized routes (Agentic Tokens, TAP) settle at ~1.5 seconds. Agent-native protocols (ACP, UCP) settle at under one second. The differential is structural.
Trajectory · Identity
Web Bot Auth (IETF RFC 9421, Cloudflare-led) has quietly become the substrate every commercial protocol builds on. The networks tokenise above it. The platforms compose protocols above it. Whoever attests agent identity defines the trust assumption.
How the three agent-checkout paths actually settle
Body · Where this lands
The agentic commerce question through 2024 and into early 2025 was whether this was real. Whether AI agents could actually be trusted to spend money. Whether merchants would accept traffic that did not come from humans. Whether fraud engines could even tell the difference. By September 29, 2025, when the first live agentic transaction settled on Mastercard's network, that question had a public answer. The first wave of pilots converted to production through Q4 2025 and Q1 2026. Five concrete deployments now exist: Mastercard Agent Pay, Visa Intelligent Commerce, Amazon Buy for Me, Coinbase x402, and the ChatGPT retailer apps that replaced Instant Checkout in March 2026.
The volume question is settled. The interesting question moved up the stack.
The party that controls agent identity attestation defines the trust assumption everything else builds on. Through 2026, that party is converging on Cloudflare's Web Bot Auth standard, and almost nobody is talking about it.
Read 1 · Mastercard's disclosure, October 2025
On the Q3 earnings call, Mastercard CEO Michael Miebach told analysts that the company processed its "first agentic transaction" during the quarter. The transaction settled September 29, 2025. He committed Mastercard to being "at the center" of agentic commerce going forward. The infrastructure positioning was specific: Ethoca's real-time dispute data and Mastercard Threat Intelligence as the fraud-prevention layer beneath Agent Pay, Mastercard Agentic Tokens scoped per agent and per session as the credential model, Web Bot Auth as the underlying agent identity layer.
The strategic read is straightforward. Mastercard is betting that consumers will trust the network's brand over the LLM vendor's brand to adjudicate what an agent can and cannot do with their money. The Microsoft Copilot Checkout integration, announced in parallel, is the validation channel for this thesis. If consumers click "buy with Mastercard Agent Pay" inside Copilot more often than they grant ACP-flavoured authorisations inside ChatGPT, the network wins the trust positioning. If they do not, the platforms do.
The bet worth watching: Mastercard's Multi-Token Network and the Chainlink tie-up are positioning the network as the trusted conversion layer between fiat card acceptance and onchain settlement. Stablecoin-native agents (Coinbase Agent.market with x402, Stripe and Tempo's MPP) operate on rails Mastercard does not control. The Multi-Token Network is the bridge that keeps Mastercard relevant in agent-to-agent flows. It is also the most architecturally interesting product the network shipped in 2025.
Read 2 · Google's UCP launch, NRF January 2026
The Universal Commerce Protocol launched on January 12, 2026 at NRF with a deliberate choice of partners. Etsy, Shopify, Target, Wayfair, and Walmart on the merchant side. Adyen, American Express, Mastercard, PayPal, Stripe, Visa, and Worldpay on the payments side. The protocol composes over Google's earlier A2A and AP2 (Agent Payments Protocol) and over Anthropic's MCP, which had been donated to the Linux Foundation's Agentic AI Foundation in December 2025. UCP initially powers a checkout feature inside Google's AI Mode and the Gemini app, using Google Pay and payment methods saved in Google Wallet, with PayPal arriving in the months following.
Two things are worth noting about the launch. First, every major card network endorsed UCP while each was also actively building their own protocol. Visa has TAP. Mastercard has Agent Pay's Acceptance Framework. Both networks publicly endorse UCP because the alternative, refusing to interoperate with the platform that owns the consumer agent surface, is worse than the cost of supporting a competing standard. The endorsement is hedging.
Second, Google launched UCP with the merchants. OpenAI launched Instant Checkout without them, beyond Etsy and a handful of Shopify partners. By March 2026, OpenAI shuttered Instant Checkout and replaced it with retailer apps inside ChatGPT that route payments through merchant-native checkout. The protocol-led model hit the merchant willingness-to-integrate boundary at single-digit merchant counts. The merchant-led model launched with five anchor retailers already integrated. This is a positioning lesson, not a technology one. A protocol with five major retailers behind it on day one creates gravity that a protocol without them does not.
Implications by audience
Click any audience to expand the implications.
The pragmatic stack as of mid-2026 is Mastercard Agent Pay plus Visa TAP for network-attested traffic, plus ACP and UCP support for direct integration with the major AI platforms. Stripe shipped one-line ACP integration. Worldpay's MCP server is live. Commercetools has agent commerce built in. The decision is no longer "should we accept agent traffic." It is "in what order do we add protocol support, and which integrations get prioritised this quarter."
Merchants who block AI crawlers are blocking revenue. Amazon's decision to block AI crawlers cost them roughly 600 million product listings from AI search results and an 18% month-over-month drop in ChatGPT referral traffic. Walmart now captures around 20% of ChatGPT referral traffic. The directional read is that AI discoverability is the new SEO and the cost of opting out is now measurable.
Visa and Mastercard have placed parallel bets at different scopes. Visa went broad early with Intelligent Commerce. Skyfire, Nekuda, PayOS, and Ramp as pilot partners. Geographic rollout starting in Europe and the UK before Asia Pacific and LATAM. Mastercard went deep with Agent Pay and the Microsoft partnership. Both networks are building the agent identity layer themselves rather than ceding it.
The bet that requires monitoring is stablecoin rails. Coinbase x402 and Stripe MPP with Tempo and Visa are eating B2B agent-to-agent payments where neither network has clear positioning. The conversion layer (Mastercard's Multi-Token Network) is the hedge. Whether the hedge is sufficient depends on how much B2B agent volume actually materialises in the next eighteen months.
Platforms are converging on the merchant-led integration model after OpenAI's protocol-only experiment failed. Google's UCP is the working version of this approach. Anthropic's MCP, donated to the Linux Foundation, is the substrate underneath. Whichever AI platform best disambiguates the consent flow (which agent acted, on whose authority, with what permissions, across which sessions) wins the trust argument. The technical substrate is largely neutral. Identity disambiguation is the differentiator.
Processors face the question of whether agent commerce flows go through them at all. If the networks issue tokens directly to agents and merchants accept those tokens, Stripe and Adyen become the compatibility layer rather than the primary flow. Stripe's MPP positioning (agent-to-agent payments, with Tempo and Visa as design partners) is the hedge against this disintermediation. The success criterion is whether MPP captures meaningful B2B agent-to-agent volume, or whether it becomes a strategic curiosity that the company supports without scaling.
The substrate question
Web Bot Auth is the substrate on which the network and platform layers depend. Cloudflare led the work. The IETF RFC 9421 standard was developed before agentic commerce became urgent. Visa and Mastercard adopted it because issuing each network's own agent identity attestation independently was technically and politically uncompetitive. Once Web Bot Auth became the substrate, network tokens became "Web Bot Auth plus brand attestation" and protocol tokens became "Web Bot Auth plus delegation flow." Everything above the substrate is differentiation. The substrate itself is the trust assumption.
The party that controls Web Bot Auth's evolution controls the trust assumption that everything builds on. Cloudflare's positioning here is quieter than Mastercard's or Google's, and more interesting because of it. Cloudflare partnered with Microsoft, Shopify, Checkout.com, Worldpay, and Adyen on the early Web Bot Auth work. The partnerships extended to Visa and Mastercard adoption through 2025. Cloudflare's revenue from this layer is small. Cloudflare's strategic positioning from this layer is large.
The question for 2026 through 2027 is whether the standards body keeps Web Bot Auth sufficiently open that the networks and platforms have to keep competing on top of it, or whether one party captures enough of the extension surface to make the substrate effectively their substrate. The first scenario keeps the layered stack interesting and competitive. The second scenario is where the architecture starts to consolidate uncomfortably under whichever party gets there first.
The current trajectory looks like the first scenario. Watch for that to change.
How Saigar produced this Brief
The mechanism diagram above is the Brief's distinctive visual element. The Writer authors the lane data (paths, nodes, packet timing). The renderer turns it into animated SVG. Saigar produces this from a topic prompt, through the 24-stage pipeline, in roughly 14 minutes wall clock, at a per-run cost of €7.16.
The full Brief, in its native format, includes additional components not shown in this preview: a market-state line chart of agent-initiated transaction volume over the last six quarters, hyperlinked source citations with reverse "cited in" anchoring, closing statement. The full version uses dual-mode rendering (newspaper or slides) and the back-chip for cross-references between sections.
From two pipelines, to many.
v1 ships two pipelines. v2 adds four. v3 evolves the architecture from pipelines-as-primitives to pipelines-as-orchestrated. Each step is opt-in, governed by registration, not rebuild.
Two pipelines, six agents, one loop
Brief and Paper. Six specialist agents including the Reflector. The cross-run learning loop. Atelier register throughout. Single-tenant on Hetzner Falkenstein.
What ships
- Brief pipeline: bi-weekly payments competitive intelligence
- Paper pipeline: long-form research articles to sagarbharambe.com
- Six specialist agents with reasoning checkpoints and learning loop
- Dual-mode Brief HTML with mechanism diagrams and back chip
- The /learning view for principal supervision
- Full audit trail and reproducibility
What's deferred
- Continuous trigger fully active (code path exists, no continuous pipelines)
- Cross-pipeline orchestration
- Multi-tenancy (any code path)
- Public sharing of artefacts
- Auto-discovery of new sources
- Multilingual output (Dutch articles)
Four more pipelines
v2 expands the pipeline catalogue. Each new pipeline is YAML registration plus optional specialist agents. The architecture absorbs them without core changes.
Regulatory monitoring
Polls EBA, ECB, AFM, DNB feeds. Scores materiality per item against principal interest signals. Material items surface as briefing cards on the Today view. Activates the continuous trigger code path that ships as a no-op in v1.
Investment thesis stress-test
Adversarial debate variant. The Synthesizer's Position 2 is constrained to find the strongest disconfirming case. Output: a stress-test report with confidence intervals on the thesis.
Quarterly market wrap
Aggregates 90 days of Briefs and continuous regulatory updates into a synthesized quarterly view. Identifies trajectory shifts and emerging themes. Output: a Brief-format artefact at quarterly cadence.
Deal / M&A memo
Specialised pipeline for transaction analysis. Comparable transactions retrieval, synergy analysis, regulatory friction assessment. Output: a structured memo.
Cross-pipeline orchestration
The pipeline-as-primitive abstraction holds through v2. v3 introduces meta-pipelines: a pipeline whose stages are other pipelines. A regulatory event triggers a Brief; the Brief's findings trigger a Paper; the Paper's publication seeds a quarterly market wrap. The Reflector at this scale produces meta-signals: what kinds of pipeline cascades produced the highest-quality artefacts.
This is the moment when Saigar transitions from "an operator that runs research pipelines" to "an editorial system that orchestrates a body of work."
Editorial team expansion
The single-tenant constraint loosens. Saigar can be operated by a small editorial team where each member has their own voice profile, their own learning signals, their own approval gates. Cross-team signals become a possibility: the senior editor's voice constraints take precedence in shared pipelines; the staff editors learn faster because they consume the senior's signal set.
This is speculative. It would require revisiting almost every architectural decision in v1: the singleton principal, the auth boundary, the cost model, the audit trail. Not a v1 question.