Confidence-gated agent runtime

Every agent runtime
trusts the LLM blindly.
BayesCore doesn't.

Connect your tools via MCP, describe a task, and the kernel runs a multi-agent pipeline — gating every step on a belief state. Agents that know what they don't know.

PROCEED when confidence ≥ 0.72  ·  CLARIFY when evidence is thin (0.40 up to 0.72)  ·  ESCALATE to human review when confidence falls below the 0.40 floor. The gate runs before every step, not after.

bayescore / pipeline run
Task: Research and verify whether recent LLM benchmarks measure reasoning or pattern matching
research_agent · PROCEED · 87%
eval_agent · PROCEED · 79%
claim_scorer_agent · CLARIFY · 54%
summarise_agent · WAITING
⚑ Pipeline paused: claim_scorer_agent confidence (54%) is below the commit threshold (72%). Provide additional context or run individual evals to build belief.

Connect. Describe. Trust — but verify.

BayesCore runs locally with a bundled LLM. Every pipeline step is gated before it executes — not logged after it fails.

01

Connect your tools

Add any MCP server — GitHub, Notion, Slack, Postgres, or your own. BayesCore handshakes, discovers the tool list, and registers every tool automatically. No code required.

mcp__{server_id}__{tool_name}
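
As a rough illustration of the naming pattern above, the sketch below builds and parses namespaced tool names. It assumes the format is exactly "mcp", the server id, and the tool name joined by double underscores. Both helper functions are hypothetical and are not part of BayesCore's documented API.

```python
# Assumed naming scheme, inferred only from the pattern shown on the page:
# "mcp" + "__" + server id + "__" + tool name. Both helpers are hypothetical.
_PREFIX = "mcp"
_SEP = "__"


def qualified_tool_name(server_id: str, tool_name: str) -> str:
    """Build a namespaced tool name such as mcp__github__create_issue."""
    if not server_id or not tool_name:
        raise ValueError("server_id and tool_name must be non-empty")
    if _SEP in server_id:
        # A double underscore in the server id would make parsing ambiguous.
        raise ValueError("server_id must not contain '__'")
    return f"{_PREFIX}{_SEP}{server_id}{_SEP}{tool_name}"


def split_tool_name(qualified: str) -> tuple[str, str]:
    """Recover (server_id, tool_name) from a namespaced tool name."""
    parts = qualified.split(_SEP, 2)  # tool names may themselves contain "__"
    if len(parts) != 3 or parts[0] != _PREFIX or not parts[1] or not parts[2]:
        raise ValueError(f"not a namespaced MCP tool name: {qualified!r}")
    return parts[1], parts[2]
```

Prefixing with the server id keeps two servers that expose a tool of the same name from colliding in the registry.
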
02

The kernel gates every step

Each pipeline step checks the agent's belief state before executing. Below the commit threshold, the kernel pauses and asks — rather than proceeding blindly and producing a confident wrong answer.

PROCEED · CLARIFY · ESCALATE
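
To make "gated before it executes" concrete, here is a minimal sketch of a pipeline loop that checks each step's confidence first and never calls a step that fails the check. The `Step` type and `run_pipeline` function are illustrative assumptions. Only the 0.72 and 0.40 thresholds and the status labels come from this page.

```python
from dataclasses import dataclass
from typing import Callable

COMMIT_THRESHOLD = 0.72  # commit threshold stated on the page
CLARIFY_FLOOR = 0.40     # clarify floor stated on the page


@dataclass
class Step:
    """Hypothetical pipeline step: a name, the agent's confidence, and its work."""
    name: str
    confidence: float
    run: Callable[[], object]


def run_pipeline(steps: list[Step]) -> list[tuple[str, str]]:
    """Gate each step before running it; pause at the first step that does not clear."""
    status: list[tuple[str, str]] = []
    for i, step in enumerate(steps):
        if step.confidence >= COMMIT_THRESHOLD:
            step.run()  # only reached after the gate has passed
            status.append((step.name, "PROCEED"))
            continue
        decision = "CLARIFY" if step.confidence >= CLARIFY_FLOOR else "ESCALATE"
        status.append((step.name, decision))
        # Everything downstream waits; none of it is executed.
        status.extend((later.name, "WAITING") for later in steps[i + 1:])
        break
    return status
```

Fed confidences of 0.87, 0.79, and 0.54 for the first three agents, this loop runs the first two steps, marks the third CLARIFY, and leaves the fourth WAITING.
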
03

Beliefs compound across runs

Every step outcome updates the agent's Beta distribution. The kernel gets more reliable with use — and more precise about when to stop. The belief state is yours, stored locally, never sent upstream.

Beta(α, β) per agent · forgetting factor
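
A minimal sketch of a per-agent belief follows. It relies on two assumptions that this page does not state: the uninformative prior is the uniform Beta(1, 1), and the forgetting factor scales only the evidence accumulated above that prior. The `Belief` class and its method names are hypothetical.

```python
from dataclasses import dataclass

FORGETTING_FACTOR = 0.9  # value given on the page
PRIOR_ALPHA = 1.0        # assumption: uniform Beta(1, 1) as the uninformative prior
PRIOR_BETA = 1.0


@dataclass
class Belief:
    """Hypothetical per-agent belief state, Beta(alpha, beta)."""
    alpha: float = PRIOR_ALPHA
    beta: float = PRIOR_BETA

    def mean(self) -> float:
        """Posterior mean alpha / (alpha + beta), the value the gate would read."""
        return self.alpha / (self.alpha + self.beta)

    def observe(self, success: bool) -> None:
        """Add one pseudocount: alpha on success, beta on failure."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def decay(self, factor: float = FORGETTING_FACTOR) -> None:
        """Assumed forgetting rule: shrink evidence above the prior by `factor`."""
        self.alpha = PRIOR_ALPHA + factor * (self.alpha - PRIOR_ALPHA)
        self.beta = PRIOR_BETA + factor * (self.beta - PRIOR_BETA)
```

Under this rule, repeated decay with no new outcomes pulls the belief back toward Beta(1, 1), so stale evidence gradually stops driving the gate.
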

Three decisions. One rule. No guessing.

Before every agent step, the kernel evaluates the belief state and makes exactly one of three decisions. The gate runs regardless of pipeline, tools, or task.

Gate evaluation — runs before every step
PROCEED
≥ 72%

Agent's posterior mean clears the commit threshold. Step executes. Output flows to the next step.

CLARIFY
40% to < 72%

Evidence is thin. The kernel pauses and surfaces a targeted question — not a form, one question — before continuing.

ESCALATE
< 40%

Confidence is below the floor. The pipeline halts and surfaces to human review. The agent does not act.
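
The three bands above reduce to a small pure function. The sketch below is one way to write it, using the thresholds from this page. The `Decision` enum and `gate` function are illustrative names, and the handling of exact boundary values (0.72 proceeds, 0.40 clarifies) is an assumption read off the "≥ 72%" and "< 40%" labels.

```python
from enum import Enum

COMMIT_THRESHOLD = 0.72  # commit threshold stated on the page
CLARIFY_FLOOR = 0.40     # clarify floor stated on the page


class Decision(Enum):
    PROCEED = "PROCEED"
    CLARIFY = "CLARIFY"
    ESCALATE = "ESCALATE"


def gate(confidence: float) -> Decision:
    """Map a posterior mean in [0, 1] to exactly one gate decision."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= COMMIT_THRESHOLD:
        return Decision.PROCEED
    if confidence >= CLARIFY_FLOOR:
        return Decision.CLARIFY
    return Decision.ESCALATE
```

Keeping the thresholds as named constants means an audit can point at one place where each decision boundary is defined.
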

Every other runtime skips this entirely
BayesCore: gated · LangChain: no gate · CrewAI: no gate · AutoGen: no gate · Claude Projects: no gate · GPT Actions: no gate

Calibrated output for any task that matters.

Connect the tools you already use. Describe what you need. The kernel handles the rest — and stops when it shouldn't proceed.

Research & analysis

Verified research pipelines

Connect web search, academic sources, or internal knowledge bases. The kernel researches, verifies each claim adversarially, and stops when evidence is insufficient — not when it runs out of tokens.

Claim verification

IS(claim, supported)

Any factual claim can be evaluated as a testable hypothesis. The kernel scores it with calibrated confidence — not an opinion — and surfaces exactly what evidence is missing.

Cross-app workflows

MCP-connected pipelines

Wire GitHub, Notion, Slack, or any MCP server into a pipeline. Agents use your tools, verify their own outputs at each step, and surface uncertainty before acting on your data.

Document & output review

Calibrated output evaluation

Any structured output — report, spec, proposal, analysis — can be evaluated against its own implied criteria. The kernel extracts the standard from the document itself, not from an external rubric.


Not a confidence score.
A posterior probability.

BayesCore's belief state is grounded in Bayesian probability theory. Each agent maintains a Beta(α, β) distribution, the conjugate prior for binary (Bernoulli) outcomes. α increments on successful steps, β on failures. The posterior mean α/(α+β) is the confidence value the gate evaluates. Across sessions, a forgetting factor of 0.9 shrinks the pseudocounts toward the uninformative prior, so older evidence counts for less.

The two-pass adversarial verification follows Cox (1946) — probability as the only consistent logic of uncertainty — and Pearl's do-calculus for structuring evidence. The commit threshold (0.72) and clarify floor (0.40) are named constants, not magic numbers. Every decision is auditable.

Beta(α, β) · posterior mean = α / (α + β) · forgetting factor = 0.9
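
A short worked example may help. It assumes the uninformative prior is Beta(1, 1) and that the forgetting factor λ = 0.9 scales only the counts accumulated above that prior. Neither detail is specified on this page, so treat the decay line as one plausible reading.

```latex
\begin{align*}
\text{Prior:}\quad & \mathrm{Beta}(1, 1), & \frac{\alpha}{\alpha+\beta} &= \frac{1}{2} = 0.5 \\
\text{After 5 successes, 4 failures:}\quad & \mathrm{Beta}(6, 5), & \frac{\alpha}{\alpha+\beta} &= \frac{6}{11} \approx 0.5455 \\
\text{After one decay } (\lambda = 0.9):\quad & \alpha' = 1 + 0.9\,(6 - 1) = 5.5, & \beta' &= 1 + 0.9\,(5 - 1) = 4.6 \\
& \mathrm{Beta}(5.5, 4.6), & \frac{\alpha'}{\alpha'+\beta'} &= \frac{5.5}{10.1} \approx 0.5446
\end{align*}
```

Both values fall between the 0.40 floor and the 0.72 commit threshold, so the gate would return CLARIFY in either case. The decay moves the mean slightly toward the prior mean of 0.5.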

Bayes, 1763 · Cox, 1946 · Pearl, 1988 · Jaynes, 2003

Agents that know
what they don't know.

Download the BayesCore desktop app. Connect your MCPs. Run your first confidence-gated pipeline in minutes.

Compare Free vs Standard · See comparisons

Live on Product Hunt