The Core Problem

You are the single point of failure

Every ticket, every goal, every evaluation begins with you knowing what to ask. You are the sensor. Sensors have blind spots. GEDS changes the primitive operation from evaluation-on-request to continuous structural inference — finding what you didn't know to look for.

In code, you don't know what good looks like — so the ticket "this architecture will bottleneck the GONS routing layer at scale" never gets written.

In sales, you don't know what good looks like — so the ticket "your offer framing is burying the outcome clients actually want to buy" never gets written.

Not because the system can't execute them. Because nobody knew to write them.

The Mechanism

Two processes. Two clocks.

The LLM handles communication on a fast clock. The compression engine runs structural inference on a slow clock. GEDS bridges them.

Fast Clock · Reactive

LLM Layer

Handles communication and ticket generation
Executes tasks through GOMS
Interfaces with you through GONS
Translates structural inferences into language
Fires on prompt — strictly reactive

Slow Clock · Continuous

Compression Engine

Accumulates every output, ticket, revision, outcome
Compresses into an evolving structural model
Tracks where execution flows vs. hits friction
Detects gaps between what's worked on and what matters
Runs continuously — never waits for a prompt

Components

Three components of the compression engine

01 — Embedding Model · Operational History as Geometry

Every ticket, output, revision, and outcome gets embedded — not as retrievable text but as a geometric structure where similar patterns cluster and divergent ones separate. The two clusters above emerge naturally — code patterns and sales patterns — without being labeled. You see the shape of what's happening, not just the sequence of events.

02 — Sparse Autoencoder · Finding the Latent Variables

A network trained to find the minimum set of latent variables that explain variance in your operational data. The outer ring is all the noise — every ticket, output, revision. The core is the 5–20 underlying factors that actually predict most of what happens. When a new unexplained factor emerges, that's the signal that surfaces upward to GEDS.

03 — GEDS Bridge · Inference to Action

Takes what the compression engine surfaces, runs it through the LLM for articulation, and generates the unprompted ticket — the reframing you didn't know to ask for. The LLM doesn't do the structural inference. It translates inference into something actionable. You evaluate the inference. You don't have to generate it. You move from sensor to judge.

Where it fits

GEDS in the G Stack

Every other system in the stack executes within the frame it was given. GEDS is the only system whose job is to question whether the frame is right.

GEDS — Evaluation & Decision

Center of the stack. Compression engine plus bridge. Generates what nobody asked for.

GOMS — Orchestration & Execution

Executes tickets via agents. Does the work.

GONS — Operations Nervous System

Routes, coordinates, communicates. Interfaces with you directly.

GAMS — Allocation Management

Prioritizes work. Allocates time and agent effort.

GIMS / GEMS — Memory & Artifacts

Source of truth. File and output storage.

GRAMS — Market Interface

Handles economic interactions: proposals, contracts, labor exchanges.

The key shift: you move from sensor (knowing what to ask) to judge (deciding whether what was inferred is worth acting on).

Unprompted Inferences · awaiting review

Latent Factors · +2 new this week

Structural Gaps · high confidence

Acceptance Rate · 71% · last 30 days

Unprompted Inferences · 3 pending

Codebase · High · 2h ago

GONS routing layer will bottleneck at scale — architectural decision needed now

Evidence: 7 of last 9 routing revisions touched the same two modules. Friction pattern suggests a structural assumption that won't hold past ~50 concurrent threads.

Sales · Medium · 6h ago

Offer framing is burying the outcome clients actually want to buy

Evidence: 3 recent outreach drafts led with process and capability. High-converting patterns lead with the specific problem being removed from the client's life.

Time Allocation · Low · 1d ago

High-leverage work is being deferred for visible but low-impact tasks

Evidence: GAMS logs show 68% of last week on execution tasks. GEDS-flagged strategic work deferred 4 times across 3 days.

Latent Factor Map · 11 tracked

Scope underestimation — 84%

Routing abstraction debt — 71%

Strategic deferral — 68%

Sales framing — 61%

Verification rate — 44%

Context exhaustion — 38%

Structural Drift — 14 Days 2 gaps widening

Gap between the compression model's predicted trajectory and actual output direction.

Compression Log · live

09:14:32

New factor emerging: GOMS_TICKET_CHURN — same tickets revised 2x+ before completion. May indicate upstream specification problem.

08:51:07

Compression model updated with 23 new GOMS outputs. Routing abstraction debt confidence 64% → 71%.

07:30:18

Sales framing inference generated. Evidence threshold met (3 corroborating patterns). Queued for approval.

yesterday

Structural gap: TIME_ALLOCATION diverging from stated priorities — 4th consecutive day.

2d ago

GONS routing inference accepted. Ticket ARCH-0041 queued in GOMS. Tracking outcome for model calibration.

Approval Queue — Awaiting Your Judgment 3 pending

GONS routing bottleneck — Convert inference to GOMS architectural review ticket. Est. 2–4 hours. Will pause two dependent tickets.

Sales framing — Rewrite outreach template to lead with client problem removal rather than system capability. Generate 2 variant drafts.

Time allocation drift — Reschedule GRAMS schema work to tomorrow AM. Block 3 hours. Flag to GAMS as protected time.

Construction

What it's made of

Not a single model. A stack of well-understood components arranged around a new objective function. Each piece exists. The arrangement is the invention.

A system that watches everything that happens in the operational stack, maintains a set of competing structural explanations for why things happen the way they do, and fires an alert when one of those explanations suddenly gets better at predicting the record — without being asked. The alert becomes a ticket. You judge whether the inference is real. That judgment feeds back into which explanations survive.

Materials

Five components in build order

01 — GUTS · Structured Event Log

Before any compression is possible you need a precise event log — not conversation history, not text blobs. Typed events with schema. Every ticket created, revised, or abandoned. Every inference accepted or rejected. Every time block allocated or displaced. Every sales attempt and outcome. Every code change and revert.

The schema is the hardest design decision in the whole system. Vague events produce vague latent factors. The more precisely you define what counts as an event and what its attributes are, the more useful the compression becomes.

Storage: SQLite — append-only, fixed schema, modest data volume, handles concurrent reads. A handful of tables: events, inferences, model_scores, feedback.
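The four tables can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a fixed GUTS schema: every column name below is an assumption, and the append-only rule is enforced here with triggers, which is one possible choice among several.

```python
import json
import sqlite3
import time

# Column names are illustrative assumptions; only the four table names
# come from the text above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    ts         REAL NOT NULL,
    event_type TEXT NOT NULL,
    attrs      TEXT NOT NULL            -- JSON-encoded attributes
);
CREATE TABLE IF NOT EXISTS inferences (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ts          REAL NOT NULL,
    model_id    TEXT NOT NULL,
    delta_score REAL NOT NULL,
    summary     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS model_scores (
    ts       REAL NOT NULL,
    model_id TEXT NOT NULL,
    score    REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS feedback (
    inference_id INTEGER NOT NULL REFERENCES inferences(id),
    ts           REAL NOT NULL,
    accepted     INTEGER NOT NULL CHECK (accepted IN (0, 1)),
    reason       TEXT
);
-- Append-only: reject any UPDATE or DELETE on the event log.
CREATE TRIGGER IF NOT EXISTS events_no_update BEFORE UPDATE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
CREATE TRIGGER IF NOT EXISTS events_no_delete BEFORE DELETE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_event(conn, event_type, attrs):
    """Append one typed event and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (ts, event_type, attrs) VALUES (?, ?, ?)",
        (time.time(), event_type, json.dumps(attrs, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid
```

Storing attributes as JSON keeps the table layout fixed while the per-event schema is still being worked out; typed columns per event type would be the stricter alternative.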

02 — Sparse Autoencoder · The Compression Engine

A standard autoencoder compresses input to a bottleneck and reconstructs it. A sparse autoencoder adds a constraint: most hidden neurons must be inactive at any given time. That sparsity pressure forces interpretable latent features rather than distributed encodings that work mathematically but mean nothing to a human.

The diagram shows it: many input nodes → small bottleneck → sparse hidden layer where only a few neurons activate per event. Those active neurons are the latent factors GEDS tracks — the underlying variables that explain why your operational stack behaves the way it does.

Trained in: PyTorch. Once trained, the decoder is discarded. Only the encoder is used to embed new events as they arrive.

Embeddings stored in: ChromaDB or LanceDB — both embeddable, no separate server. SQLite blobs work fine at V1 scale.
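A minimal PyTorch sketch of the idea, under stated assumptions: sparsity is imposed with an L1 penalty on the hidden activations (one common approach; the text does not specify which is used), and the layer sizes, learning rate, and random placeholder data are all hypothetical.

```python
import torch
from torch import nn


class SparseAutoencoder(nn.Module):
    """One hidden layer; an L1 penalty pushes most activations to zero."""

    def __init__(self, n_inputs, n_latent):
        super().__init__()
        self.encoder = nn.Linear(n_inputs, n_latent)
        self.decoder = nn.Linear(n_latent, n_inputs)  # only needed in training

    def encode(self, x):
        # ReLU keeps activations non-negative so "inactive" means exactly 0.
        return torch.relu(self.encoder(x))

    def forward(self, x):
        z = self.encode(x)
        return self.decoder(z), z


def train(model, events, l1_weight=1e-3, epochs=200, lr=1e-2):
    """Minimise reconstruction error plus the sparsity penalty."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss = None
    for _ in range(epochs):
        recon, z = model(events)
        loss = nn.functional.mse_loss(recon, events) + l1_weight * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)
events = torch.rand(64, 12)  # placeholder for 64 featurised events
model = SparseAutoencoder(n_inputs=12, n_latent=32)
final_loss = train(model, events)

# After training, only the encoder is used to embed a new event.
with torch.no_grad():
    embedding = model.encode(events[:1])
```

Real events would first need to be turned into fixed-length numeric vectors; that featurisation step is outside this sketch.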

Theoretical grounding: The objective function the autoencoder serves — rewarding compression progress rather than compression achieved — is formalized in Schmidhuber's 2009 paper Driven by Compression Progress. The core argument is that interestingness is not raw novelty but the rate of improvement in a system's ability to compress its observations. That is the reward signal GEDS is built around. arxiv.org/pdf/0812.4360

03 — Compression Progress Detector · The GEDS Signal

This is what makes GEDS different from a dashboard. A dashboard tells you what happened. The detector tells you when a new explanation suddenly accounts for a lot of previously scattered behavior.

The chart above shows it: two competing models improving slowly over time, then one suddenly jumps — a new latent factor collapses a lot of previously uncompressed variance. That jump is the signal. GEDS fires.

The metric: delta compression per unit of model complexity. Did this new explanation make a lot of scattered stuff simpler without adding too much machinery? Stated technically: did reconstruction error drop significantly relative to the increase in model parameters?
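One way to turn that question into a number, as a rough sketch: divide the relative drop in reconstruction error by the relative growth in parameter count. The function name, the choice of ratio, and the `min_growth` floor are assumptions made for illustration.

```python
def compression_progress(err_before, err_after, params_before, params_after,
                         min_growth=0.01):
    """Relative error drop divided by relative parameter growth.

    min_growth is a hypothetical floor that avoids dividing by zero when
    the candidate model adds no parameters at all.
    """
    if err_before <= 0 or params_before <= 0:
        raise ValueError("err_before and params_before must be positive")
    drop = (err_before - err_after) / err_before
    growth = max((params_after - params_before) / params_before, min_growth)
    return drop / growth
```

For example, an error that falls from 0.40 to 0.25 while the model grows from 1000 to 1050 parameters scores about 7.5; a model that gets worse scores below zero.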

In V1 this is not classical RL. It's a Bayesian scoring function. Each competing model has a score. New events raise or lower that score based on whether the model predicted them. A significant score jump triggers a surface event.
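The scoring idea can be sketched with a Beta-Bernoulli hit rate per model. This is an assumed concrete form: the text says only that the score is Bayesian, so the uniform prior, the 20-event window, and the 0.2 jump threshold are all illustrative.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ModelScore:
    model_id: str
    hits: float = 1.0      # Beta prior pseudo-counts (uniform prior)
    misses: float = 1.0
    history: deque = field(default_factory=lambda: deque(maxlen=20))

    @property
    def score(self):
        """Posterior mean probability that the model predicts an event."""
        return self.hits / (self.hits + self.misses)

    def update(self, predicted):
        """Record the current score, then fold in one new observation."""
        self.history.append(self.score)
        if predicted:
            self.hits += 1
        else:
            self.misses += 1

    def jump(self):
        """Score gain relative to the oldest score still in the window."""
        return self.score - self.history[0] if self.history else 0.0


def should_surface(model, threshold=0.2):
    """True when the score has risen enough to trigger a surface event."""
    return model.jump() >= threshold
```

A model that predicts eight events in a row moves from 0.5 to 0.9 and crosses the threshold; one that alternates hits and misses stays near 0.5 and does not.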

04 — Anti-Psychosis Validator · Before Anything Surfaces

Compression engines can go wrong. Conspiracy theories are compression engines. Delusions are compression engines. "Everything is about X" is often a high-compression, low-truth model. Before any inference reaches you it must pass five gates.

Improves Prediction

Must make a falsifiable prediction about something not yet in the trace. If it can't be wrong, it isn't an inference.

Survives Contradiction

Must account for the times the pattern didn't hold. A model that ignores counterevidence is a delusion.

Produces Testable Consequences

Must imply a concrete action whose outcome will confirm or disconfirm it. No untestable inferences reach the queue.

Decays If It Stops Predicting

Models that were right but stop predicting get retired. No permanent conclusions. Confidence must be earned continuously.
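The gates named above can be sketched as a rule-based filter. The text mentions five gates but names four, so only those four are covered here; the `Candidate` fields and the 0.6 hit-rate cutoff are hypothetical stand-ins for whatever the compression engine actually emits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    summary: str
    falsifiable_prediction: Optional[str]  # claim about events not yet seen
    counterexamples: int                   # times the pattern did not hold
    counterexamples_explained: int         # how many of those it accounts for
    proposed_test: Optional[str]           # concrete action that could refute it
    recent_hit_rate: float                 # share of recent predictions that held


def validate(c, min_hit_rate=0.6):
    """Return (passed, failed_gate_names) for one candidate inference."""
    gates = {
        "improves_prediction": bool(c.falsifiable_prediction),
        "survives_contradiction": c.counterexamples_explained >= c.counterexamples,
        "testable_consequences": bool(c.proposed_test),
        "still_predicting": c.recent_hit_rate >= min_hit_rate,
    }
    failed = [name for name, ok in gates.items() if not ok]
    return len(failed) == 0, failed
```

Returning the names of the failed gates, rather than a bare boolean, makes it easier to see which gate is doing most of the filtering.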

05 — LLM Articulation Layer · Translation Only

The last component to touch an inference, and the narrowest. Its job is strictly translation: take what the compression engine surfaced and the validator approved, and render it as a human-readable unprompted ticket with evidence attached.

The LLM does not evaluate whether the inference is true. It does not add interpretation. It does not generate its own structural analysis. It renders. This is the division that keeps the "what's the point" organ outside the language model, where it belongs.

The LLM also handles the second-pass jobs: explaining the evidence in plain language, generating the GOMS ticket from an accepted inference, writing the sales draft, producing the code review. All of that is articulation work. None of it is compression work.

API cost stays low because the LLM only fires when a compression passes the validator threshold — not on every event, not on every cycle of the scoring loop.

The Hard Part

Event schema design

The schema is the most consequential design decision in the system. Vague events produce vague latent factors. These are the event types the V1 trace needs to capture.

Event Type	Key Attributes	Why it matters
ticket_created	domain, source, scope_estimate	Reveals what kind of work is entering the system and from where
ticket_revised	ticket_id, revision_num, modules_touched	Repeated revision of same modules → abstraction debt signal
ticket_abandoned	ticket_id, reason_tag, time_spent	Abandoned work reveals where scope estimation failed
time_block_allocated	domain, task_type, strategic_flag	Tracks what categories of work are actually receiving time
time_block_displaced	original_task, displaced_by, reason	Displacement patterns reveal the real priority ordering
inference_surfaced	model_id, delta_score, domain, evidence_count	Creates a record of what GEDS believed and when
inference_accepted	inference_id, action_taken	The central training signal — what judgments were made
inference_rejected	inference_id, rejection_reason	Rejection reasons calibrate the validator thresholds
sales_attempt	framing_type, channel, outcome, response_lag	Connects offer framing to conversion — the signal for sales latent factors
code_changed	modules, change_type, preceded_by_ticket	Changes without preceding tickets reveal unplanned work patterns
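The table can be enforced at write time with a small validator. In this sketch the event types and attribute names are copied from the table; the function name and the decision to reject extra attributes are assumptions, and attribute value types are left unchecked because the table does not specify them.

```python
# Event types and their required attributes, taken from the table above.
EVENT_SCHEMA = {
    "ticket_created": {"domain", "source", "scope_estimate"},
    "ticket_revised": {"ticket_id", "revision_num", "modules_touched"},
    "ticket_abandoned": {"ticket_id", "reason_tag", "time_spent"},
    "time_block_allocated": {"domain", "task_type", "strategic_flag"},
    "time_block_displaced": {"original_task", "displaced_by", "reason"},
    "inference_surfaced": {"model_id", "delta_score", "domain", "evidence_count"},
    "inference_accepted": {"inference_id", "action_taken"},
    "inference_rejected": {"inference_id", "rejection_reason"},
    "sales_attempt": {"framing_type", "channel", "outcome", "response_lag"},
    "code_changed": {"modules", "change_type", "preceded_by_ticket"},
}


def validate_event(event_type, attrs):
    """Return a normalised event dict, or raise ValueError if it is malformed."""
    if event_type not in EVENT_SCHEMA:
        raise ValueError(f"unknown event type: {event_type!r}")
    required = EVENT_SCHEMA[event_type]
    missing = required - set(attrs)
    extra = set(attrs) - required
    if missing or extra:
        raise ValueError(
            f"{event_type}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    return {"event_type": event_type, **attrs}
```

Rejecting unknown attributes is the strict option: it keeps vague, ad hoc fields out of the trace at the cost of requiring a schema change for every new attribute.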

Build Order

V1: one domain, one model, one inference

The minimum viable version is narrow by design. Prove the architecture works in one domain before building the rest.

Define the event schema for time allocation only

Time allocation is the most tractable domain. Events are measurable, stakes of a wrong inference are low, and the pattern (strategic deferral under execution load) is already visible. Build GUTS for this domain only and log two weeks of real operational data before touching anything else.

Train one sparse autoencoder on the trace

Two weeks of time allocation events should be enough to find 3–5 latent factors. Train in PyTorch. Store embeddings in SQLite blobs initially. Read the latent factors manually first — before adding any scoring logic — to verify they're meaningful rather than artifacts.
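Storing embeddings as SQLite blobs needs only the standard library. A minimal sketch, assuming float32 vectors packed in the machine's native byte order; the `embeddings` table and its columns are hypothetical.

```python
import sqlite3
from array import array


def to_blob(vector):
    """Pack a list of floats as float32 bytes (native byte order)."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of Python floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "event_id INTEGER PRIMARY KEY, dim INTEGER NOT NULL, vec BLOB NOT NULL)"
    )
    return conn


def save_embedding(conn, event_id, vector):
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (event_id, dim, vec) VALUES (?, ?, ?)",
        (event_id, len(vector), to_blob(vector)),
    )
    conn.commit()


def load_embedding(conn, event_id):
    row = conn.execute(
        "SELECT vec FROM embeddings WHERE event_id = ?", (event_id,)
    ).fetchone()
    return None if row is None else from_blob(row[0])
```

Values are rounded to float32 on the way in, and the blobs are not portable between machines with different byte order; both are acceptable trade-offs for a single-machine V1.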

Implement the Bayesian scoring loop

For each latent factor, maintain a score that updates when new events arrive. Define a delta threshold for "compression progress detected." Run it on the historical trace first — did it fire at the right moments looking backward? Calibrate the threshold before running live.
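The backward-looking calibration can be sketched as a replay over a stored score series. The assumptions here are that scores are logged at regular intervals, that you have hand-labelled the moments where the detector should have fired, and that a missed moment and a false alarm cost the same. All names are illustrative.

```python
def fire_points(scores, threshold, window=3):
    """Indexes where the score gain over `window` steps first crosses threshold.

    Fires once per crossing: the detector re-arms only after the gain
    falls back below the threshold.
    """
    fired, armed = [], True
    for i in range(window, len(scores)):
        above = scores[i] - scores[i - window] >= threshold
        if above and armed:
            fired.append(i)
        armed = not above
    return fired


def calibrate(scores, known_moments, thresholds, window=3):
    """Pick the threshold whose firing points best match the labelled moments."""
    best_threshold, best_cost = None, None
    for t in thresholds:
        fired = set(fire_points(scores, t, window))
        false_alarms = len(fired - known_moments)
        misses = len(known_moments - fired)
        cost = false_alarms + misses
        if best_cost is None or cost < best_cost:
            best_threshold, best_cost = t, cost
    return best_threshold


# Hypothetical historical trace with one real jump at index 5.
history = [0.50, 0.51, 0.52, 0.52, 0.53, 0.70, 0.72, 0.73, 0.73, 0.74]
chosen = calibrate(history, known_moments={5}, thresholds=[0.01, 0.15, 0.5])
```

On this toy trace the lowest threshold fires on noise, the highest never fires, and the middle one fires exactly at the labelled jump.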

Build the validator and surface one inference

Implement the five gates as a simple rule-based filter. Let the system surface one inference. Evaluate it manually. Was it right? Did the evidence actually support the conclusion? Does it pass all five gates in practice, not just in theory? This is the experiment that tells you if the architecture has legs.

Wire the feedback loop and let it run

Accept or reject the inference. Track whether accepting it improved downstream prediction. Let the scoring loop update from the feedback. Run for another two weeks. This loop is the capability the rest of the build exists to support.
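One simple way judgment could feed back into which explanations survive, sketched under loud assumptions: each competing model carries a weight, acceptance multiplies it up, rejection multiplies it down, and models that fall to the floor are retired. The multiplicative rule, the step size, and the floor are illustrative choices, not the system's defined update.

```python
def apply_feedback(weights, model_id, accepted, step=0.25, floor=0.2):
    """Return new weights after one accept/reject judgment.

    weights maps model_id -> positive weight. The input dict is not
    modified. Models whose weight is at or below `floor` are dropped.
    """
    updated = dict(weights)
    factor = 1 + step if accepted else 1 - step
    updated[model_id] = updated[model_id] * factor
    return {m: w for m, w in updated.items() if w > floor}


def acceptance_rate(judgments):
    """Share of accepted inferences in a list of booleans (None if empty)."""
    return sum(judgments) / len(judgments) if judgments else None
```

Tracking the acceptance rate alongside the weights gives a quick check on whether the validator thresholds are too loose (many rejections) or too tight (few inferences surface at all).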

The thing that makes this worth building isn't the individual inferences. It's that the system gets better at finding the real question every time you tell it whether its last guess was right. That feedback loop is the product. Everything else is scaffolding.
