Core concepts
A short, opinionated tour of the vocabulary Colrows uses everywhere - in the UI, in the API, in your audit logs. If you read one page in these docs, read this one.
Compile-then-execute
Colrows treats every request - whether it arrives as natural language, SQL, or an agent tool call - as a program to be compiled, not as a string to be templated. The compiler runs four deterministic stages: parsing, semantic resolution, physical planning, and dialect emission.
Each stage is isolated. Parsing produces an AST, never SQL. The semantic control plane resolves that AST against the graph and emits a logical plan only after every binding, every constraint, and every policy has been validated. The SQL engine then performs cost-based physical planning. The dialect layer is the last thing to touch the request.
The benefit of this strict staging is straightforward: correctness is proven before execution. A request that resolves ambiguously fails at compile time, with an explainable error - not at the warehouse, after a 4-minute scan.
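The staging can be pictured with a toy pipeline. This is a minimal sketch, not the Colrows compiler: the `GRAPH` dictionary, the function names, and the emitted SQL shape are all assumptions made for illustration. What it shows is that each stage hands a typed value to the next, and that an ambiguous binding fails before any SQL exists.

```python
from dataclasses import dataclass


class CompileError(Exception):
    """Raised when a request cannot be resolved unambiguously."""


@dataclass(frozen=True)
class Ast:
    metric: str


@dataclass(frozen=True)
class LogicalPlan:
    metric: str
    column: str


# Hypothetical toy graph: metric name -> candidate physical columns.
GRAPH = {
    "net_revenue": ["orders.net_amount"],
    "revenue": ["orders.net_amount", "orders.gross_amount"],
}


def parse(request: str) -> Ast:
    # Stage 1: produce an AST, never SQL.
    return Ast(metric=request.strip().lower().replace(" ", "_"))


def resolve(ast: Ast) -> LogicalPlan:
    # Stage 2: bind the AST against the graph; ambiguity fails here.
    candidates = GRAPH.get(ast.metric, [])
    if len(candidates) != 1:
        raise CompileError(f"{ast.metric!r} resolves to {len(candidates)} bindings")
    return LogicalPlan(metric=ast.metric, column=candidates[0])


def plan_physical(plan: LogicalPlan) -> dict:
    # Stage 3: stand-in for cost-based physical planning.
    table, column = plan.column.split(".")
    return {"table": table, "aggregate": ("sum", column), "alias": plan.metric}


def emit_sql(physical: dict) -> str:
    # Stage 4: the dialect layer is the last thing to touch the request.
    func, column = physical["aggregate"]
    return f'SELECT {func.upper()}({column}) AS {physical["alias"]} FROM {physical["table"]}'


def compile_request(request: str) -> str:
    return emit_sql(plan_physical(resolve(parse(request))))
```

In this sketch, `compile_request("Net Revenue")` produces a single `SELECT`, while `compile_request("revenue")` raises `CompileError` during resolution because two bindings match.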
The semantic graph
The Consensus semantic layer is implemented as a typed, versioned, directed graph. Every meaningful business primitive - entities, metrics, events, concepts, definitions, examples, dimensions, datasets, columns, constraints, policies, personas, scopes - is a first-class node. Edges encode semantic relationships such as defined_by, derived_from, triggers, constrained_by, and governed_by.
Three properties of the graph are load-bearing:
- Typed - every node has a known kind, and reasoning is structural, not string-based.
- Versioned - changes never overwrite prior definitions. Each change creates a new semantic state, which makes point-in-time reproducibility a free property of the system.
- Multi-scope - the same graph holds global definitions, datastore-specific ones, persona overrides, and user-personalized context, with explicit precedence rules.
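The three properties can be sketched in a few lines. The precedence order used here (user, then persona, then datastore, then global) is an assumption; the text above only says that the rules are explicit. `Node`, `SemanticGraph`, `publish`, and `resolve` are hypothetical names, not the Colrows API.

```python
from dataclasses import dataclass

# Assumed precedence, most specific scope first.
PRECEDENCE = ("user", "persona", "datastore", "global")


@dataclass(frozen=True)
class Node:
    kind: str        # typed: "metric", "entity", "event", ...
    name: str
    scope: str       # one of PRECEDENCE
    version: int
    definition: str


class SemanticGraph:
    def __init__(self):
        self._nodes = []

    def publish(self, kind, name, scope, definition):
        # Versioned: append a new state, never overwrite a prior one.
        prior = [n for n in self._nodes
                 if (n.kind, n.name, n.scope) == (kind, name, scope)]
        node = Node(kind, name, scope, len(prior) + 1, definition)
        self._nodes.append(node)
        return node

    def resolve(self, kind, name, scopes):
        # Multi-scope: the most specific scope visible to the request wins,
        # and within a scope the latest version wins.
        for scope in PRECEDENCE:
            if scope not in scopes:
                continue
            matches = [n for n in self._nodes
                       if (n.kind, n.name, n.scope) == (kind, name, scope)]
            if matches:
                return max(matches, key=lambda n: n.version)
        raise KeyError(f"no {kind} named {name!r} in scopes {sorted(scopes)}")
```

Because lookups key on `kind` as well as `name`, a metric and an entity that share a label never collide, which is the structural (rather than string-based) reasoning the list describes.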
Metrics as state, not queries
In most platforms a metric is a SQL fragment: a SUM(...) FROM ... WHERE ... stored under a friendly name. Colrows treats a metric as derived semantic state - a continuously interpretable representation of business reality that any agent or query can reason over.
A metric like Net Revenue doesn't merely encode how to compute a number. It encodes:
- Business meaning - what Net Revenue is, and how it differs from Gross Revenue or Bookings.
- Valid grain - the level at which the metric is well-defined (per order, per invoice, per customer per day).
- Dependencies - which entities, events, and other metrics contribute to its value.
- Constraints - rules that govern how the metric can be filtered, grouped, or compared.
- Downstream impact - which dashboards, agents, or signals rely on it.
The practical effect: when an agent observes that Net Revenue dropped, it can reason semantically - distinguishing volume-driven decline from refund-driven erosion - because those relationships are explicit in the metric's state, not buried in a CTE.
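A hedged sketch of what "metric as state" could look like as a data structure. The field names and the signed-dependency model are assumptions chosen for illustration, not the Colrows schema. The point is that once dependencies are explicit, attributing a drop to its driver is a small computation rather than an exercise in reading SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricState:
    name: str
    meaning: str
    valid_grains: frozenset
    # Signed dependencies: +1 adds to the metric, -1 subtracts from it.
    dependencies: dict
    downstream: tuple = ()


def require_grain(metric, grain):
    # Constraint check: reject grains at which the metric is not well-defined.
    if grain not in metric.valid_grains:
        raise ValueError(f"{metric.name} is not defined at grain {grain!r}")


def explain_change(metric, before, after):
    # Attribute the metric's change to each dependency, respecting its sign.
    contributions = {
        dep: sign * (after[dep] - before[dep])
        for dep, sign in metric.dependencies.items()
    }
    # The driver is the dependency that pulled the metric down the most.
    driver = min(contributions, key=contributions.get)
    return driver, contributions


net_revenue = MetricState(
    name="Net Revenue",
    meaning="Gross revenue less refunds",
    valid_grains=frozenset({"order", "invoice", "customer_day"}),
    dependencies={"gross_revenue": +1, "refunds": -1},
    downstream=("exec_dashboard",),
)
```

With `before = {"gross_revenue": 1000, "refunds": 50}` and `after = {"gross_revenue": 990, "refunds": 200}`, `explain_change` names `refunds` as the driver: the decline is refund-driven erosion, not a drop in volume.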
Join path proof
When a metric references entities across multiple datasets, Colrows must prove - not guess - that a deterministic join path exists. Joins are solved as a constrained graph traversal over the semantic graph, with three kinds of pruning:
- Paths that violate declared grain are discarded.
- Paths that introduce cardinality expansion beyond allowed thresholds are pruned.
- Cycles are eliminated using visited-state tracking with relationship-type awareness.
If multiple valid paths exist, a deterministic ranking heuristic prioritizes minimal hop count, declared canonical relationships, and explicit anchor definitions. Ambiguity causes compilation to fail. No silent guessing - ever.
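The traversal described above can be sketched as a depth-first search with the three pruning rules applied as it goes. Everything here is illustrative: the `Edge` fields, the `max_fanout` threshold, and the two-part ranking key are assumptions, and cycle handling is simplified to node-level visited tracking rather than the relationship-type-aware version the text mentions.

```python
from dataclasses import dataclass


class JoinPathError(Exception):
    """Raised when no path, or more than one equally ranked path, exists."""


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    via: str                 # join key, used only to tell edges apart
    fanout: float = 1.0      # estimated row multiplication across the join
    grain_safe: bool = True  # False if the join violates declared grain
    canonical: bool = False


def find_paths(edges, start, goal, max_fanout=1.0):
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge.src, []).append(edge)
    found = []

    def walk(node, path, fanout, visited):
        if node == goal:
            found.append(tuple(path))
            return
        for edge in outgoing.get(node, []):
            if edge.dst in visited:                  # cycle elimination
                continue
            if not edge.grain_safe:                  # grain violation
                continue
            if fanout * edge.fanout > max_fanout:    # cardinality expansion
                continue
            walk(edge.dst, path + [edge], fanout * edge.fanout,
                 visited | {edge.dst})

    walk(start, [], 1.0, frozenset({start}))
    return found


def prove_join_path(edges, start, goal, max_fanout=1.0):
    paths = find_paths(edges, start, goal, max_fanout)
    if not paths:
        raise JoinPathError(f"no valid join path from {start} to {goal}")

    def rank(path):
        # Fewer hops first, then more canonical relationships.
        return (len(path), -sum(edge.canonical for edge in path))

    ranked = sorted(paths, key=rank)
    if len(ranked) > 1 and rank(ranked[0]) == rank(ranked[1]):
        raise JoinPathError(f"ambiguous join path from {start} to {goal}")
    return ranked[0]
```

In this sketch, two non-canonical edges from `orders` to `customers` (say, one via a billing key and one via a shipping key) tie on rank and raise `JoinPathError`. Marking one of them `canonical=True` breaks the tie and lets compilation proceed.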
The single largest source of bad numbers in enterprise BI is the silent join - the warehouse cheerfully runs a query against a relationship the analyst didn't intend. Compile-time proof is the only way to make that class of error unreachable.
Multi-vector embeddings
Colrows does not represent a concept with a single embedding. Every concept carries up to three vectors:
- Definition vector - derived from the canonical, governed definition.
- Usage vector - derived from how the concept is used in real queries, alerts, and dashboards over time.
- Combined vector - a weighted blend that improves recall when natural language drifts (e.g., "lapse" vs. "churn") while still grounding to canonical meaning.
Vectors are used for candidate identification; structural reasoning makes the final call. Embeddings are never the source of truth.
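To make the blend concrete, here is a small sketch using plain Python lists. The `usage_weight` value and the three-dimensional toy vectors are assumptions picked for illustration; the source does not state how Colrows weights the blend. The output of `shortlist` is only a set of candidates, in keeping with the rule that embeddings never make the final call.

```python
import math


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def combine(definition, usage, usage_weight=0.3):
    # Weighted blend of the unit-length definition and usage vectors.
    d, u = normalize(definition), normalize(usage)
    blended = [(1 - usage_weight) * x + usage_weight * y for x, y in zip(d, u)]
    return normalize(blended)


def shortlist(query, concept_vectors, top_k=2):
    # Candidate identification only; structural reasoning decides afterwards.
    scored = sorted(
        ((cosine(query, vec), name) for name, vec in concept_vectors.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Toy vectors: "churn" is defined along axis 0, but real usage drifts toward axis 1.
churn_definition = [1.0, 0.0, 0.0]
churn_usage = [0.6, 0.8, 0.0]
churn_combined = combine(churn_definition, churn_usage)
lapse_query = [0.5, 0.85, 0.0]
```

Here the query for "lapse" is closer to `churn_combined` than to `churn_definition` alone, which is the recall improvement the combined vector is meant to provide.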
Compile-time governance
Most data platforms enforce policy after a query has been generated - by masking columns, filtering rows, or denying results. Colrows enforces policy at compile time by shaping the allowed subgraph for each persona before any plan is produced.
If a metric depends on a node outside the persona's allowed scope, resolution fails - not at the warehouse, but during compilation. There is no way to "smuggle" a column past the planner. Audit becomes a side effect of normal execution: every node visited, every edge traversed, every constraint applied is captured in a structured trace that survives forever.
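A minimal sketch of the idea, assuming a persona's scope can be modeled as a set of allowed node names. `DEPENDS_ON`, `ALLOWED`, and `resolve_for_persona` are hypothetical and stand in for the real graph and policy model. Resolution walks the metric's dependencies and refuses to continue the moment it leaves the allowed subgraph, recording each step in a trace.

```python
class PolicyError(Exception):
    """Raised at compile time when resolution leaves the allowed subgraph."""


# Hypothetical dependency edges: node -> nodes it depends on.
DEPENDS_ON = {
    "net_revenue": ["orders.amount", "refunds.amount"],
    "payroll_cost": ["employees.salary"],
}

# Hypothetical per-persona allowed subgraphs.
ALLOWED = {
    "sales_analyst": {"net_revenue", "orders.amount", "refunds.amount"},
}


def resolve_for_persona(metric, persona, trace):
    allowed = ALLOWED.get(persona, set())
    stack, seen = [metric], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node not in allowed:
            # Fail before any plan exists; the denial itself is audited.
            trace.append(("denied", node))
            raise PolicyError(f"{persona} may not resolve {node!r}")
        trace.append(("visited", node))
        stack.extend(DEPENDS_ON.get(node, []))
    return seen
```

Calling `resolve_for_persona("payroll_cost", "sales_analyst", trace)` raises `PolicyError` and leaves a `("denied", "payroll_cost")` entry in `trace`, so the audit record is produced by the same code path that enforces the policy.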
Autonomous maintenance
The semantic graph is maintained by a coordinated set of background agents:
- Discovery agents ingest schemas, metadata, and documentation, identifying candidate entities, events, metrics, and relationships.
- Architecture agents validate grain, dependencies, and constraints - refusing to publish definitions that violate business logic.
- Learning agents observe how humans and AI systems use the graph in practice and refine definitions, examples, and synonyms accordingly.
- Monitoring agents detect semantic drift using statistical fingerprinting of column distributions, structural diffing of dataset nodes, and hybrid vector/structural equivalence analysis.
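As one illustration of statistical fingerprinting, the sketch below summarizes a column as a null rate, mean, and standard deviation, then flags drift when a later sample moves too far from the baseline. The choice of statistics and both thresholds are assumptions; the source does not specify which fingerprints Colrows computes. The sketch also assumes each sample is non-empty and contains at least one non-null value.

```python
import statistics


def fingerprint(values):
    # Summarize a column sample; None stands in for NULL.
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values),
        "mean": statistics.fmean(present),
        "stdev": statistics.pstdev(present),
    }


def drifted(baseline, current, max_null_shift=0.05, max_mean_shift=3.0):
    # Flag a jump in missing values.
    if abs(current["null_rate"] - baseline["null_rate"]) > max_null_shift:
        return True
    # Flag a mean that moved by several baseline standard deviations.
    spread = max(baseline["stdev"], 1e-9)
    return abs(current["mean"] - baseline["mean"]) / spread > max_mean_shift


baseline = fingerprint([10, 11, 9, 10, 10])
```

A column that silently switches from dollars to cents, for example `[1000, 1100, 900, 1000, 1000]`, trips the mean check, while a reshuffled sample with the same distribution does not.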
Point-in-time reproducibility
Because the graph is versioned and execution traces capture the exact semantic state used at compile time, any historical query can be re-executed with the definitions, policies, and join paths that were active at that moment. This is non-negotiable for regulators, but it's also useful for engineers debugging a number that "moved" between Monday and Wednesday.
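The "Monday versus Wednesday" case can be sketched with an append-only history and an as-of lookup. `VersionedDefinitions` and its methods are hypothetical names, and the dates and definitions are made-up illustrations rather than real Colrows data.

```python
from datetime import datetime


class VersionedDefinitions:
    def __init__(self):
        # name -> list of (effective_at, definition); entries are never edited.
        self._history = {}

    def publish(self, name, effective_at, definition):
        self._history.setdefault(name, []).append((effective_at, definition))

    def as_of(self, name, moment):
        # Return the definition that was active at the given moment.
        active = [entry for entry in self._history.get(name, [])
                  if entry[0] <= moment]
        if not active:
            raise KeyError(f"{name!r} had no definition at {moment}")
        return max(active, key=lambda entry: entry[0])[1]


definitions = VersionedDefinitions()
definitions.publish("net_revenue", datetime(2024, 1, 1), "SUM(amount)")
definitions.publish("net_revenue", datetime(2024, 1, 3),
                    "SUM(amount) - SUM(refund_amount)")
```

Asking for `net_revenue` as of 2 January returns the first definition and as of 4 January the second, so a trace that stores the compile-time moment is enough to replay a query under the semantics it originally ran with.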
Vocabulary cheat sheet
| Term | What it means in Colrows |
|---|---|
| Concept | A typed business primitive - entity, metric, event, definition. |
| Anchor | A binding from a concept to a physical column or expression. |
| Scope | The slice of the graph a request is allowed to traverse - global / datastore / persona / user. |
| Persona | A first-class graph node representing a role with its own scope and policy set. |
| Constraint | A formal predicate attached to a node - grain, time window, cardinality, RBAC, ABAC. |
| Plan | The dialect-agnostic logical tree produced after semantic resolution. |
| Trace | The structured audit record of a single compile-then-execute run. |