Two locked warehouse perimeters inside a larger enterprise estate, with disconnected knowledge sources - wikis, CRM notes, JIRA tickets, PDFs, spreadsheets - floating outside the walls, illustrating why a warehouse-native semantic layer cannot span the full data estate.

Why Snowflake and Databricks Can't Be Your Enterprise Semantic Layer

A critical look at warehouse-native semantic layers - and why crawling your full data estate is a structurally different category of problem.

If you have spent the last twelve months evaluating semantic layers for enterprise AI, you have almost certainly heard a version of this pitch: "You already use Snowflake. Just turn on Cortex Analyst. You already use Databricks. Just turn on Genie. You don't need another product."

It is a clean story. It is also wrong in a way that costs money once you scale past a pilot.

This post walks through, with primary sources, what Snowflake's and Databricks' semantic layer offerings actually do, where they stop, and why a cross-estate semantic layer is a structurally different product. The conclusion is not that warehouse vendors are doing bad work. It is that semantic intelligence and warehouse storage are two different jobs, and trying to merge them runs into a wall the moment your business has more than one system of record.

The setup: why everyone is suddenly building a "semantic layer"

The semantic layer is not a new idea. It is the layer that sits between physical data and a human question, and it answers things like "what does revenue mean here, what tables hold it, how is it filtered, which join paths are valid, who is allowed to see it."

For the last two decades, this lived inside BI tools (Looker LookML, Tableau data models, MicroStrategy attributes). It was a static modelling step. You wrote it once. You maintained it forever.

Then enterprise generative AI showed up, and the rules changed. AI agents do not navigate a curated dashboard. They generate SQL on the fly, retrieve documents, summarise meetings, and answer questions that nobody anticipated. Without a shared layer of meaning, every model hallucinates a slightly different version of "revenue", and every copilot becomes another source of confidently wrong answers.

  • $3.1T - estimated annual global cost of data silos (IDC / McKinsey).
  • $12.9M - average annual loss per enterprise from poor data quality (Gartner).
  • 897 - average applications per enterprise (MuleSoft 2025 Connectivity Benchmark).
  • 68% of enterprise data goes unanalysed because of silos (IBM).

This is why every infrastructure vendor is now adding "semantic layer" to their roadmap. Snowflake launched Cortex Analyst. Databricks launched AI/BI Genie and Unity Catalog Business Semantics. dbt has its Semantic Layer. The warehouses have noticed the gravity of the problem.

The question is whether they can solve it from inside the warehouse.

What Snowflake Cortex Analyst actually does (and where it stops)

Cortex Analyst is Snowflake's text-to-SQL service. It uses a YAML semantic model to translate natural-language questions into SQL that runs on Snowflake tables. It is genuinely useful for teams whose entire analytical surface lives inside Snowflake.

But the documentation is unambiguous about its scope.

  • It only sees data inside Snowflake. Snowflake Intelligence "is confined to data within the Snowflake environment. It can't directly query external databases or data sources outside Snowflake without first ingesting that data" (Flexera, 2026). Federation and Iceberg help, but they federate tables, not meaning. They do not crawl Confluence, they do not read Salesforce notes, they do not parse PDFs.
  • The semantic model is a YAML file you maintain by hand. Even with Snowsight's wizard, the process can take weeks or months for a complex environment because data teams have to reverse-engineer relationships, metrics, and business terms table by table (a sketch of the shape such a model has to capture follows this list).
  • Snowflake itself has admitted the boundary problem. In 2025, Snowflake launched the Open Semantic Interchange (OSI) initiative - an explicit attempt to create a standard that lets semantic models move between platforms. You do not launch an interchange standard if your in-house semantic layer is sufficient on its own. The launch is, in effect, a public acknowledgement that the semantic layer does not belong inside any single warehouse.
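
To make that maintenance burden concrete, here is a minimal sketch of the kind of information a hand-authored semantic model has to capture for a single table. The structure and field names are illustrative, not the actual Cortex Analyst YAML schema; the point is that relationships, metrics, and business-term synonyms all have to be written down by a person and kept current by a person.

```python
# Illustrative only: the rough shape of a hand-maintained semantic model for
# ONE table. Field names are hypothetical, not the Cortex Analyst schema.
orders_semantic_model = {
    "table": "ANALYTICS.SALES.ORDERS",
    "description": "One row per customer order, net of cancellations.",
    "dimensions": [
        {"name": "order_date", "column": "ORDER_TS", "type": "date"},
        {"name": "region", "column": "SALES_REGION", "type": "categorical"},
    ],
    "measures": [
        # "Revenue" here means booked revenue excluding tax - a business
        # definition someone has to know, write down, and keep updated.
        {"name": "revenue", "expr": "SUM(NET_AMOUNT_EX_TAX)"},
    ],
    "relationships": [
        {"to_table": "ANALYTICS.SALES.CUSTOMERS", "join_on": "CUSTOMER_ID"},
    ],
    "synonyms": {"revenue": ["sales", "bookings", "turnover"]},
}

# Multiply this by every in-scope table, then re-edit it each time a column is
# renamed or a definition changes - that is the manual upkeep described above.
```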

To Snowflake's credit, they are honest about this. The product is called "Cortex Analyst", not "Cortex Enterprise Intelligence". It does what it says.

What Databricks Genie actually does (and where it stops)

Databricks AI/BI Genie is the conversational analytics surface that sits on top of Unity Catalog. Like Cortex Analyst, it generates SQL from natural language. Like Cortex Analyst, it has hard scope boundaries that the documentation states explicitly.

  • Genie is structured-data only. Databricks' own documentation: "Genie works with structured data only. It cannot answer questions about unstructured data such as PDFs, Word documents, or other file-based content." For unstructured content, you switch to a different surface and connect external sources separately. There is no single graph that knows both.
  • There is a 30-table cap per Genie Space. The official limit: "You can add up to 30 tables or views to a Genie Space." Most enterprise data estates have hundreds to thousands of tables that are conceptually relevant to a single business question. The architecture forces you to fragment the semantic model into narrow, hand-curated spaces.
  • Unity Catalog governance ends at the Databricks boundary. Atlan's analysis: "Unity Catalog governs well within Databricks; governance, lineage, and context don't travel beyond its perimeter. Lakehouse Federation partially bridges the gap but is read-only, performance-limited, and suited for ad hoc use only." For organisations running hybrid or multi-platform stacks, this creates a persistent governance blind spot.
  • UC Metric Views are tied to UC-registered objects. The September 2025 GA announcement positions Unity Catalog Business Semantics as a unified semantic foundation - and that is true within Databricks. Outside it, the metric definitions are once again invisible.

The honest read: Genie and UC Business Semantics are excellent if your enterprise data architecture is Databricks-and-only-Databricks. The honest follow-up question: how many enterprises actually look like that?

The walled garden problem, in one diagram

Diagram titled 'Warehouse-native semantic layers stop at the warehouse boundary'. Two large boxes labelled Snowflake Perimeter and Databricks Perimeter each contain four sample tables and an internal semantic model - Cortex Semantic Model in YAML on the left and UC Metric Views in SQL on the right. Below each, three bullet points list scope limitations such as cannot crawl Confluence, 30-table cap per Genie Space, and governance ending at the boundary. An orange strip at the bottom lists eight enterprise sources outside the walls including Confluence wikis, Salesforce notes, JIRA tickets, SharePoint policies, Excel sheets, on-prem Postgres or Oracle databases, email threads, and MongoDB.

A warehouse-native semantic layer can only know the things that have been ingested into the warehouse. The warehouse, by construction, is a destination for structured, copied, batched data. The places where business meaning actually lives - the wiki where the analytics team wrote down what "active customer" means, the Salesforce account note where the sales rep explained why a deal slipped, the JIRA ticket that documents a one-off backfill that broke a metric for two weeks, the policy PDF that defines what counts as a regulated transaction - none of those things are tables. None of them get ingested. None of them are visible.

Federation and zero-copy access (Iceberg, Lakehouse Federation, Snowflake's external tables) help with data movement. They do not help with meaning. A federated table from Postgres into Snowflake still arrives without its semantics: no glossary, no documentation, no relationships, no usage history.

The semantic layer is a graph that spans your whole estate, and the warehouse is a node in that graph, not the container for it.

What actually lives outside the warehouse

A useful sanity check: where does the average enterprise's business meaning actually sit? The MuleSoft 2025 Connectivity Benchmark Report puts the average enterprise on around 897 applications, with 45% of organisations managing over 1,000. Eighty-two percent of enterprises report data silos disrupting workflows, and 68% of enterprise data goes unanalysed because of those silos.

Read those numbers next to the Genie 30-table cap, and the gap is not subtle.

The semantic context for a typical enterprise question is scattered across:

  • Warehouses and lakehouses (Snowflake, Databricks, BigQuery, Redshift) for cleaned analytical tables.
  • Transactional databases (Postgres, MySQL, Oracle, MongoDB) for live operational state.
  • Knowledge platforms (Confluence, Notion, SharePoint, Google Drive) for definitions, methodology, and process documentation.
  • Business applications (Salesforce, HubSpot, JIRA, Zendesk) for the customer story, deal notes, and ticket history.
  • File systems (Excel, CSVs, PDFs, contracts) where finance and operations still actually live.
  • Communication archives (email threads, Slack channels) where decisions get made and metric definitions get clarified.

A semantic layer that sees only the warehouse misses the majority of this. A copilot built on top of a warehouse-only semantic layer will confidently answer questions using a partial view of reality - which is arguably worse than refusing to answer at all.

The "why not just upsell?" objection

The most common counterpoint goes like this: "Fine, but Snowflake and Databricks will eventually add connectors. They'll crawl Confluence too. The warehouse will become the cross-estate semantic layer over time."

There are three reasons that argument does not hold up under pressure.

1. Architectural conflict of interest

A warehouse vendor's commercial incentive is to maximise data residence inside the warehouse. Their pricing is consumption-based. Pulling more meaning into the platform is good for them. Telling a customer "keep your operational data in Postgres, your wiki in Confluence, your CRM in Salesforce - and we will reason across all of it without ingesting it" directly works against that incentive. Vendors do not build products that contradict their own monetisation model.

2. The semantic layer is a horizontal product. Warehouses are vertical platforms.

A horizontal product needs to treat every source as a peer. A vertical platform needs to treat its own storage as the centre of the universe. Those two architectures look almost the same on a slide, and they diverge sharply once you add the second source. This is the same reason Looker won the BI semantic layer category against warehouse-bundled BI tools a decade ago, and why dbt won the transformation category against warehouse-bundled ETL: a horizontal layer that treats sources symmetrically beats a vertical product with a home-team bias.

3. Open Semantic Interchange tells you where the industry is going

When Snowflake itself sponsors an interchange standard, the message is that no warehouse expects to own the semantic layer end to end. The endgame is portability, not lock-in.

This is also why "just turn on Cortex / just turn on Genie" stops being a credible upsell once an enterprise has more than one major data platform. Which, for any enterprise of meaningful size, is approximately always.

What a cross-estate semantic layer needs to do

A semantic layer that actually serves enterprise AI has to do six things that warehouse-native layers structurally cannot.

1. Crawl every source where meaning lives, not just structured tables

That means treating a Confluence page, a Salesforce custom field, a JIRA ticket, an Excel sheet, and a Snowflake table as first-class citizens of the same graph. Each contributes a piece of business context: a definition, a relationship, a constraint, a usage example.
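
As a sketch of what "first-class citizens of the same graph" can mean in practice, the snippet below gives a wiki definition and a warehouse table the same node shape. The class and field names are hypothetical, not any specific product's API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: every source contributes nodes of the same type, so a
# Confluence page and a Snowflake table are peers in one graph.
@dataclass
class SemanticNode:
    node_id: str                # stable identifier within the graph
    source: str                 # "confluence", "salesforce", "snowflake", "excel", ...
    kind: str                   # "definition", "table", "ticket", "sheet", ...
    name: str                   # the business-facing name of the thing
    context: str                # the text that carries its meaning
    related: list[str] = field(default_factory=list)  # edges to other nodes

graph = [
    SemanticNode("conf:active-customer", "confluence", "definition",
                 "Active customer",
                 "A customer with at least one paid order in the last 90 days.",
                 related=["wh:orders"]),
    SemanticNode("wh:orders", "snowflake", "table", "ORDERS",
                 "One row per order; NET_AMOUNT_EX_TAX is booked revenue."),
]

# A question about "active customers" can now reach the wiki definition and
# the warehouse table through the same traversal, instead of two silos.
```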

2. Maintain itself

Enterprise data changes constantly. Tables get added, columns get renamed, metric definitions get refined in a wiki edit nobody told the data team about. A semantic layer that requires a human to author and update YAML files for each change becomes stale within weeks. The layer has to detect drift on its own and update the graph autonomously.
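
At its simplest, "detect drift on its own" can be sketched as a periodic re-crawl and diff: compare a fresh snapshot of a source against what the graph currently believes, and turn every difference into a graph update. The function and data below are illustrative assumptions, not a real connector API.

```python
# Hypothetical drift check: diff a freshly crawled snapshot of a source
# against what the semantic graph currently records.
def detect_drift(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Both arguments map object name -> definition or schema fingerprint."""
    return {
        "added":   [k for k in current if k not in previous],
        "removed": [k for k in previous if k not in current],
        "changed": [k for k in current if k in previous and current[k] != previous[k]],
    }

# Yesterday's view of the wiki versus today's crawl.
graph_view  = {"active customer": "paid order in the last 90 days"}
fresh_crawl = {"active customer": "paid order in the last 120 days",
               "churn risk": "no login in 30 days and an open P1 ticket"}

print(detect_drift(graph_view, fresh_crawl))
# {'added': ['churn risk'], 'removed': [], 'changed': ['active customer']}
# Each entry becomes an autonomous graph update instead of a stale YAML edit.
```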

3. Reason across systems, not within one

Real questions cross boundaries. "How many of our top-tier customers have an open critical ticket and have not had a renewal conversation in 90 days?" That question touches a CRM (top-tier), a ticketing system (critical ticket), an activity log (renewal conversation), and a calendar of touchpoints. No warehouse table holds all of that. A cross-estate layer joins meaning, not just rows.
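
To show what "joins meaning, not just rows" implies, here is a schematic decomposition of that question. Every system name and helper below is a hypothetical stand-in; the point is that the join key is a shared business entity (the customer account), not a physical table that already holds all three facts.

```python
# Hypothetical decomposition of one cross-estate question into per-system
# lookups. Each fetch_* stands in for a governed call to that system.

def fetch_top_tier_accounts() -> set[str]:      # CRM: who counts as "top-tier"
    return {"acme", "globex", "initech"}

def fetch_open_critical_tickets() -> set[str]:  # ticketing: open critical ticket
    return {"acme", "initech"}

def fetch_recent_renewal_talks() -> set[str]:   # activity log: renewal talk in 90 days
    return {"initech"}

top_tier = fetch_top_tier_accounts()
critical = fetch_open_critical_tickets()
renewed  = fetch_recent_renewal_talks()

# The "join" happens on the shared customer entity resolved by the semantic
# layer, not inside any single warehouse.
at_risk = (top_tier & critical) - renewed
print(sorted(at_risk))  # ['acme']
```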

4. Stay governed as it grows

Every entity in the graph has owners, access scope, lineage, and policy. The layer has to enforce those things consistently across systems, not delegate enforcement to whichever warehouse a particular table happens to live in.
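
A minimal sketch, with assumed names, of what that can look like mechanically: every entity in the graph carries a governance envelope, and the same check runs before any surface is allowed to use it, regardless of which system stores the underlying data.

```python
from dataclasses import dataclass

# Hypothetical governance envelope attached to every entity in the graph.
@dataclass
class Governance:
    owner: str              # accountable team or person
    allowed_roles: set      # who may see results derived from this entity
    lineage: list           # upstream node ids this entity was derived from

def can_use(governance: Governance, requester_role: str) -> bool:
    # The same check applies whether the entity lives in Snowflake, Salesforce,
    # or a Confluence page - enforcement is not delegated to the warehouse.
    return requester_role in governance.allowed_roles

revenue_metric = Governance(owner="finance-data",
                            allowed_roles={"finance", "exec"},
                            lineage=["wh:orders", "conf:revenue-definition"])

print(can_use(revenue_metric, "finance"))  # True
print(can_use(revenue_metric, "support"))  # False
```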

5. Be queryable by every AI surface, not just one

The same semantic graph should power text-to-SQL, copilots, BI tools, embedded analytics, and agent workflows. If the layer is fused to a single product (Cortex, Genie, Looker), then every other surface starts inventing its own version of the truth.

6. Be explainable

Every answer an AI agent gives must trace back through the graph: intent → semantic concept → resolved metric → SQL → source data. Without that trace, regulated industries cannot deploy AI in production.
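
One way to make that trace concrete is to record it as a structured object alongside every answer. The shape below is an illustrative assumption, not a standard or a specific product's format.

```python
from dataclasses import dataclass

# Hypothetical audit trace: every hop from question to data is recorded, so an
# answer can be replayed and inspected in a regulated environment.
@dataclass
class AnswerTrace:
    intent: str         # the question as the system understood it
    concept: str        # the semantic concept the intent resolved to
    metric: str         # the governed metric definition that was applied
    sql: str            # the query that was actually executed
    sources: list       # the physical objects the query touched

trace = AnswerTrace(
    intent="quarterly revenue for EMEA",
    concept="revenue (booked, excluding tax)",
    metric="SUM(NET_AMOUNT_EX_TAX) where SALES_REGION = 'EMEA'",
    sql="SELECT SUM(net_amount_ex_tax) FROM orders WHERE sales_region = 'EMEA'",
    sources=["ANALYTICS.SALES.ORDERS", "conf:revenue-definition"],
)
# An auditor or the next analyst can walk intent -> concept -> metric -> SQL -> sources.
```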

How Colrows fits

Architecture diagram of Colrows as a cross-estate semantic layer. Enterprise sources on the left - warehouses (Snowflake, Databricks, BigQuery, Redshift), transactional databases (Postgres, MySQL, Oracle, MongoDB), knowledge and docs (Confluence, Notion, SharePoint, Google Drive), and business apps (Salesforce, HubSpot, JIRA, Excel sheets) - flow into a central Colrows Autonomous Semantic Layer card containing a semantic graph, metrics and KPIs, vector embeddings, policies and rules, causal chains, and multi-scope context. The card's autonomous self-maintaining footer powers six AI surfaces on the right: AI analyst, enterprise copilots, text-to-SQL, BI and dashboards, workflow agents, and embedded search.

Colrows was built specifically as a horizontal, cross-estate semantic layer. It crawls the warehouses, transactional databases, knowledge platforms, business applications, and file systems where business meaning actually lives, and constructs a single autonomous semantic graph that powers every AI surface downstream.

A few things make the architecture different from a warehouse-native semantic layer.

  • Autonomous, not hand-coded. Colrows builds and maintains the semantic graph continuously through AI agents. There is no YAML file to author and re-author every time a column is renamed or a definition is updated in Confluence. The graph evolves on its own.
  • Multi-scope semantics. A definition is not a single global object. It exists at four levels: global, datastore, persona, and user. The same word ("customer") can resolve correctly in finance, sales, and product contexts without one team having to overwrite another team's meaning (a resolution sketch follows this list).
  • Multi-vector embeddings. Each concept is represented by three vectors (definition, usage, combined), which produces dramatically more accurate retrieval than the standard single-embedding approach used by most retrieval-augmented systems.
  • Compile-then-execute pipeline. The graph is queried by a hybrid engine that combines SQL execution, vector search, and LLM inference. Each part of a question gets routed to the right reasoning mode, with cost and governance guardrails applied before execution rather than after (a routing sketch appears further below).
  • No rip-and-replace. Colrows sits on top of what you already have. Snowflake stays Snowflake. Databricks stays Databricks. Confluence stays Confluence. Colrows does not move your data - it learns the meaning of your data and exposes that meaning, governed, to whatever AI surface needs it.
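
Here is the resolution sketch promised in the multi-scope bullet: the most specific scope that defines a term wins, without overwriting what other scopes hold. The precedence order and function are illustrative assumptions about how such a lookup could behave, not Colrows internals.

```python
# Hypothetical scope resolution: the most specific definition of a term wins,
# while other scopes keep their own definitions untouched.
SCOPE_ORDER = ["user", "persona", "datastore", "global"]

definitions = {
    "global":    {"customer": "Any account with a signed contract."},
    "datastore": {"customer": "A row in CRM.ACCOUNTS with status != 'prospect'."},
    "persona":   {"customer": "An account with a paid invoice this fiscal year."},  # finance persona
    "user":      {},  # no user-level override in this example
}

def resolve(term: str, scopes: dict) -> str:
    for scope in SCOPE_ORDER:                     # most specific first
        if term in scopes.get(scope, {}):
            return f"[{scope}] {scopes[scope][term]}"
    raise KeyError(term)

print(resolve("customer", definitions))
# [persona] An account with a paid invoice this fiscal year.
```
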
In pilot deployments, the cross-estate approach has translated to 75% fewer data requests routed through analyst teams, 31% higher conversion on AI-assisted sales workflows, and 340+ IT hours saved per month.
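
And the routing sketch referenced in the compile-then-execute bullet: each fragment of a question is classified and sent to the reasoning mode that suits it, with a guardrail check before anything runs. The labels, thresholds, and function names are assumptions for illustration, not the actual engine.

```python
# Hypothetical compile step: plan a reasoning mode for each fragment of a
# question, then check guardrails before any of it executes.

def compile_plan(fragments: list) -> list:
    plan = []
    for text, kind in fragments:
        if kind == "metric":
            plan.append(("sql", text))             # aggregate over governed tables
        elif kind == "definition":
            plan.append(("vector_search", text))   # retrieve from the semantic graph
        else:
            plan.append(("llm", text))             # open-ended synthesis
    return plan

def within_guardrails(plan: list, max_sql_steps: int = 3) -> bool:
    # Cost and governance checks happen before execution, not after.
    return sum(1 for mode, _ in plan if mode == "sql") <= max_sql_steps

question = [("revenue for EMEA last quarter", "metric"),
            ("what counts as an active customer", "definition"),
            ("summarise the renewal risk", "other")]

plan = compile_plan(question)
assert within_guardrails(plan)
for mode, step in plan:
    print(mode, "->", step)
```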

A fair counter to our own argument

It would be dishonest to write this post without acknowledging the cases where a warehouse-native semantic layer is sufficient.

If your entire analytical estate genuinely lives in one platform, if your business definitions are simple enough to fit in a YAML file, if your AI use cases are confined to BI-style questions over structured tables, and if your data team has the bandwidth to author and maintain semantic models by hand - then Cortex Analyst or Genie will solve your problem. Buying a separate cross-estate semantic layer in that situation is overkill.

The point is not that those products are bad. The point is that the moment any one of those conditions breaks - and at enterprise scale they almost always break - the warehouse-native approach stops being a complete answer. At that point, you do not need a better warehouse. You need a layer that sees across them.

The bottom line

The semantic layer is the most strategic piece of enterprise AI infrastructure being built right now. Gartner's 2025 outlook predicts that by 2027, more than 70% of enterprises will deploy a semantic layer as part of their AI stack.

The question is not whether you need one. The question is whether you build it inside one of your warehouses (and accept that everything outside that warehouse stays semantically dark) or whether you build it as a horizontal layer that sees the whole estate.

Snowflake and Databricks are doing important work, and their semantic layers will keep getting better at what they do. What they do is not, however, the same job as a cross-estate semantic layer for enterprise AI. Treating them as if they were is what produces a copilot that gives fluent, confident, and partially wrong answers for the next eighteen months.

Colrows exists to solve the cross-estate version of this problem. If you are early in evaluating semantic layers for your AI stack and want to see what the cross-estate architecture looks like in practice, we are happy to walk through it on a live demo.

Ship AI you can trust enough to put in production.