The Most Important Work Happens Before the AI Touches Anything
I spent 20 years in data before anyone used the word "agent" in a sentence about analytics. And in all that time, the lesson that survived every platform shift, every new tool, every vendor pitch is the same one Ralph Kimball was teaching in the 1990s.
Get the data model right. Everything else follows from that.
Snowflake Intelligence is the latest and most compelling expression of this idea. It is conversational analytics — an AI agent that sits on top of your Snowflake data, takes natural language questions from business users, and returns answers grounded in governed, well-modeled data. No SQL required. No dashboard to navigate. Just a question and an answer.
But the reason it works (when it works well) has nothing to do with the LLM. It has everything to do with what's underneath it: a semantic layer that tells the agent what the data means, and an instruction layer that tells the agent how to behave.
If you've done dimensional modeling, those two layers will feel familiar. Facts. Dimensions. Measures. Metrics. Business rules. Naming conventions. The same principles Kimball outlined in The Data Warehouse Toolkit are exactly what make a Snowflake Intelligence agent trustworthy.
This article walks through both layers — the semantic and the agent — and connects them back to the modeling discipline that makes them work. Along the way, I'll share the practical tips I've learned standing up these systems in real environments.
Kimball's Blueprint: Why Dimensional Modeling Still Matters
Ralph Kimball didn't invent the concepts of facts and dimensions from scratch. But he did something more important — he made them practical. His 1996 book The Data Warehouse Toolkit gave data teams a repeatable methodology for organizing data around business processes. Facts are the numeric measurements. Dimensions are the context around those measurements. A star schema keeps them clean. A snowflake schema normalizes the dimension hierarchies when the complexity warrants it.
The terminology sounds dated to some people now. It isn't. Every modern analytics tool — whether it calls them "measures" or "metrics" or "KPIs" — is working with the same underlying concepts.
This is the part that connects directly to what Snowflake Intelligence is doing today. The reason an AI agent can answer a natural language question about your data is that someone — a data engineer, an analytics engineer, a modeler — already encoded what "revenue" means, what "region" refers to, and how time is structured in the model.
The agent doesn't figure this out on its own. It reads a semantic definition that you wrote. And that semantic definition is, in all the ways that matter, a dimensional model.
Kimball publishes The Data Warehouse Toolkit. Star schemas, fact tables, dimension tables, conformed dimensions. The vocabulary that would define a generation of analytics.
The cloud data warehouse era begins. Snowflake, BigQuery, Redshift move data to elastic compute. The modeling discipline gets overshadowed by "just query the lake."
Semantic layers resurface — dbt metrics, Cube, AtScale. The industry remembers that raw data without meaning is just noise. Kimball's ideas re-enter the conversation under new names.
Snowflake releases Semantic Views and Intelligence. The dimensional model becomes the interface between an AI agent and your data warehouse. Kimball's blueprint, realized in YAML.
The Semantic Layer: Facts, Dimensions, Measures, and Metrics
In Snowflake Intelligence, the semantic layer is where you define what your data means in business terms. This is done through Semantic Views — YAML-based objects stored directly in your Snowflake schema that describe the entities in your data, the relationships between them, and the calculations that matter to the business.
If you've built a Kimball-style dimensional model, the mapping is almost one-to-one.
| Kimball Concept | Snowflake Semantic View | What It Does |
|---|---|---|
| Fact | facts: | Numeric columns at the grain of the table. Revenue, cost, quantity. The raw building blocks of aggregation. |
| Dimension | dimensions: | Categorical attributes that provide context. Region, product category, customer name. The "by what" in every business question. |
| Measure | metrics: (with aggregation) | Aggregated facts. SUM of revenue, AVG of order value, COUNT of customers. Measures answer "how much" across rows. |
| Business Metric | metrics: (derived) | Higher-order calculations. Gross margin = (revenue - cost) / revenue. The numbers executives actually ask about. |
The semantic view is where you tell the agent what language the business speaks. When a VP asks "What was our net revenue retention last quarter?" the agent doesn't guess. It looks up the metric definition you wrote, finds the correct formula, identifies which dimensions apply, and generates the SQL.
What a Semantic View Actually Looks Like
Here's a simplified example. This is the kind of YAML that powers an agent's understanding of a sales domain:
name: sales_analytics
tables:
  - name: orders
    base_table: analytics.core.fct_orders
    dimensions:
      - name: region
        synonyms: ["territory", "geo"]
        description: "Sales region where the order originated"
        expr: region_name
        data_type: VARCHAR
      - name: order_date
        synonyms: ["date", "when"]
        description: "Date the order was placed"
        expr: order_date
        data_type: DATE
    facts:
      - name: order_amount
        description: "Total amount for the line item"
        expr: line_total
        data_type: NUMBER
      - name: cost
        description: "Cost of goods for the line item"
        expr: cogs
        data_type: NUMBER
    metrics:
      - name: total_revenue
        synonyms: ["revenue", "sales"]
        description: "Sum of all order amounts"
        expr: SUM(order_amount)
        data_type: NUMBER
      - name: gross_margin
        synonyms: ["margin", "profit margin"]
        description: "Revenue minus cost, as a percentage of revenue"
        expr: (SUM(order_amount) - SUM(cost)) / NULLIF(SUM(order_amount), 0)
        data_type: NUMBER
Notice the synonyms field. This is the part that makes the semantic layer AI-ready. When a user says "revenue" instead of "total_revenue," the agent still finds the right metric. When they say "territory" instead of "region," the dimension still resolves. The YAML captures the way people actually talk about data — not just the way it's stored.
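To make the synonym idea concrete, here is a minimal Python sketch of how names and synonyms collapse to one canonical object. It is only an illustration: the real agent matches terms with an LLM, not an exact-match dictionary. `SEMANTIC_VIEW`, `build_synonym_index`, and `resolve` are hypothetical names invented for this example, and the dictionary is a hand-copied subset of the YAML above.

```python
# Hypothetical, simplified stand-in for the semantic view above.
SEMANTIC_VIEW = {
    "dimensions": [
        {"name": "region", "synonyms": ["territory", "geo"]},
        {"name": "order_date", "synonyms": ["date", "when"]},
    ],
    "metrics": [
        {"name": "total_revenue", "synonyms": ["revenue", "sales"]},
        {"name": "gross_margin", "synonyms": ["margin", "profit margin"]},
    ],
}


def build_synonym_index(view):
    """Map every name and synonym (lowercased) to (kind, canonical name)."""
    index = {}
    for kind in ("dimensions", "metrics"):
        for item in view.get(kind, []):
            for term in [item["name"], *item.get("synonyms", [])]:
                key = term.lower()
                # Refuse a synonym that points at two different objects.
                if key in index and index[key] != (kind, item["name"]):
                    raise ValueError(f"ambiguous synonym: {term!r}")
                index[key] = (kind, item["name"])
    return index


def resolve(term, index):
    """Return (kind, canonical name) for a business term, or None if unknown."""
    return index.get(term.strip().lower())


INDEX = build_synonym_index(SEMANTIC_VIEW)
print(resolve("Revenue", INDEX))    # ('metrics', 'total_revenue')
print(resolve("territory", INDEX))  # ('dimensions', 'region')
print(resolve("churn", INDEX))      # None
```

The ambiguity check is the useful side effect. If the same word is a synonym for two different objects, it is better to find out while building the model than in front of a VP.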
Facts Are the Foundation
A quick note on facts, because they're easy to overlook. In Snowflake's semantic view model, facts are typically private — they exist to support metrics but aren't directly queried by the agent. Think of them as building blocks. The order_amount fact becomes the total_revenue metric once you wrap it in a SUM. The cost fact becomes part of the gross_margin formula.
This mirrors Kimball's original design. Facts are at the grain. Metrics are the business-level view. The semantic layer is the translation between the two.
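The translation from grain-level facts to business-level metrics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Snowflake compiles a semantic view: `expand` and `build_query` are hypothetical helpers, the mappings are copied by hand from the YAML example above, and an in-memory SQLite table with made-up rows stands in for analytics.core.fct_orders.

```python
import re
import sqlite3

# Logical names -> physical columns, copied from the YAML example.
FACTS = {"order_amount": "line_total", "cost": "cogs"}
DIMENSIONS = {"region": "region_name"}
METRICS = {
    "total_revenue": "SUM(order_amount)",
    "gross_margin": "(SUM(order_amount) - SUM(cost)) / NULLIF(SUM(order_amount), 0)",
}


def expand(expr, facts):
    """Replace logical fact names in a metric expression with physical columns."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, facts)) + r")\b")
    return pattern.sub(lambda m: facts[m.group(1)], expr)


def build_query(metric, dimension, table):
    """Build a GROUP BY query for one metric sliced by one dimension."""
    col = DIMENSIONS[dimension]
    return (
        f"SELECT {col} AS {dimension}, {expand(METRICS[metric], FACTS)} AS {metric} "
        f"FROM {table} GROUP BY {col} ORDER BY {col}"
    )


# Tiny in-memory stand-in for the fact table (rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (region_name TEXT, line_total REAL, cogs REAL)")
conn.executemany(
    "INSERT INTO fct_orders VALUES (?, ?, ?)",
    [("East", 100.0, 60.0), ("East", 300.0, 140.0), ("West", 200.0, 150.0)],
)

sql = build_query("gross_margin", "region", "fct_orders")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('East', 0.5), ('West', 0.25)]
```

The facts never appear in the result. They only exist so the metric expression has something well-defined to aggregate.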
The Agent Layer: Instructions, Behavior, and Trust
The semantic layer tells the agent what the data means. The agent layer tells the agent how to use it.
In Snowflake Intelligence, an agent is an AI model connected to one or more semantic views, Cortex Search services, and tools. But what makes the agent trustworthy — what keeps it from hallucinating answers or overstepping its boundaries — is the instruction set you give it.
There are two types of instructions, and getting both right matters.
Orchestration Instructions
These control how the agent routes and handles questions. They live at the agent level and determine which tools the agent uses, when it generates charts versus tables, and what it refuses to do.
-- Agent-level instructions that shape behavior
"Whenever you can answer visually with a chart,
always choose to generate a chart even if the
user didn't specify."
"Use the search tool for all requests related to
refund policies. Do not calculate refund amounts
from the structured data."
"If the user asks about employee compensation,
respond that this data is not available through
this agent and suggest contacting HR."
The orchestration layer is where you encode the guardrails. This is the agent's judgment — not the LLM's default behavior, but the specific behavioral constraints you've decided on as a team. Which questions should route to structured data versus unstructured search? What topics are out of scope? When should the agent show a visualization instead of raw numbers?
Custom Instructions (Semantic Level)
These live inside the semantic model itself, in the module_custom_instructions field. They influence how the agent interprets questions and generates SQL for a specific domain.
module_custom_instructions:
  - "When users ask about 'active customers', always
    filter for customers with at least one order in
    the last 90 days."
  - "Revenue should always exclude refunds and
    chargebacks unless the user explicitly asks for
    gross revenue."
  - "Default time window for any time-based question
    is the current fiscal quarter unless otherwise
    specified."
This is where institutional knowledge lives. The kind of knowledge that usually sits in someone's head — the senior analyst who knows that "active" means 90 days, or that "revenue" always means net. In a traditional BI environment, that knowledge gets lost when people leave. In a semantic model, it's codified.
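Writing the rule down forces the decisions that usually stay implicit. Here is a small Python sketch of the "active customers" instruction as executable logic. `active_customers` is a hypothetical helper, the sample orders are made up, and the choice to treat both ends of the 90-day window as inclusive is my assumption, since the instruction itself does not say.

```python
from datetime import date, timedelta

ACTIVE_WINDOW_DAYS = 90  # from the "active customers" instruction above


def active_customers(orders, as_of):
    """Return the set of customer IDs with at least one order in the window.

    orders: iterable of (customer_id, order_date) tuples.
    The window covers the 90 days up to and including as_of
    (inclusive bounds are an assumption of this sketch).
    """
    cutoff = as_of - timedelta(days=ACTIVE_WINDOW_DAYS)
    return {cust for cust, placed in orders if cutoff <= placed <= as_of}


orders = [
    ("c1", date(2025, 6, 1)),
    ("c2", date(2025, 1, 15)),
    ("c3", date(2025, 4, 20)),
    ("c1", date(2024, 12, 1)),
]
print(sorted(active_customers(orders, as_of=date(2025, 6, 30))))  # ['c1', 'c3']
```

If the senior analyst would have counted day 90 differently, that is exactly the kind of detail worth adding to the instruction text.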
Descriptions are instructions too. When you write description: "Sales region where the order originated", you're telling the agent how to reason about that column. Treat every description like you're briefing a new analyst on their first day. Be specific. Be precise. If "region" could mean geography or sales territory, say which one.
The Trust Stack
Snowflake's blog on the Agent Context Layer frames this well. The bottleneck for trustworthy agents isn't the model — it's the context. An enterprise agent needs a context stack that includes semantics (what the data means), identity (who's asking), constraints (what they're allowed to see), and provenance (where the answer came from).
Snowflake Intelligence handles identity through existing RBAC — the agent respects the same row-level and column-level security policies as any other Snowflake query. Semantics come from the semantic view. Constraints come from your instructions. Provenance comes from the generated SQL, which users can inspect.
When all four layers are in place, the agent doesn't just give answers. It gives answers you can audit.
The Three-Layer Architecture
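Here's how the pieces stack. Every conversation between a business user and Snowflake Intelligence passes through all three layers: the data model and its underlying tables (Layer 1), the semantic view (Layer 2), and the agent with its instructions (Layer 3).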
A user asks: "What was gross margin by region last quarter?"
The agent receives the question (Layer 3), checks its instructions for any relevant constraints, then consults the semantic view (Layer 2) to find the gross_margin metric and region dimension. It generates SQL against the underlying tables (Layer 1), executes the query within the user's RBAC permissions, and returns the result — possibly as a chart, if the orchestration instructions say so.
Every layer depends on the one below it. And that's the point. The agent is only as good as the semantic model. The semantic model is only as good as the underlying data model.
Tips & Tricks from the Field
These are the lessons I've learned standing up Snowflake Intelligence in real environments. Some are obvious in hindsight. Most were learned the hard way.
Get the Data Model Right First
This is the single most important piece of advice. If your underlying tables are messy — inconsistent naming, ambiguous columns, no clear grain — the semantic layer can't save you. The agent will generate SQL against whatever you give it, and bad models produce bad answers. Invest the time in clean fact and dimension tables before you write a single line of YAML. This is the Kimball lesson that never goes away.
Start With One Domain, Not the Whole Warehouse
Snowflake's own guidance is to start with a single use case — Sales, or Support, or Finance — and get it working well before expanding. Begin with 3–5 tables and 10–20 columns per table. Exclude audit columns. Exclude anything that would confuse an analyst who's new to the domain. If it doesn't belong in a well-curated Excel export, it doesn't belong in your first semantic view.
Write Descriptions Like You're Briefing a New Analyst
Every column description in your semantic view is a prompt to the LLM. "Sales amount" is vague. "Gross sales amount in USD before discounts and refunds, at the individual order line level" is useful. The more specific and context-rich your descriptions are, the more accurately the agent interprets questions. Don't write descriptions for documentation. Write them for comprehension.
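A crude lint can catch the worst offenders before a human review. The Python sketch below flags columns whose description is missing or very short. `vague_descriptions` is a hypothetical helper and the six-word threshold is an arbitrary choice for illustration, not a Snowflake rule. Word count is only a proxy for specificity.

```python
MIN_WORDS = 6  # arbitrary threshold for this sketch, not a Snowflake rule


def vague_descriptions(columns, min_words=MIN_WORDS):
    """Return names of columns whose description is missing or too short."""
    flagged = []
    for col in columns:
        words = (col.get("description") or "").split()
        if len(words) < min_words:
            flagged.append(col["name"])
    return flagged


columns = [
    {"name": "sales_amount", "description": "Sales amount"},
    {
        "name": "gross_sales_usd",
        "description": (
            "Gross sales amount in USD before discounts and refunds, "
            "at the individual order line level"
        ),
    },
    {"name": "legacy_flag"},  # no description at all
]
print(vague_descriptions(columns))  # ['sales_amount', 'legacy_flag']
```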
Use Synonyms Aggressively
Business users don't use the same vocabulary as your data model. "Territory" vs. "region." "Sales" vs. "revenue." "Clients" vs. "customers." The synonyms field in the semantic view is your opportunity to bridge that gap. Populate it generously. Pull the actual words from Slack conversations, meeting notes, executive emails — wherever the business language lives.
Build a Validation Set of 10–20 Verified Questions
Before you hand the agent to business users, build a set of questions you already know the answer to. "What was total revenue in Q3?" "How many active customers do we have in the Northeast?" Run them through the agent and verify the SQL it generates. This is your regression test. Every time you update the semantic view, run the set again. Accuracy is earned through iteration, not one-time configuration.
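A regression set like this can be as simple as a list of question and answer pairs plus a loop. The Python sketch below assumes you can wrap your agent in a function that takes a question and returns a number. `run_validation` and `fake_agent` are hypothetical names, and the figures are made up. It compares answers only, so you would still review the generated SQL by hand.

```python
def run_validation(ask_agent, verified, tolerance=0.01):
    """Run each verified question and collect the ones that miss.

    ask_agent: callable taking a question string and returning a number.
    verified: list of (question, expected_value) pairs.
    Returns a list of (question, expected, actual) for every mismatch.
    """
    failures = []
    for question, expected in verified:
        actual = ask_agent(question)
        if actual is None or abs(actual - expected) > tolerance:
            failures.append((question, expected, actual))
    return failures


# Stub standing in for the real agent call; replace with your own client.
def fake_agent(question):
    canned = {
        "What was total revenue in Q3?": 1_250_000.0,
        "How many active customers do we have in the Northeast?": 412,
    }
    return canned.get(question)


# Made-up expected values; in practice these come from queries you trust.
VERIFIED = [
    ("What was total revenue in Q3?", 1_250_000.0),
    ("How many active customers do we have in the Northeast?", 415),
]

for question, expected, actual in run_validation(fake_agent, VERIFIED):
    print(f"MISMATCH: {question!r} expected {expected}, got {actual}")
```

Keep the verified pairs in version control next to the semantic view so every change to the YAML comes with a rerun.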
Use Custom Instructions for Business Logic, Not the Agent's Personality
I've seen teams spend time writing instructions like "Be friendly and concise." That's the agent's default behavior. What actually improves accuracy is encoding business logic: "Active customer means at least one order in 90 days." "Revenue always means net revenue." "Default time window is the current fiscal quarter." These are the instructions that prevent wrong answers.
Don't Forget Security
Snowflake Intelligence respects RBAC. But RBAC only enforces the policies you have actually defined. If row-level security isn't configured, the agent will summarize every row in a table for any user whose role can query it. Before you deploy, audit your security policies. Make sure the agent can only see what the person asking the question is allowed to see. This is table stakes for any enterprise deployment.
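The effect is easy to see in a toy simulation. The Python sketch below is not Snowflake code: in a real deployment the filtering would come from policies defined in Snowflake itself. `ROLE_REGIONS`, `visible_rows`, and `total_revenue` are hypothetical names, and the roles and rows are made up. The point is that the same question returns different totals depending on who asks.

```python
ORDERS = [
    {"region": "East", "amount": 400.0},
    {"region": "West", "amount": 200.0},
]

# Hypothetical mapping of roles to the regions they may see.
ROLE_REGIONS = {
    "EAST_SALES": {"East"},
    "GLOBAL_FINANCE": {"East", "West"},
}


def visible_rows(rows, role):
    """Simulate a row-level policy: keep only rows the role may read."""
    allowed = ROLE_REGIONS.get(role, set())  # unknown roles see nothing
    return [r for r in rows if r["region"] in allowed]


def total_revenue(rows, role):
    """The same question, answered only from rows visible to the asker."""
    return sum(r["amount"] for r in visible_rows(rows, role))


print(total_revenue(ORDERS, "EAST_SALES"))      # 400.0
print(total_revenue(ORDERS, "GLOBAL_FINANCE"))  # 600.0
print(total_revenue(ORDERS, "INTERN"))          # 0
```

Remove the filter in `visible_rows` and every role gets 600.0. That is what an unconfigured policy looks like to a business user.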
Split Semantic Views by Domain, Not by Similarity
If you have a Sales semantic view and a Marketing semantic view, make their descriptions clearly distinct. The agent uses these descriptions to route questions to the right view. When two views sound similar, routing accuracy drops and the agent may pick the wrong one. Clear boundaries, clear descriptions, clear domains.
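One rough way to spot views that sound alike is to measure word overlap between their descriptions. The Python sketch below uses Jaccard similarity on content words. This is my own heuristic, not how the agent routes questions (that is LLM-based), and the function names, stopword list, threshold, and sample descriptions are all assumptions made for illustration.

```python
import re
from itertools import combinations

STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}


def tokens(text):
    """Lowercased content words of a description."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Jaccard similarity of two token sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_views(descriptions, threshold=0.5):
    """Return (view_a, view_b, score) for description pairs above the threshold."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(descriptions.items(), 2):
        score = jaccard(tokens(text_a), tokens(text_b))
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


views = {
    "sales": "Revenue and orders by customer and region",
    "marketing": "Campaign spend, leads, and attribution by channel",
    "sales_ops": "Orders and revenue by region and customer segment",
}
print(similar_views(views))  # [('sales', 'sales_ops', 0.8)]
```

A flagged pair is a prompt to rewrite the descriptions or merge the views, not proof that routing will fail.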
The Old Work Is the New Work
There's a temptation, with every new wave of technology, to believe that the fundamentals have changed. That the new tool is so powerful it doesn't need the old discipline.
Snowflake Intelligence is a genuinely new capability. Letting a business user ask a question in plain English and get a governed, accurate answer from their data warehouse — that's a real shift in how analytics gets done.
But the reason it works is the oldest discipline in the data profession. Modeling. Naming things well. Defining what "revenue" actually means. Documenting the business logic that would otherwise live in one person's head. Building a structure that mirrors how the business actually thinks.
Kimball called it dimensional modeling. Snowflake calls it a semantic view. The names change. The work doesn't.
If you're about to stand up Snowflake Intelligence for your organization, start with the model. Get the facts right. Get the dimensions clean. Write descriptions that would make sense to someone who just joined the team yesterday. Then, and only then, point the agent at it.
The AI is the interface. The model is the intelligence.
References
- Overview of Semantic Views — Snowflake Documentation
- YAML Specification for Semantic Views — Snowflake Documentation
- Best Practices for Semantic Views — Snowflake Documentation
- The Agent Context Layer for Trustworthy Data Agents — Snowflake Blog
- Snowflake Intelligence: Talk to Your Data — Snowflake Blog
- Getting Started with Snowflake Intelligence — Snowflake Developer Guide
- Best Practices for Semantic Views for Cortex Analyst — Snowflake Guide
- Native Semantic Views: AI-Powered BI for the Enterprise — Snowflake Engineering Blog
- Open Semantic Interchange Initiative — Snowflake Blog
- Star Schema OLAP Cube — Kimball Group
- Kimball's Dimensional Data Modeling — Holistics Analytics Guidebook
- Snowflake Intelligence General Availability (Nov 2025) — Snowflake Release Notes