Conversational Analytics & Snowflake Intelligence: From Kimball to Cortex

The Setup

The Most Important Work Happens Before the AI Touches Anything

I spent 20 years in data before anyone used the word "agent" in a sentence about analytics. And in all that time, the lesson that survived every platform shift, every new tool, every vendor pitch is the same one Ralph Kimball was teaching in the 1990s.

Get the data model right. Everything else follows from that.

Snowflake Intelligence is the latest and most compelling expression of this idea. It is conversational analytics — an AI agent that sits on top of your Snowflake data, takes natural language questions from business users, and returns answers grounded in governed, well-modeled data. No SQL required. No dashboard to navigate. Just a question and an answer.

But the reason it works (when it works well) has nothing to do with the LLM. It has everything to do with what's underneath it: a semantic layer that tells the agent what the data means, and an instruction layer that tells the agent how to behave.

If you've done dimensional modeling, those two layers will feel familiar. Facts. Dimensions. Measures. Metrics. Business rules. Naming conventions. The same principles Kimball outlined in The Data Warehouse Toolkit are exactly what make a Snowflake Intelligence agent trustworthy.

This article walks through both layers — the semantic and the agent — and connects them back to the modeling discipline that makes them work. Along the way, I'll share the practical tips I've learned standing up these systems in real environments.

The Foundation

Kimball's Blueprint: Why Dimensional Modeling Still Matters

Ralph Kimball didn't invent the concepts of facts and dimensions from scratch. But he did something more important — he made them practical. His 1996 book The Data Warehouse Toolkit gave data teams a repeatable methodology for organizing data around business processes. Facts are the numeric measurements. Dimensions are the context around those measurements. A star schema keeps them clean. A snowflake schema normalizes the dimension hierarchies when the complexity warrants it.

The terminology sounds dated to some people now. It isn't. Every modern analytics tool — whether it calls them "measures" or "metrics" or "KPIs" — is working with the same underlying concepts.

Kimball's Core Insight: The goal of dimensional modeling was never only database performance. It was understandability. A well-modeled star schema mirrors how business users think about their data. When someone asks "total revenue by region last quarter," the star schema already has an answer pathway: a revenue fact, a region dimension, a time dimension. The model encodes the logic of the question before it's ever asked.

This is the part that connects directly to what Snowflake Intelligence is doing today. The reason an AI agent can answer a natural language question about your data is that someone — a data engineer, an analytics engineer, a modeler — already encoded what "revenue" means, what "region" refers to, and how time is structured in the model.

The agent doesn't figure this out on its own. It reads a semantic definition that you wrote. And that semantic definition is, in all the ways that matter, a dimensional model.
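To make that answer pathway concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fct_orders, dim_region, dim_date), the column names, and the sample rows are all assumptions made up for illustration; they stand in for a real warehouse.

```python
import sqlite3

# Tiny in-memory star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, fiscal_quarter TEXT);
CREATE TABLE fct_orders (region_key INTEGER, date_key INTEGER, revenue REAL);

INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_date   VALUES (20240215, '2024-Q1'), (20240510, '2024-Q2');
INSERT INTO fct_orders VALUES
    (1, 20240215, 100.0),
    (1, 20240510, 250.0),
    (2, 20240510, 400.0),
    (2, 20240510, 50.0);
""")

# "Total revenue by region last quarter": a revenue fact,
# a region dimension, and a time dimension.
QUERY = """
SELECT r.region_name, SUM(f.revenue) AS total_revenue
FROM fct_orders f
JOIN dim_region r ON r.region_key = f.region_key
JOIN dim_date   d ON d.date_key   = f.date_key
WHERE d.fiscal_quarter = ?
GROUP BY r.region_name
ORDER BY r.region_name
"""

revenue_by_region = conn.execute(QUERY, ("2024-Q2",)).fetchall()
print(revenue_by_region)  # [('East', 250.0), ('West', 450.0)]
```

Every word in the business question maps to exactly one table or column, which is why the query is short. That mapping is what the semantic definition later hands to the agent.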

1996

Kimball publishes The Data Warehouse Toolkit. Star schemas, fact tables, dimension tables, conformed dimensions. The vocabulary that would define a generation of analytics.

2010s

The cloud data warehouse era begins. Snowflake, BigQuery, Redshift move data to elastic compute. The modeling discipline gets overshadowed by "just query the lake."

2020s

Semantic layers resurface — dbt metrics, Cube, AtScale. The industry remembers that raw data without meaning is just noise. Kimball's ideas re-enter the conversation under new names.

2025

Snowflake releases Semantic Views and Snowflake Intelligence. The dimensional model becomes the interface between an AI agent and your data warehouse. Kimball's blueprint, realized in YAML.

Layer One

The Semantic Layer: Facts, Dimensions, Measures, and Metrics

In Snowflake Intelligence, the semantic layer is where you define what your data means in business terms. This is done through Semantic Views: schema-level objects, typically authored as a YAML specification, that describe the entities in your data, the relationships between them, and the calculations that matter to the business.

If you've built a Kimball-style dimensional model, the mapping is almost one-to-one.

Kimball Concept	Snowflake Semantic View	What It Does
Fact	`facts:`	Numeric columns at the grain of the table. Revenue, cost, quantity. The raw building blocks of aggregation.
Dimension	`dimensions:`	Categorical attributes that provide context. Region, product category, customer name. The "by what" in every business question.
Measure	`metrics:` (with aggregation)	Aggregated facts. SUM of revenue, AVG of order value, COUNT of customers. Measures answer "how much" across rows.
Business Metric	`metrics:` (derived)	Higher-order calculations. Gross margin = (revenue - cost) / revenue. The numbers executives actually ask about.

The semantic view is where you tell the agent what language the business speaks. When a VP asks "What was our net revenue retention last quarter?" the agent doesn't guess. It looks up the metric definition you wrote, finds the correct formula, identifies which dimensions apply, and generates the SQL.
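As a rough mental model of that lookup, the sketch below compiles one metric and one dimension into SQL from a plain dictionary. This is my own toy stand-in, not Snowflake's implementation: the real agent uses an LLM to generate SQL, and the SEMANTIC_VIEW structure and compile_query function are hypothetical names invented for this example.

```python
# Hypothetical, simplified stand-in for a semantic definition.
SEMANTIC_VIEW = {
    "base_table": "analytics.core.fct_orders",
    "dimensions": {"region": "region_name", "order_date": "order_date"},
    "metrics": {
        "total_revenue": "SUM(line_total)",
        "gross_margin": "(SUM(line_total) - SUM(cogs)) / NULLIF(SUM(line_total), 0)",
    },
}


def compile_query(metric: str, dimension: str, view: dict = SEMANTIC_VIEW) -> str:
    """Build a GROUP BY query from one metric and one dimension definition."""
    if metric not in view["metrics"]:
        raise KeyError(f"Unknown metric: {metric}")
    if dimension not in view["dimensions"]:
        raise KeyError(f"Unknown dimension: {dimension}")
    dim_expr = view["dimensions"][dimension]
    metric_expr = view["metrics"][metric]
    return (
        f"SELECT {dim_expr} AS {dimension}, {metric_expr} AS {metric}\n"
        f"FROM {view['base_table']}\n"
        f"GROUP BY {dim_expr}"
    )


sql = compile_query("gross_margin", "region")
print(sql)
```

The point of the sketch is that the formula lives in one place. An undefined metric raises an error instead of being improvised, which is the behavior you want from a governed layer.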

What a Semantic View Actually Looks Like

Here's a simplified example. This is the kind of YAML that powers an agent's understanding of a sales domain:

YAML — Semantic View Definition

name: sales_analytics
tables:
  - name: orders
    base_table:
      database: analytics
      schema: core
      table: fct_orders
    dimensions:
      - name: region
        synonyms: ["territory", "geo"]
        description: "Sales region where the order originated"
        expr: region_name
        data_type: VARCHAR
      - name: order_date
        synonyms: ["date", "when"]
        description: "Date the order was placed"
        expr: order_date
        data_type: DATE
    facts:
      - name: order_amount
        description: "Total amount for the line item"
        expr: line_total
        data_type: NUMBER
      - name: cost
        description: "Cost of goods for the line item"
        expr: cogs
        data_type: NUMBER
    metrics:
      - name: total_revenue
        synonyms: ["revenue", "sales"]
        description: "Sum of all order amounts"
        expr: SUM(order_amount)
        data_type: NUMBER
      - name: gross_margin
        synonyms: ["margin", "profit margin"]
        description: "Revenue minus cost, as a percentage of revenue"
        expr: (SUM(order_amount) - SUM(cost)) / NULLIF(SUM(order_amount), 0)
        data_type: NUMBER

Notice the synonyms field. This is the part that makes the semantic layer AI-ready. When a user says "revenue" instead of "total_revenue," the agent still finds the right metric. When they say "territory" instead of "region," the dimension still resolves. The YAML captures the way people actually talk about data — not just the way it's stored.
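Here is a small sketch of what synonym resolution amounts to. It is a deliberate simplification: the agent matches language with an LLM rather than an exact lookup, and build_synonym_index is a hypothetical helper written for this example. The field names and synonyms are the ones from the YAML above.

```python
def build_synonym_index(fields: list[dict]) -> dict[str, str]:
    """Map every lowercase name and synonym to its canonical field name."""
    index: dict[str, str] = {}
    for field in fields:
        for term in [field["name"], *field.get("synonyms", [])]:
            key = term.lower()
            # The same synonym on two different fields is a modeling bug.
            if key in index and index[key] != field["name"]:
                raise ValueError(f"Synonym {term!r} is ambiguous")
            index[key] = field["name"]
    return index


FIELDS = [
    {"name": "region", "synonyms": ["territory", "geo"]},
    {"name": "total_revenue", "synonyms": ["revenue", "sales"]},
    {"name": "gross_margin", "synonyms": ["margin", "profit margin"]},
]

index = build_synonym_index(FIELDS)
print(index["territory"])  # region
print(index["sales"])      # total_revenue
```

The ambiguity check is the useful part. If "sales" pointed at two different metrics, a person would ask which one you meant; the agent may simply pick one.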

Common Mistake: Teams often try to put everything into one semantic view. Resist this. Snowflake recommends splitting by business domain (Sales, Marketing, Finance), not by similarity. When multiple semantic views have overlapping descriptions, the agent's routing accuracy drops. One clear domain per view. That's the discipline.

Facts Are the Foundation

A quick note on facts, because they're easy to overlook. In Snowflake's semantic view model, facts are often marked private: they exist to support metrics but aren't exposed for direct querying. Think of them as building blocks. The order_amount fact becomes the total_revenue metric once you wrap it in a SUM. The cost fact becomes part of the gross_margin formula.

This mirrors Kimball's original design. Facts are at the grain. Metrics are the business-level view. The semantic layer is the translation between the two.
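A short worked example shows why the metric has to be defined over the facts rather than over precomputed ratios. The two line items below are invented values; the gross_margin function mirrors the formula in the YAML above, including the NULLIF guard.

```python
# Line-item facts at the grain of the table (illustrative values).
line_items = [
    {"order_amount": 100.0, "cost": 90.0},   # 10% margin on a small order
    {"order_amount": 900.0, "cost": 450.0},  # 50% margin on a large order
]


def gross_margin(rows):
    """(SUM(order_amount) - SUM(cost)) / NULLIF(SUM(order_amount), 0)."""
    revenue = sum(r["order_amount"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return None if revenue == 0 else (revenue - cost) / revenue


def naive_average_of_margins(rows):
    """Average of per-row margins: weights every order equally (usually wrong)."""
    margins = [(r["order_amount"] - r["cost"]) / r["order_amount"] for r in rows]
    return sum(margins) / len(margins)


metric = gross_margin(line_items)             # (1000 - 540) / 1000
naive = naive_average_of_margins(line_items)  # (0.10 + 0.50) / 2
print(round(metric, 2), round(naive, 2))      # 0.46 0.3
```

The two answers differ by 16 points on the same data. Keeping the facts at the grain and defining the ratio once, as a ratio of sums, is what makes the metric safe to slice by any dimension.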

Layer Two

The Agent Layer: Instructions, Behavior, and Trust

The semantic layer tells the agent what the data means. The agent layer tells the agent how to use it.

In Snowflake Intelligence, an agent is an AI model connected to one or more semantic views, Cortex Search services, and tools. But what makes the agent trustworthy — what keeps it from hallucinating answers or overstepping its boundaries — is the instruction set you give it.

There are two types of instructions, and getting both right matters.

Orchestration Instructions

These control how the agent routes and handles questions. They live at the agent level and determine which tools the agent uses, when it generates charts versus tables, and what it refuses to do.

Example — Orchestration Instructions

-- Agent-level instructions that shape behavior

"Whenever you can answer visually with a chart,
 always choose to generate a chart even if the
 user didn't specify."

"Use the search tool for all requests related to
 refund policies. Do not calculate refund amounts
 from the structured data."

"If the user asks about employee compensation,
 respond that this data is not available through
 this agent and suggest contacting HR."

The orchestration layer is where you encode the guardrails. This is the agent's judgment — not the LLM's default behavior, but the specific behavioral constraints you've decided on as a team. Which questions should route to structured data versus unstructured search? What topics are out of scope? When should the agent show a visualization instead of raw numbers?
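It can help to write the intended routing down as a decision table before wording it as instructions. The sketch below is a toy keyword router that encodes the three example instructions above. It is not how the agent works internally (the LLM interprets your instructions), and the Route class, tool labels, and keyword lists are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tool: str          # "search", "analyst", or "refuse" (hypothetical labels)
    message: str = ""  # only set when the agent declines


OUT_OF_SCOPE = ("compensation", "salary")
SEARCH_TOPICS = ("refund policy", "refund policies")


def route(question: str) -> Route:
    """Apply scope rules first, then search rules, then default to structured data."""
    q = question.lower()
    if any(term in q for term in OUT_OF_SCOPE):
        return Route("refuse", "This data is not available through this agent. Please contact HR.")
    if any(term in q for term in SEARCH_TOPICS):
        return Route("search")
    return Route("analyst")


print(route("What is our refund policy for annual plans?").tool)  # search
print(route("Show employee compensation by team").tool)           # refuse
print(route("Gross margin by region last quarter").tool)          # analyst
```

Writing the rules this way forces the team to agree on precedence. Here, out-of-scope checks win over everything else, which is the order I would want the written instructions to convey too.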

Custom Instructions (Semantic Level)

These live inside the semantic model itself, in the module_custom_instructions field. They influence how the agent interprets questions and generates SQL for a specific domain.

YAML — Custom Instructions Example

module_custom_instructions:
  sql_generation: |
    - When users ask about 'active customers', always
      filter for customers with at least one order in
      the last 90 days.
    - Revenue should always exclude refunds and
      chargebacks unless the user explicitly asks for
      gross revenue.
    - Default time window for any time-based question
      is the current fiscal quarter unless otherwise
      specified.

This is where institutional knowledge lives. The kind of knowledge that usually sits in someone's head — the senior analyst who knows that "active" means 90 days, or that "revenue" always means net. In a traditional BI environment, that knowledge gets lost when people leave. In a semantic model, it's codified.
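To show how much precision hides inside a rule like "active means 90 days," here is a minimal sketch of that definition in plain Python. The customer names and dates are invented, and treating both ends of the window as inclusive is my assumption; your business may define the boundary differently, which is exactly the kind of detail worth writing down.

```python
from datetime import date, timedelta


def active_customers(orders, as_of, window_days=90):
    """Return customers with at least one order in the trailing window (inclusive)."""
    cutoff = as_of - timedelta(days=window_days)
    return {customer for customer, order_date in orders if cutoff <= order_date <= as_of}


# (customer, order_date) pairs; illustrative data only.
ORDERS = [
    ("acme", date(2025, 1, 10)),
    ("acme", date(2025, 5, 2)),
    ("globex", date(2024, 11, 30)),
    ("initech", date(2025, 3, 15)),
]

active = active_customers(ORDERS, as_of=date(2025, 6, 1))
print(sorted(active))  # ['acme', 'initech']
```

Globex drops out because its last order is older than the 90-day cutoff of March 3, 2025. Without the instruction, an agent has no way to know whether "active" means 30, 90, or 365 days.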

The Real Unlock: Column descriptions in the semantic model aren't just documentation. They are functional instructions for the LLM. When you write description: "Sales region where the order originated", you're telling the agent how to reason about that column. Treat every description like you're briefing a new analyst on their first day. Be specific. Be precise. If "region" could mean geography or sales territory, say which one.

The Trust Stack

Snowflake's blog on the Agent Context Layer frames this well. The bottleneck for trustworthy agents isn't the model — it's the context. An enterprise agent needs a context stack that includes semantics (what the data means), identity (who's asking), constraints (what they're allowed to see), and provenance (where the answer came from).

Snowflake Intelligence handles identity through existing RBAC — the agent respects the same row-level and column-level security policies as any other Snowflake query. Semantics come from the semantic view. Constraints come from your instructions. Provenance comes from the generated SQL, which users can inspect.

When all four layers are in place, the agent doesn't just give answers. It gives answers you can audit.

Putting It Together

The Three-Layer Architecture

Here's how the pieces stack. Every conversation between a business user and Snowflake Intelligence passes through all three layers:

Layer 3 — The Agent

Orchestration & Instructions

Routing, guardrails, behavioral constraints, chart preferences, scope boundaries

↓

Layer 2 — The Semantic Layer

Facts · Dimensions · Measures · Metrics

YAML definitions, synonyms, descriptions, custom instructions, business logic

↓

Layer 1 — The Data Layer

Snowflake Tables & Views

Dimensional models, fact tables, dimension tables, RBAC, row-level security

A user asks: "What was gross margin by region last quarter?"

The agent receives the question (Layer 3), checks its instructions for any relevant constraints, then consults the semantic view (Layer 2) to find the gross_margin metric and region dimension. It generates SQL against the underlying tables (Layer 1), executes the query within the user's RBAC permissions, and returns the result — possibly as a chart, if the orchestration instructions say so.

Every layer depends on the one below it. And that's the point. The agent is only as good as the semantic model. The semantic model is only as good as the underlying data model.

The Kimball Connection: This three-layer architecture is Kimball's vision extended. He always argued that the data warehouse should be organized around business processes, with clear facts and dimensions that mirror how the business thinks. The semantic layer is the formalization of that principle. The agent layer is the interface that makes it accessible to everyone, not just the analysts who know SQL.

Practitioner Notes

Tips & Tricks from the Field

These are the lessons I've learned standing up Snowflake Intelligence in real environments. Some are obvious in hindsight. Most were learned the hard way.

1. Get the Data Model Right First

This is the single most important piece of advice. If your underlying tables are messy — inconsistent naming, ambiguous columns, no clear grain — the semantic layer can't save you. The agent will generate SQL against whatever you give it, and bad models produce bad answers. Invest the time in clean fact and dimension tables before you write a single line of YAML. This is the Kimball lesson that never goes away.

2. Start With One Domain, Not the Whole Warehouse

Snowflake's own guidance is to start with a single use case — Sales, or Support, or Finance — and get it working well before expanding. Begin with 3–5 tables and 10–20 columns per table. Exclude audit columns. Exclude anything that would confuse an analyst who's new to the domain. If it doesn't belong in a well-curated Excel export, it doesn't belong in your first semantic view.

3. Write Descriptions Like You're Briefing a New Analyst

Every column description in your semantic view is a prompt to the LLM. "Sales amount" is vague. "Gross sales amount in USD before discounts and refunds, at the individual order line level" is useful. The more specific and context-rich your descriptions are, the more accurately the agent interprets questions. Don't write descriptions for documentation. Write them for comprehension.
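A crude lint pass can catch the worst offenders before review. The sketch below flags missing or very short descriptions. The six-word threshold is an arbitrary number I picked for illustration, and lint_descriptions is a hypothetical helper, not part of any Snowflake tooling.

```python
VAGUE_MIN_WORDS = 6  # arbitrary threshold chosen for illustration


def lint_descriptions(fields):
    """Return (field_name, problem) pairs for descriptions that look too thin."""
    problems = []
    for field in fields:
        desc = field.get("description", "").strip()
        if not desc:
            problems.append((field["name"], "missing description"))
        elif len(desc.split()) < VAGUE_MIN_WORDS:
            problems.append((field["name"], "description is very short"))
    return problems


FIELDS = [
    {"name": "sales_amount", "description": "Sales amount"},
    {
        "name": "gross_sales_usd",
        "description": "Gross sales amount in USD before discounts and refunds, "
                       "at the individual order line level",
    },
    {"name": "cost"},
]

for name, problem in lint_descriptions(FIELDS):
    print(f"{name}: {problem}")
```

Word count is a weak proxy for quality, so treat the output as a list of places to look, not a pass or fail gate.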

4. Use Synonyms Aggressively

Business users don't use the same vocabulary as your data model. "Territory" vs. "region." "Sales" vs. "revenue." "Clients" vs. "customers." The synonyms field in the semantic view is your opportunity to bridge that gap. Populate it generously. Pull the actual words from Slack conversations, meeting notes, executive emails — wherever the business language lives.

5. Build a Validation Set of 10–20 Verified Questions

Before you hand the agent to business users, build a set of questions you already know the answer to. "What was total revenue in Q3?" "How many active customers do we have in the Northeast?" Run them through the agent and verify the SQL it generates. This is your regression test. Every time you update the semantic view, run the set again. Accuracy is earned through iteration, not one-time configuration.
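The regression loop itself is only a few lines. In the sketch below, fake_agent is a hypothetical stand-in for however you call your agent, and the expected values are made-up numbers, not real results. The runner only assumes a function that takes a question and returns a number.

```python
from typing import Callable


def run_validation(cases, ask: Callable[[str], float], tolerance=0.01):
    """Return (question, expected, actual) for every case outside the tolerance."""
    failures = []
    for question, expected in cases:
        actual = ask(question)
        if abs(actual - expected) > tolerance:
            failures.append((question, expected, actual))
    return failures


# Hypothetical stand-in for the real agent call; replace with your own client.
def fake_agent(question: str) -> float:
    canned = {
        "What was total revenue in Q3?": 1_250_000.0,
        "How many active customers do we have in the Northeast?": 310.0,
    }
    return canned[question]


# Questions with answers verified by hand (illustrative numbers).
VERIFIED = [
    ("What was total revenue in Q3?", 1_250_000.0),
    ("How many active customers do we have in the Northeast?", 312.0),
]

failures = run_validation(VERIFIED, fake_agent)
for question, expected, actual in failures:
    print(f"MISMATCH: {question!r} expected {expected}, got {actual}")
```

In practice I would also store the generated SQL next to each answer, since a right number from wrong SQL is a failure waiting to happen.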

6. Use Custom Instructions for Business Logic, Not the Agent's Personality

I've seen teams spend time writing instructions like "Be friendly and concise." That's the agent's default behavior. What actually improves accuracy is encoding business logic: "Active customer means at least one order in 90 days." "Revenue always means net revenue." "Default time window is the current fiscal quarter." These are the instructions that prevent wrong answers.

7. Don't Forget Security

Snowflake Intelligence respects RBAC. But if row-level security isn't configured, the agent will summarize across every row the user's role can read, which may be more than that person should see. Before you deploy, audit your security policies. Make sure the agent can only see what the person asking the question is allowed to see. This is table stakes for any enterprise deployment.
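The effect is easy to demonstrate with a small simulation. In Snowflake the filtering is enforced by policies attached to the tables, not by application code; the ROLE_REGIONS mapping and function names below are hypothetical and only imitate the outcome.

```python
ROWS = [
    {"region": "East", "revenue": 250.0},
    {"region": "West", "revenue": 450.0},
]

# Hypothetical mapping from role to the regions that role may see.
ROLE_REGIONS = {"east_sales": {"East"}, "finance": {"East", "West"}}


def visible_rows(role, rows=ROWS, policy=ROLE_REGIONS):
    """Filter rows the way a row-level policy would; unknown roles see nothing."""
    allowed = policy.get(role, set())
    return [r for r in rows if r["region"] in allowed]


def total_revenue(role):
    """The same question, answered only over the rows the role can see."""
    return sum(r["revenue"] for r in visible_rows(role))


print(total_revenue("east_sales"))  # 250.0
print(total_revenue("finance"))     # 700.0
```

The same question legitimately returns different totals for different roles. If every role gets 700.0, the policy is missing, and the agent will report that number to anyone who asks.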

8. Split Semantic Views by Domain, Not by Similarity

If you have a Sales semantic view and a Marketing semantic view, make their descriptions clearly distinct. The agent uses these descriptions to route questions to the right view. When two views sound similar, routing accuracy drops and the agent may pick the wrong one. Clear boundaries, clear descriptions, clear domains.
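One cheap way to spot views that "sound similar" is to measure word overlap between their descriptions. The sketch below uses Jaccard similarity on lowercase word sets. This is a rough proxy I am assuming for illustration; the agent's routing is done by an LLM, and the view descriptions and the 0.5 threshold are invented.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase word set for a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by total distinct words."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def overlapping_views(descriptions, threshold=0.5):
    """Return (view_a, view_b, score) for pairs at or above the threshold."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(descriptions.items(), 2):
        score = jaccard(d1, d2)
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged


VIEWS = {
    "sales": "Orders, revenue, and margin by region and product",
    "marketing": "Campaign spend, leads, and conversion by channel",
    "sales_v2": "Orders, revenue, and margin by region and customer",
}

print(overlapping_views(VIEWS))  # [('sales', 'sales_v2', 0.75)]
```

A flagged pair usually means one of two things: the views should be merged, or their descriptions need to say what makes each one different.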

A Note on Costs: Snowflake Intelligence uses Cortex credits for every query the agent processes. In early deployments, watch your credit consumption closely. Set up resource monitors, review the query history for patterns of expensive queries, and consider auto-suspend settings on your warehouses. The agent is powerful, but it's not free. Govern it like any other compute workload.

Closing Reflection

The Old Work Is the New Work

There's a temptation, with every new wave of technology, to believe that the fundamentals have changed. That the new tool is so powerful it doesn't need the old discipline.

Snowflake Intelligence is a genuinely new capability. Letting a business user ask a question in plain English and get a governed, accurate answer from their data warehouse — that's a real shift in how analytics gets done.

But the reason it works is the oldest discipline in the data profession. Modeling. Naming things well. Defining what "revenue" actually means. Documenting the business logic that would otherwise live in one person's head. Building a structure that mirrors how the business actually thinks.

Kimball called it dimensional modeling. Snowflake calls it a semantic view. The names change. The work doesn't.

If you're about to stand up Snowflake Intelligence for your organization, start with the model. Get the facts right. Get the dimensions clean. Write descriptions that would make sense to someone who just joined the team yesterday. Then, and only then, point the agent at it.

The AI is the interface. The model is the intelligence.
