The Open Semantic Interchange Initiative: A Standard for the AI Data Layer

Context

The Problem Has a Name: Semantic Drift

Ask ten analysts at a large enterprise what "monthly active users" means, and you will likely get ten different answers. Not because they are careless, but because each tool they work in — the BI platform, the data warehouse, the marketing cloud, the CRM — defines it slightly differently. One system counts logins. Another counts session initiations. A third filters out internal users. A fourth applies a 28-day rolling window instead of a calendar month. The metric carries the same name across the organization but arrives with different DNA in every context.

This phenomenon — metrics and business logic silently diverging as they travel across tools and teams — has a name: semantic drift. It has existed as long as organizations have had more than one data system. What changed in 2024 and 2025 is who got hurt by it.

For decades, semantic drift was a human problem. Analysts would reconcile discrepancies in spreadsheets, add footnotes to dashboards, and hold "data alignment" meetings to agree on which number was the "real" number. Frustrating, but manageable. The humans understood the context behind each definition even when the systems did not.

AI agents do not have that luxury. When a conversational analytics system or an agentic AI receives a natural language question — "what is our MoM revenue growth in the northeast region?" — it must resolve that question into precise computation against a specific data source using a specific metric definition. It cannot call a meeting. It cannot read the footnote. It either has a canonical, machine-readable definition of "revenue" and "northeast region" to reference, or it guesses. And when it guesses, it hallucinates — returning a confident, plausible-looking number that means something subtly different than what the person asked for.

The Core Tension

AI agents are becoming primary consumers of enterprise data, but enterprise data was never structured with machines as the audience. Semantic context (what a field means, how a metric is calculated, how business terms map to technical identifiers) was embedded in human documentation, tribal knowledge, and organizational convention. None of that transfers automatically to a large language model querying a data warehouse.

The data industry recognized this problem and attempted to solve it, repeatedly, through semantic layers. Tools like LookML (Looker), MetricFlow (dbt), Cube, AtScale, and others gave organizations a way to define business metrics centrally and serve them consistently. These tools worked — inside their own ecosystems. The problem was portability. A metric defined in LookML lives in LookML. A metric defined in MetricFlow lives in dbt. If an organization used both tools, or wanted to feed the same definition into a different AI platform, it had to redefine the metric from scratch. Every new tool in the stack meant another surface where drift could occur.

That is the gap the Open Semantic Interchange Initiative (OSI) was created to close.

Initiative Overview

What Is the Open Semantic Interchange?

The Open Semantic Interchange is an industry coalition and open specification designed to create a universal interchange format for semantic metadata — a common language that analytics tools, AI platforms, BI systems, and data catalogs can all read, write, and exchange without translation loss.

Snowflake announced the initiative on September 23, 2025. The framing was deliberately broad: this was not a Snowflake product, a Snowflake format, or a Snowflake-controlled standard. It was positioned as an open, vendor-neutral effort led by a coalition of companies that collectively recognized that no single vendor could solve the interoperability problem on its own, and that trying to do so would simply reproduce the fragmentation in a new form.

The founding coalition was notable for its range. It included not just Snowflake, but Salesforce (and by extension Tableau and Einstein Analytics), dbt Labs (the de facto backbone of the modern analytics engineering stack), BlackRock (representing large-scale institutional data consumers), RelationalAI, ThoughtSpot, Sigma, Omni, Alation, Atlan, Cube, Hex, Honeydew, Mistral AI, and others. The inclusion of Mistral AI was a deliberate signal: this specification was being designed from the ground up to be AI-legible, not just BI-legible.

At a glance: 30+ working group members; v1.0 specification released January 2026; Apache 2.0 open source license.

The Core Proposition

OSI's proposition is simple to state and genuinely hard to execute: define a vendor-neutral, machine-readable format for representing semantic layer constructs — datasets, metrics, dimensions, relationships, and contextual metadata — such that any compliant tool can ingest a model defined in any other compliant tool without redefining anything.

Think of it as the USB-C of semantic layers. The connector is standardized. The devices on either end can be anything. A metric defined in dbt's MetricFlow and exported as OSI-formatted YAML should be importable into Snowflake's Cortex Analyst, or ThoughtSpot Spotter, or a custom agentic AI application, with no semantic loss. The business logic travels with the data definition, not just the data schema.

The specification lives on GitHub at github.com/open-semantic-interchange/OSI under an Apache 2.0 license. A dedicated project website launched at open-semantic-interchange.org in January 2026.

Founding and Expanded Coalition

The working group has grown substantially since the September 2025 announcement. The founding members included:

Snowflake, Salesforce / Tableau, dbt Labs, BlackRock, RelationalAI, ThoughtSpot, Sigma, Omni, Mistral AI, Hex, Cube, Honeydew, Alation, Atlan, Blue Yonder, Select Star, and Elementum AI.

With the January 2026 spec release, the working group expanded to include AtScale, Coalesce, Collate, Credible, Databricks, JetBrains, Lightdash, Qlik, Collibra, DataHub, Domo, Firebolt, Informatica, Instacart, Preset, and Starburst. The addition of Databricks is particularly significant: it brings both the Databricks Lakehouse platform and Unity Catalog into the OSI orbit, meaning that Snowflake and Databricks, two of the dominant cloud data platform ecosystems, are now both participating in the same semantic standard.

Technical Deep Dive

Inside the Specification

The OSI specification uses a declarative YAML format. Its design lineage is visible: it draws heavily on dbt Labs' MetricFlow framework, which itself inherited ideas from LookML and earlier semantic layer traditions. But OSI adds two critical layers that prior formats lacked — dialect-aware expression support and first-class AI context.

Top-Level Structure

An OSI document has two top-level keys: a version identifier and a semantic_model array. Each entry in the array is a named semantic model — analogous to a data domain or subject area — that contains datasets, metrics, relationships, and contextual metadata. A single OSI file can define multiple models, though in practice most organizations will maintain one model per domain.

OSI YAML — Top-Level Structure

version: "0.1.1"

semantic_model:
  - name: retail_analytics_model
    description: Core semantic model for retail sales and customer analytics
    ai_context:
      instructions: "Use this model for retail analytics. It provides
        comprehensive sales, customer, product, and store data. Supports
        time-based analysis, customer segmentation, and store performance."

    datasets: [...]
    metrics: [...]
    relationships: [...]
    custom_extensions: [...]
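
The layout above can be sanity-checked in a few lines. The following is a minimal sketch, not part of the OSI tooling: it assumes the YAML has already been parsed into a Python dict (as a YAML loader would return it), and the helper name `check_top_level` and the choice to treat the four model sections as optional lists are assumptions made for illustration.

```python
from typing import Any

# Keys named in the text above.
REQUIRED_TOP_LEVEL = ("version", "semantic_model")
# Section names taken from the example; treating them as optional lists is an assumption.
MODEL_SECTIONS = ("datasets", "metrics", "relationships", "custom_extensions")


def check_top_level(doc: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    models = doc.get("semantic_model", [])
    if not isinstance(models, list):
        problems.append("semantic_model must be a list of models")
        return problems
    for i, model in enumerate(models):
        if not model.get("name"):
            problems.append(f"semantic_model[{i}] has no name")
        for section in MODEL_SECTIONS:
            if section in model and not isinstance(model[section], list):
                problems.append(f"{model.get('name', i)}.{section} must be a list")
    return problems


# Parsed form of the YAML above, trimmed for brevity.
doc = {
    "version": "0.1.1",
    "semantic_model": [
        {
            "name": "retail_analytics_model",
            "description": "Core semantic model for retail sales and customer analytics",
            "datasets": [],
            "metrics": [],
        }
    ],
}

print(check_top_level(doc))
print(check_top_level({"semantic_model": [{"datasets": {}}]}))
```

A check like this only confirms the outer shape; it says nothing about whether the datasets and metrics inside are meaningful.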

Datasets: The Logical Business Entity Layer

Datasets in OSI represent logical business entities, typically corresponding to fact tables and dimension tables in a star schema, though the abstraction is not limited to that physical shape. Each dataset declares a source (the physical table or view), a primary key, and an array of fields. Critically, the dataset and each of its fields can carry an ai_context block that provides synonyms: alternative natural language terms that map to the technical dataset or field name.

OSI YAML — Dataset with AI Context

datasets:
  - name: store_sales
    source: tpcds.public.store_sales
    primary_key: [ss_item_sk, ss_ticket_number]
    description: Fact table containing all store sales transactions
    ai_context:
      synonyms:
        - "sales transactions"
        - "store purchases"
        - "retail sales"
        - "POS data"

    fields:
      - name: ss_ext_sales_price
        expression:
          dialects:
            - dialect: ANSI_SQL
              expression: ss_ext_sales_price
        description: Extended sales price (quantity x unit price)
        ai_context:
          synonyms:
            - "total price"
            - "line total"
            - "revenue"

The ai_context.synonyms array is where OSI does something genuinely new. Prior semantic layer formats were written for BI tools — systems where a user navigated a data model visually and selected fields by their technical names or curated display labels. AI agents work differently: they receive a natural language question and must resolve business terms — "revenue," "units sold," "store purchases" — to technical field identifiers. Without a structured synonym registry, that resolution happens probabilistically inside the LLM, which means it can be wrong. OSI externalizes that mapping into the data model itself, making it deterministic and auditable.
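
To make the idea of an externalized synonym registry concrete, here is a small sketch of how a consuming tool might turn the synonyms above into a lookup table. The function names `build_synonym_index` and `resolve` are hypothetical, and the input is assumed to be the already-parsed datasets list from the example, trimmed to the relevant keys.

```python
def build_synonym_index(datasets):
    """Map each lower-cased synonym to the identifiers that declare it."""
    index = {}
    for ds in datasets:
        for term in ds.get("ai_context", {}).get("synonyms", []):
            index.setdefault(term.lower(), []).append(ds["name"])
        for field in ds.get("fields", []):
            qualified = f"{ds['name']}.{field['name']}"
            for term in field.get("ai_context", {}).get("synonyms", []):
                index.setdefault(term.lower(), []).append(qualified)
    return index


def resolve(term, index):
    """Return the single identifier for a term; raise if unknown or ambiguous."""
    matches = index.get(term.strip().lower(), [])
    if len(matches) != 1:
        raise LookupError(f"{term!r} matched {len(matches)} identifiers: {matches}")
    return matches[0]


# Parsed form of the dataset example above, trimmed to names and synonyms.
datasets = [
    {
        "name": "store_sales",
        "ai_context": {
            "synonyms": ["sales transactions", "store purchases", "retail sales", "POS data"]
        },
        "fields": [
            {
                "name": "ss_ext_sales_price",
                "ai_context": {"synonyms": ["total price", "line total", "revenue"]},
            }
        ],
    }
]

index = build_synonym_index(datasets)
print(resolve("Revenue", index))   # store_sales.ss_ext_sales_price
print(resolve("pos data", index))  # store_sales
```

Raising on an unknown or ambiguous term, rather than picking a best guess, is a deliberate choice in this sketch: it keeps the mapping auditable and pushes ambiguity back to the model author.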

Metrics: Dialect-Aware Computation

Metrics in OSI are defined as named, reusable calculations. Each metric carries an expression block that supports multiple SQL dialects — ANSI SQL, Snowflake SQL, BigQuery Standard SQL, Spark SQL, and others can all be represented within a single metric definition. This is OSI's answer to the cross-platform execution problem: you do not have to choose a target platform at definition time.

OSI YAML — Metric with Multi-Dialect Expressions

metrics:
  - name: customer_lifetime_value
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: >
            SUM(store_sales.ss_ext_sales_price)
            / COUNT(DISTINCT customer.c_customer_sk)
    description: Average lifetime sales value per unique customer
    ai_context:
      synonyms:
        - "CLV"
        - "LTV"
        - "customer value"
        - "lifetime revenue"
        - "average customer worth"

  - name: store_productivity
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: >
            SUM(store_sales.ss_ext_sales_price)
            / NULLIF(SUM(store.s_number_employees), 0)
    description: Sales revenue per employee across stores
    ai_context:
      synonyms:
        - "sales per employee"
        - "employee productivity"
        - "revenue per headcount"

Notice that customer_lifetime_value is defined as a ratio — a sum divided by a distinct count — with its own ai_context.synonyms. When an AI agent encounters a user asking "what is our CLV by region?", it can resolve "CLV" to this metric definition, retrieve the precise SQL expression for the target platform's dialect, and execute it correctly without interpretation. The business logic travels with the question.
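
Dialect selection at query time can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular OSI implementation: the helper `expression_for`, the fallback to ANSI_SQL, and the dialect identifier "BIGQUERY" are all hypothetical (only ANSI_SQL appears in the example above). SAFE_DIVIDE is a BigQuery function that returns NULL instead of failing on division by zero.

```python
def expression_for(metric, target_dialect, fallback="ANSI_SQL"):
    """Pick the expression for the target dialect, falling back to a portable one."""
    by_dialect = {d["dialect"]: d["expression"] for d in metric["expression"]["dialects"]}
    if target_dialect in by_dialect:
        return by_dialect[target_dialect]
    if fallback in by_dialect:
        return by_dialect[fallback]
    raise KeyError(f"{metric['name']} has no expression for {target_dialect} or {fallback}")


# Parsed form of the metric above, plus one hypothetical dialect-specific variant.
metric = {
    "name": "customer_lifetime_value",
    "expression": {
        "dialects": [
            {
                "dialect": "ANSI_SQL",
                "expression": "SUM(store_sales.ss_ext_sales_price) "
                "/ COUNT(DISTINCT customer.c_customer_sk)",
            },
            {
                "dialect": "BIGQUERY",  # assumed identifier, not confirmed by the spec
                "expression": "SAFE_DIVIDE(SUM(store_sales.ss_ext_sales_price), "
                "COUNT(DISTINCT customer.c_customer_sk))",
            },
        ]
    },
}

print(expression_for(metric, "BIGQUERY"))   # the SAFE_DIVIDE variant
print(expression_for(metric, "SPARK_SQL"))  # no Spark entry, so the ANSI_SQL variant
```

Whether a consumer should silently fall back to a portable expression or refuse to run is a policy decision; the sketch chooses fallback only to keep the example short.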

Custom Extensions: Vendor-Specific Metadata

One of OSI's design concessions to reality is the custom_extensions block. Rather than demanding that every possible vendor capability be encoded in the core spec, OSI provides a structured escape hatch: any vendor can attach arbitrary JSON metadata to a semantic model under their own namespace, and compliant tools that do not understand that namespace simply ignore it.

OSI YAML — Custom Extensions (Salesforce + dbt)

custom_extensions:
  - vendor_name: SALESFORCE
    data: |
      {
        "tableau_workbook_id": "retail_dashboard",
        "einstein_enabled": true,
        "crm_sync": {
          "enabled": true,
          "sync_frequency": "daily",
          "customer_mapping": "customer.c_customer_id -> Account.AccountNumber"
        }
      }

  - vendor_name: DBT
    data: '{"project_name": "retail_analytics", "models_path": "models/semantic"}'

This design choice reflects a pragmatic understanding of how standards succeed. The core spec needs to be stable and minimal — covering only what every participant agrees belongs in the shared layer. Vendor-specific capabilities, integrations, and enrichments go in the extensions block. It is the same architectural decision that made HTTP extensible through headers and HTML extensible through custom attributes: reserve the core for consensus, allow the periphery to evolve.
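
The "ignore what you do not understand" rule is easy to picture in code. The sketch below assumes the model has already been parsed into a dict and that each extension's data value is a JSON string, as in the example above; the helper name `extensions_for` and the vendor name "ACME" are made up for illustration.

```python
import json


def extensions_for(model, vendor_name):
    """Return decoded extension payloads for one vendor; all other vendors are skipped."""
    payloads = []
    for ext in model.get("custom_extensions", []):
        if ext.get("vendor_name") != vendor_name:
            continue  # foreign namespace: ignore it rather than fail
        payloads.append(json.loads(ext["data"]))
    return payloads


# Parsed form of the custom_extensions example above.
model = {
    "name": "retail_analytics_model",
    "custom_extensions": [
        {
            "vendor_name": "SALESFORCE",
            "data": """
            {
              "tableau_workbook_id": "retail_dashboard",
              "einstein_enabled": true,
              "crm_sync": {
                "enabled": true,
                "sync_frequency": "daily",
                "customer_mapping": "customer.c_customer_id -> Account.AccountNumber"
              }
            }
            """,
        },
        {
            "vendor_name": "DBT",
            "data": '{"project_name": "retail_analytics", "models_path": "models/semantic"}',
        },
    ],
}

print(extensions_for(model, "DBT"))   # the dbt payload as a dict
print(extensions_for(model, "ACME"))  # [] because no such namespace is present
```

A tool written this way never needs to know which other vendors exist, which is what lets the extension block grow without touching the core spec.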

Relationships: The Join Layer

OSI represents join relationships between datasets explicitly, specifying foreign key mappings, join type (left, inner, full outer), and cardinality (one-to-many, many-to-one, many-to-many). This is the information that a query engine needs to correctly traverse a star schema — or any other physical layout — at runtime. By encoding it in the semantic model rather than the query layer, OSI ensures that joining logic is defined once and applied consistently whether the consuming system is a traditional BI tool, a NL2SQL pipeline, or an agentic AI workflow.
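
As a rough illustration of how a query layer might consume such a relationship, the sketch below models one as a small Python data class and renders a JOIN clause from it. The attribute names (`from_dataset`, `to_columns`, and so on) are assumptions chosen for readability and are not the key names used by the OSI spec; the column names come from the TPC-DS schema used in the examples above.

```python
from dataclasses import dataclass

JOIN_KEYWORDS = {
    "left": "LEFT JOIN",
    "inner": "INNER JOIN",
    "full outer": "FULL OUTER JOIN",
}


@dataclass(frozen=True)
class Relationship:
    """One join edge between two datasets (attribute names are hypothetical)."""
    from_dataset: str
    to_dataset: str
    from_columns: tuple[str, ...]
    to_columns: tuple[str, ...]
    join_type: str = "left"
    cardinality: str = "many-to-one"


def join_clause(rel: Relationship) -> str:
    """Render the SQL JOIN clause implied by a relationship."""
    if len(rel.from_columns) != len(rel.to_columns):
        raise ValueError("foreign key and target key must have the same number of columns")
    conditions = " AND ".join(
        f"{rel.from_dataset}.{f} = {rel.to_dataset}.{t}"
        for f, t in zip(rel.from_columns, rel.to_columns)
    )
    return f"{JOIN_KEYWORDS[rel.join_type]} {rel.to_dataset} ON {conditions}"


def may_fan_out(rel: Relationship) -> bool:
    """True when following the join can multiply rows on the 'from' side."""
    return rel.cardinality in ("one-to-many", "many-to-many")


sales_to_store = Relationship(
    from_dataset="store_sales",
    to_dataset="store",
    from_columns=("ss_store_sk",),
    to_columns=("s_store_sk",),
)

print(join_clause(sales_to_store))
# LEFT JOIN store ON store_sales.ss_store_sk = store.s_store_sk
print(may_fan_out(sales_to_store))  # False
```

Carrying cardinality alongside the key mapping is what lets a consumer detect joins that would inflate aggregates before it generates SQL.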

Design Philosophy

The OSI specification was deliberately kept narrow at v1.0. It covers the constructs that every analytics tool must understand: datasets, metrics, dimensions, relationships, and contextual metadata. Row-level security, materialization strategies, incremental load patterns, and other operational concerns are outside scope, not because they do not matter, but because those concerns fragment vendors more than they unify them. Consensus requires restraint.

Architecture

The Interoperability Stack

OSI does not replace existing semantic layers — it connects them. Understanding where it fits requires a clear picture of the tools it is designed to interoperate with.

dbt Semantic Layer + MetricFlow

dbt Labs introduced MetricFlow as the query engine behind the dbt Semantic Layer, providing a structured way to define metrics in YAML that could be executed consistently across data warehouses. MetricFlow was already the closest thing to a de facto standard for analytics engineering teams. OSI's relationship with MetricFlow is additive: dbt has committed to using the OSI spec as the interchange format for MetricFlow definitions, meaning metrics authored in dbt can be exported as OSI-compliant YAML and consumed by any other OSI-compatible tool. The workflow is: author in dbt, export as OSI, import anywhere.

Snowflake Cortex Analyst

Snowflake's Cortex Analyst is a native NL2SQL capability built into the Snowflake platform, allowing users to ask natural language questions against Snowflake data. Cortex Analyst is driven by a YAML semantic model that tells it how to interpret user questions, map natural language terms to fields, and generate accurate SQL. As the OSI lead sponsor, Snowflake positions the OSI format as the interchange layer for those semantic models. OSI is therefore not incidental to Cortex Analyst; it describes the kind of semantic substrate the system runs on.

ThoughtSpot Spotter

ThoughtSpot joined OSI as a founding member and has committed to OSI compatibility in its Spotter conversational analytics product. For ThoughtSpot, OSI represents a path to consume semantic context defined elsewhere in an organization's stack — rather than requiring all semantic definitions to live in ThoughtSpot's own modeling layer, organizations can maintain a single OSI-formatted model and have Spotter read from it directly.

Salesforce / Tableau Einstein

Salesforce's participation covers both Tableau (its enterprise BI platform) and Einstein (its AI analytics layer). The custom_extensions example in the official OSI sample file references Tableau workbook IDs and Einstein enablement flags explicitly, reflecting how Salesforce intends to use the extension mechanism: core metric definitions live in the shared OSI layer, Tableau/Einstein-specific configuration lives in the extension block, and the two pieces travel together.

The Converter Layer

The OSI GitHub repository includes a converters directory — a growing collection of tools for translating existing semantic layer definitions into OSI-compliant YAML. Converters for LookML (Looker/Google) and MetricFlow are in active development. This converter layer is essential for adoption: organizations with existing semantic layer investments cannot start from scratch, and OSI's interoperability value only materializes if existing definitions can be migrated into the shared format without manual reauthoring.

Important Caveat

Converter fidelity is imperfect. LookML, MetricFlow, and other formats encode vendor-specific capabilities that have no direct OSI equivalent. Custom derived dimensions, complex measure filters, and PDT (persistent derived table) logic in LookML, for example, may not translate cleanly into the OSI core spec. The converters produce valid OSI output, but some complexity may need to live in the custom_extensions block or be simplified. Organizations should validate converter output against expected query behavior before treating it as production-ready.
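
One inexpensive validation step, before running any queries, is a static check that every metric expression in the converted model refers only to datasets and fields the model actually declares. The sketch below is a hypothetical helper, not part of the OSI converters. It assumes the converted model is already parsed into a dict and uses a deliberately crude regular expression that only recognizes two-part dataset.field references, so three-part names or dots inside string literals would produce false positives.

```python
import re

# Matches identifier.identifier pairs such as store_sales.ss_ext_sales_price.
QUALIFIED_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def undeclared_references(model):
    """Map metric name -> sorted list of dataset.field references the model never declares."""
    declared = {
        (ds["name"], field["name"])
        for ds in model.get("datasets", [])
        for field in ds.get("fields", [])
    }
    missing = {}
    for metric in model.get("metrics", []):
        for dialect in metric["expression"]["dialects"]:
            for ref in QUALIFIED_REF.findall(dialect["expression"]):
                if ref not in declared:
                    missing.setdefault(metric["name"], set()).add(".".join(ref))
    return {name: sorted(refs) for name, refs in missing.items()}


# A converted model in which the customer dataset was dropped along the way.
converted = {
    "name": "retail_analytics_model",
    "datasets": [
        {"name": "store_sales", "fields": [{"name": "ss_ext_sales_price"}]},
    ],
    "metrics": [
        {
            "name": "customer_lifetime_value",
            "expression": {
                "dialects": [
                    {
                        "dialect": "ANSI_SQL",
                        "expression": "SUM(store_sales.ss_ext_sales_price) "
                        "/ COUNT(DISTINCT customer.c_customer_sk)",
                    }
                ]
            },
        }
    ],
}

print(undeclared_references(converted))
# {'customer_lifetime_value': ['customer.c_customer_sk']}
```

A clean result from a check like this does not prove the conversion preserved meaning; it only catches references that were lost. Comparing query results against the source tool remains necessary.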

Impact

What OSI Means for Conversational Analytics

Conversational analytics — the ability for users to ask natural language questions and receive trustworthy, computed answers from enterprise data — has been technically feasible for several years. The constraint was never the language model's ability to generate SQL. Models like GPT-4 and Claude were generating syntactically correct SQL from natural language prompts as early as 2023. The constraint was the accuracy of that SQL given the semantic complexity of enterprise data models.

A language model asked to answer "what is our revenue this quarter compared to last quarter?" will generate SQL that looks correct. Whether it is actually correct depends entirely on how "revenue" is defined in the specific data environment the query executes against. If "revenue" should exclude returns, or should be net of discounts, or should use recognition date rather than order date, or should exclude a specific business unit that is in the middle of an acquisition — none of that information is in the schema. It exists in the semantic layer, or in human heads, or nowhere.

This is precisely the gap OSI addresses for conversational analytics workflows.

How OSI Changes the NL2SQL Pipeline

A traditional NL2SQL pipeline works roughly as follows: receive a natural language question, construct a prompt that includes database schema information, send the prompt to a language model, receive SQL output, execute it. The schema information gives the model column names and data types, but not business logic. The model must infer metric definitions from context clues in column names and table names — which is exactly where hallucination occurs.

An OSI-augmented pipeline changes the input layer fundamentally. Instead of raw schema, the model receives an OSI semantic model: structured definitions of what each metric means, how it is calculated, what natural language terms map to which technical fields, and what contextual instructions apply to this domain. The question "what is our CLV by region?" no longer requires the model to infer what CLV means — the OSI model has already defined it, mapped its synonyms, and provided the SQL expression that computes it correctly for the target platform dialect.
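
The change to the input layer can be sketched as a prompt-construction step. The following is a minimal illustration, assuming the OSI model is already parsed into a dict; the function names and the prompt wording are invented for this example, and the call to a language model is intentionally left out.

```python
def render_semantic_context(model):
    """Flatten the parts of an OSI model that matter to an NL2SQL prompt into text."""
    lines = [f"Semantic model: {model['name']}"]
    instructions = model.get("ai_context", {}).get("instructions")
    if instructions:
        lines.append(f"Instructions: {instructions}")
    for metric in model.get("metrics", []):
        synonyms = ", ".join(metric.get("ai_context", {}).get("synonyms", []))
        # Uses the first listed dialect; a real pipeline would pick the target platform's.
        expression = metric["expression"]["dialects"][0]["expression"]
        lines.append(f"Metric {metric['name']} (also called: {synonyms})")
        lines.append(f"  Definition: {metric.get('description', '')}")
        lines.append(f"  SQL: {expression}")
    return "\n".join(lines)


def build_prompt(question, model):
    """Combine the semantic context and the user question into one prompt string."""
    return (
        "Answer using only the metric definitions below.\n\n"
        f"{render_semantic_context(model)}\n\n"
        f"Question: {question}\nSQL:"
    )


model = {
    "name": "retail_analytics_model",
    "ai_context": {"instructions": "Use this model for retail analytics."},
    "metrics": [
        {
            "name": "customer_lifetime_value",
            "description": "Average lifetime sales value per unique customer",
            "expression": {
                "dialects": [
                    {
                        "dialect": "ANSI_SQL",
                        "expression": "SUM(store_sales.ss_ext_sales_price) "
                        "/ COUNT(DISTINCT customer.c_customer_sk)",
                    }
                ]
            },
            "ai_context": {"synonyms": ["CLV", "LTV", "customer value"]},
        }
    ],
}

print(build_prompt("What is our CLV by region?", model))
```

Compared with a schema-only prompt, the model no longer has to infer what "CLV" means from column names; the definition and its SQL are stated in the context it receives.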

The Accuracy Implication

OSI member organizations report, from internal benchmarking, that NL2SQL accuracy on complex business questions improves measurably when the query pipeline is grounded in an OSI semantic model rather than raw schema alone. The primary reason is not the language model's capability; it is the quality of the context provided. OSI's ai_context blocks are, in effect, structured retrieval augmentation for business logic.

Agentic AI Workflows

The implications extend beyond single-turn question answering. Agentic AI systems — multi-step workflows where an AI agent plans, executes, observes, and iterates — increasingly need to query enterprise data as part of complex reasoning chains. A procurement agent that identifies cost reduction opportunities, a financial agent that monitors for margin compression, a clinical operations agent that tracks patient throughput — all of these workflows require the agent to issue data queries against enterprise systems with precision.

Without semantic grounding, each of those agents must either embed metric definitions in its system prompt (fragile, duplicative, version-uncontrolled) or risk semantic drift in its results. With an OSI-formatted semantic model available at query time, the agent can retrieve the canonical definition of any metric it needs, issue a correctly-formed query, and trust that the result means what it expects.

The model-level ai_context.instructions field is particularly relevant here. It allows model authors to provide guidance specifically to AI agents consuming the model — context that a human BI user would not need but an autonomous agent does, such as which metrics are appropriate for which types of analysis, how to interpret null values in certain fields, or what business rules apply to edge cases. This is semantic documentation designed for machine consumption from the ground up.

The Single Source of Truth Problem

One of the persistent challenges in enterprise analytics is maintaining a single source of truth for metric definitions as organizations grow and their tool stacks proliferate. A business intelligence team defines "active user" in Looker. A data science team defines it differently in a Python notebook. A product team embeds a third definition in an event tracking system. An executive dashboard pulls from a fourth source. OSI does not eliminate this problem by itself — organizations still have to choose a canonical definition. But it gives them a format in which to encode that canonical definition once and have it flow, consistently, into every tool that participates in the OSI ecosystem. The single source of truth becomes portable in a way it has never been before.
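
Choosing a canonical definition starts with knowing where definitions disagree. The sketch below shows one simple way to surface that, assuming metric definitions have already been collected from each tool as (tool, metric, expression) tuples. The tool names, metric names, and expressions are invented for illustration, and the normalization (whitespace and case only) is deliberately crude: it would also lower-case string literals and cannot recognize logically equivalent SQL written differently.

```python
import re
from collections import defaultdict


def normalize(expression):
    """Collapse whitespace and case so cosmetic differences do not count as drift."""
    return re.sub(r"\s+", " ", expression).strip().lower()


def find_drift(definitions):
    """Return metrics that have more than one distinct definition, with the tools using each."""
    variants = defaultdict(lambda: defaultdict(list))
    for tool, metric, expression in definitions:
        variants[metric][normalize(expression)].append(tool)
    return {
        metric: dict(by_expression)
        for metric, by_expression in variants.items()
        if len(by_expression) > 1
    }


# Hypothetical definitions gathered from three tools.
definitions = [
    ("bi_tool", "active_users", "COUNT(DISTINCT user_id)"),
    ("notebook", "active_users", "count(distinct   user_id)"),
    ("event_tracker", "active_users", "COUNT(DISTINCT session_id)"),
    ("bi_tool", "revenue", "SUM(amount)"),
]

print(find_drift(definitions))
```

Here the notebook and BI definitions collapse into one variant after normalization, while the event tracker's session-based count is flagged as a second, competing definition of the same metric name.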

Status — April 2026

Where Things Stand Now

Seven months have passed since the September 2025 announcement. The project has moved through a predictable but meaningful arc: from vision to specification to early implementation, with governance still evolving.

September 23, 2025

Public Announcement

Snowflake announces the OSI initiative with 17 founding partners. The framing is deliberately open and vendor-neutral. A GitHub repository is created but the formal specification is not yet published.

November 2025

First Working Group Meeting

The OSI working group holds its inaugural meeting. The group expands with additional members including catalog vendors and data governance platforms. Active discussion begins on the core spec, governance model, and extension framework.

January 27–28, 2026

v1.0 Specification Released

The first version of the OSI specification publishes on GitHub under Apache 2.0. Simultaneously, new working group members are announced including Databricks, AtScale, Qlik, and Lightdash. The dedicated project website launches at open-semantic-interchange.org. Snowflake, dbt Labs, and Salesforce publish companion blog posts detailing their implementation commitments.

Q2 2026 (Current Phase)

Early Adoption and Converter Development

Platform teams at member organizations are integrating OSI support into their products. The converter library is growing. The community governance model is being designed, with plans to transition OSI to a neutral foundation-led structure. The Phase 2 roadmap targets native OSI support in 50+ platforms by end of 2026.

What the Databricks Addition Signals

Databricks joining the OSI working group in January 2026 deserves specific attention. Snowflake and Databricks have competed intensely for the enterprise data platform market for several years — they are each other's most direct competitor in the cloud lakehouse and data warehouse space. For Databricks to participate in an initiative led by Snowflake is a meaningful signal. It suggests that the participants believe the interoperability problem is large enough, and the business case for solving it strong enough, that competitive considerations are secondary.

It also materially changes OSI's coverage. Databricks brings Unity Catalog — its unified data governance and metadata layer — into the conversation. Unity Catalog already manages semantic metadata for Databricks workloads. OSI compatibility with Unity Catalog would mean that semantics defined in Databricks' ecosystem can participate in the shared interchange format, dramatically expanding the pool of organizations for whom OSI is a viable path.

Governance: The Unresolved Question

The most significant structural question OSI has not yet resolved is governance. The project has operated under Snowflake's informal leadership since the September 2025 announcement. Snowflake has stated explicitly that it intends to transition OSI to a "neutral, foundation-led governance model" — the same pattern that successful open standards like OpenAPI and CNCF-hosted projects have followed. What that foundation will be, when the transition will occur, and how voting rights and spec modification rights will be structured remain open questions as of April 2026.

This matters for adoption. Organizations evaluating whether to build their semantic layer tooling around OSI are asking whether the spec will remain stable, whether a single vendor can unilaterally change it, and whether their interests will be represented in future versions. A foundation structure answers those questions. Until it exists, some organizations will watch rather than commit.

The Comparison to Other Standards Efforts

It is worth situating OSI in the broader history of data interoperability efforts, most of which did not succeed. PMML, the Predictive Model Markup Language, tried to standardize model exchange across statistical tools in 1999 and achieved limited adoption. MDX, Microsoft's multidimensional query language, became a de facto standard but only within the OLAP cube ecosystem. The Semantic Web's RDF and OWL specifications produced a rich ontology framework that never achieved mainstream enterprise data adoption.

OSI's structural advantages over prior efforts are real. It has buy-in from organizations that represent the actual working stack of modern analytics engineering — dbt Labs, Snowflake, Databricks, Salesforce/Tableau — not just standards bodies and academic contributors. The YAML format is developer-friendly and already familiar to dbt users. The Apache 2.0 license removes commercial friction. And the problem it is solving is acutely felt right now, not hypothetically in the future.

The risks are also real. The working group spans competitors with different incentives. Vendor-specific extensions could dilute the core's interoperability value if they become the primary vehicle for differentiation. And the governance gap creates uncertainty that may slow adoption at exactly the moment when momentum matters.

Net Assessment

OSI is the most credible attempt at semantic interoperability the data industry has produced, arriving at the moment when the need is highest. The v1.0 spec is technically sound and the coalition is genuinely broad. The critical next milestones (native implementations shipping in member products, the governance foundation being established, and the converter library maturing) will determine whether OSI becomes the connective tissue of the AI data stack or another well-intentioned standard that arrived without enough critical mass to stick.

What to Watch in the Next 12 Months

For practitioners tracking OSI's progress, three developments will be most telling. First, whether native OSI import/export ships in dbt Cloud and Snowflake Cortex Analyst. These are the two products with the largest install bases in the working group, and their implementations will create the first real-world validation of the spec under production workloads. Second, whether Databricks ships OSI support in Unity Catalog. That would put both of the dominant cloud data platform ecosystems behind the format in shipping product, not just in working group membership. Third, whether the governance foundation is announced. If Snowflake is serious about OSI being a community standard rather than a Snowflake standard, the foundation transition will happen. If it does not happen by mid-2026, the organizations currently watching from the sidelines will take note.

The specification itself is a starting point, not an endpoint. The OSI team has been explicit about this. The current specification, whose sample files carry the version identifier 0.1.1, is intentionally lean. Future versions will address row-level security context, materialization hints, time-series grain specifications, and richer AI instruction formats as real-world implementation surfaces edge cases. That evolution should be expected and welcomed. The value of a shared standard is not that it is perfect at v1.0; it is that it exists, and that the industry agrees to make it better together.

Sources

Snowflake Unites Industry Leaders to Unlock AI's Potential with the Open Semantic Interchange Initiative

Open Semantic Interchange (OSI) Specification Finalized

What the Open Semantic Interchange (OSI) Spec Means for Metrics, Semantics, and AI

Ending Semantic Drift: The First Unified Business Logic Foundation for AI and BI

OSI Specification Repository

Open Semantic Interchange (OSI) Further Expands Partner Ecosystem and Holds First Working Group Meeting

Snowflake-Led Coalition Targets Data Fragmentation with Vendor-Neutral Semantic Standard

Open Semantic Interchange — Project Website