Platform Manager & Engineer Brief
Platform Manager Brief
- Snowflake Intelligence is positioning as the "control plane for the agentic enterprise" — MCP integrations for Gmail, Salesforce, Slack, and Jira are in preview. The pitch CIOs need to hear.
- Cortex AI Guardrails hit GA — runtime protection against prompt injection and jailbreaks is now production-ready. Safe LLM deployment just got a serious upgrade.
- Snowflake Postgres is GA with 20x faster disk I/O vs. managed Postgres alternatives. Enterprises can now consolidate OLTP + OLAP + AI on one platform.
- Iceberg V3 support is in the pipeline — semi-structured data, row-level CDC, geospatial, and nanosecond timestamps are coming. Open format momentum is accelerating.
- Snowflake Adaptive Compute entered private preview — automatic warehouse sizing and sharing, promising lower costs with no manual tuning.
- Summit 26 is June 1–4 in San Francisco (20,000+ attendees expected). Anthropic's Daniela Amodei keynotes. Start registering your team now.
Engineer Brief
- Dynamic tables now support PRIMARY KEY RELY for incremental refresh — even when upstream tables do full refresh. Massive cost savings on large pipeline refreshes with a 1-line DDL change.
- Dynamic Iceberg tables now support PARTITION BY, TARGET_FILE_SIZE, and PATH_LAYOUT (GA, April 13). Critical for controlling file layout at scale.
- Snowflake Storage for Apache Iceberg tables is in public preview on AWS and Azure — Snowflake-managed infrastructure that appears as standard Iceberg to external Spark/Trino clusters.
- arctic-extract fine-tuning is in Preview — submit domain-specific training data to make AI_EXTRACT dramatically more accurate on proprietary document formats.
- Cortex Code now supports external catalogs: AWS Glue, Databricks Unity Catalog, and Postgres. The data-native AI development loop extends beyond Snowflake storage.
- Cortex AI Guardrails (GA) adds AI_FUNCTIONS_USER role for fine-grained access control — decouple Cortex access from AI function usage in your RBAC model.
- Openflow Oracle CDC connector is GA (Feb 27) — agentless change data capture from Oracle to Snowflake, no custom scripts required.
Platform Updates & Releases
Snowflake Expands Intelligence & Cortex Code to Power the Agentic Enterprise Control Plane
Snowflake Intelligence gains Deep Research (public preview, multi-step agentic reasoning over structured and unstructured data with citations), a mobile app in public preview, and MCP-based Skills connectors for Gmail, Google Calendar, Jira, Salesforce, Slack, and Google Docs. Cortex Code now supports external data systems including AWS Glue, Databricks Unity Catalog, and Postgres — enabling data-native AI development across hybrid environments. Cortex Code Agent SDK is available for Python and TypeScript. Plan Mode allows users to preview and approve agentic workflows before execution. Enterprises including Capita, Logitech, and United Rentals are cited as production adopters.
Cortex AI Guardrails Hit General Availability — Runtime LLM Protection Now Production-Ready
Cortex AI Guardrails are part of Snowflake Horizon Catalog and run as a transparent layer on Cortex Code, screening for prompt injection and jailbreak attempts before queries execute. The system is powered by Llama Guard 3, which evaluates both inputs and outputs against configurable harm categories including violence, hate speech, and data exfiltration attempts. A new AI_FUNCTIONS_USER database role lets account admins decouple general Cortex access from specific high-cost or high-risk AI function permissions — enabling a least-privilege model for your AI governance framework. Responses are evaluated pre-delivery with near-zero added latency in benchmarked tests.
Dynamic Tables Get PRIMARY KEY Support for Incremental Refresh — Ending Full-Recompute Waste
When a dynamic table or upstream base table carries a PRIMARY KEY with the RELY property, Snowflake uses that key for row-level change tracking instead of requiring change-tracking columns. Set REFRESH_MODE = INCREMENTAL on downstream tables to activate this path — Snowflake will then compute only the delta rows that changed. The feature supports GROUP BY-based deduplication and QUALIFY ROW_NUMBER() = 1 filters as system-derived primary keys. Dynamic Iceberg tables also benefit from this capability. GA as of April 16, 2026.
Snowflake Postgres Is Generally Available — Native OLTP + OLAP + AI on One Platform
Powered by the pg_lake extension set, Snowflake Postgres allows Postgres to directly query, manage, and write to Apache Iceberg tables using standard SQL — bridging OLTP and the open lakehouse. Disk I/O performance benchmarks show 20x improvement over traditional managed Postgres solutions. The implementation is 100% community Postgres (not a proprietary fork), enabling use of standard drivers, ORMs, and tooling. Available in selected AWS and Azure regions. Pairs with Snowflake's governance, RBAC, and data sharing framework out of the box.
Openflow Oracle CDC Connector Goes GA — Agentless Change Data Capture from Oracle to Snowflake
Openflow is built on Apache NiFi and operates as a Snowflake-managed control plane with a BYOC (Bring Your Own Cloud) data plane deployed in your VPC. The Oracle connector supports structured CDC from Oracle OLTP sources, handling schema changes and high-throughput transactional loads. The unified Openflow service integrates structured and unstructured, batch and streaming data movement into a single managed platform, replacing the need for external ETL tools for Oracle migration patterns. Available on AWS Marketplace via Snowflake's Openflow Implementation partners.
Cortex AI & ML
SiliconAngle: Inside Snowflake's Bet on Becoming the AI Ops Layer for the Enterprise
The article covers how Cortex Code now expands beyond Snowflake-managed data to include Databricks, AWS Glue, and Postgres as external catalog sources. Snowflake's Agent SDK (Python and TypeScript) allows teams to embed Cortex Code functions in external applications. Cloud Agents are in private preview inside Snowsight. Plan Mode adds a human-in-the-loop approval step for multi-step agentic workflows. More than 9,100 customers use Snowflake AI products weekly; over 50% of the customer base uses Snowflake Intelligence or Cortex Code since launch six months ago.
Fine-Tuning arctic-extract Models for Domain-Specific Document Extraction (Preview)
Fine-tuning jobs are launched via the CORTEX.FINETUNE() function using Snowflake Dataset objects as training input. The number of training epochs is automatically determined by the system based on dataset size. Fine-tuned models are then deployed as custom model variants and called via the AI_EXTRACT function. The ACCOUNTADMIN role must grant SNOWFLAKE.CORTEX_USER to users calling FINETUNE(). The feature is particularly valuable for enterprises with structured proprietary documents — insurance forms, medical records, legal contracts — where zero-shot models have high error rates. Still in Preview as of January 2026.
Snowflake Cortex AI: The Complete 2026 Guide for Platform Teams — DataEngineer Hub
The guide covers the complete Cortex SQL function catalog: COMPLETE(), CLASSIFY_TEXT(), EXTRACT_ANSWER(), TRANSLATE(), SENTIMENT(), SUMMARIZE(), PARSE_DOCUMENT(), and AI_EXTRACT(). It details supported model options (including Arctic, Llama, Mistral, and Reka families), token limits, and credit consumption rates per function. Vector search integration via VECTOR_COSINE_SIMILARITY() and Cortex Search Service are covered with practical examples. The guide includes a cost modeling section with per-function credit benchmarks and a governance checklist aligned to Snowflake Horizon.
Gen AI in Action: Real Cortex AI Customer Stories and Measurable Outcomes — Snowflake Blog
Highlighted case studies include: Sigma Computing using Cortex AI to increase sales efficiency and accelerate sales cycles; Terakeet identifying new market opportunities 98% faster using Cortex-powered analysis; Siemens Energy building a Cortex AI chatbot that surfaces and summarizes 500,000+ pages of internal technical documents; and Alberta Health Services using Cortex AI for physician note automation within Snowflake's HIPAA-compliant environment. All deployments leverage the shared governance and security posture of the AI Data Cloud.
Architecture & Engineering
Snowflake Storage for Apache Iceberg Tables Enters Public Preview on AWS and Azure
With Snowflake Storage for Iceberg, tables are hosted on Snowflake-managed infrastructure rather than customer-managed S3/ADLS buckets. External engines (Spark, Trino) see a standard, high-performance Iceberg REST catalog endpoint — zero proprietary client required. Snowflake handles compaction, metadata management, and zero-copy cloning automatically. This decouples storage management from compute diversity: teams can read from Spark and write from Snowflake SQL using the same physical data. Pairs with Snowflake's governance layer for unified access control across all engines.
Snowflake Commits to Iceberg V3 Support — Semi-Structured Data, Geospatial, and Row-Level CDC Coming
Iceberg V3 adds native support for semi-structured data (JSON/VARIANT equivalent), row-level change data capture (enabling downstream streaming consumers), geospatial data types, and nanosecond-precision timestamps. Snowflake's governance portability plan includes extending Snowflake Horizon access policies and data masking to Iceberg V3 tables managed by any catalog. The SiliconAngle piece also notes Snowflake's cross-engine governance story: apply a Snowflake row-access policy to a table, and that policy enforces regardless of whether the query comes from Snowflake SQL, Spark, or Trino.
Agentic Snowpark Data Engineering Best Practices with Cortex Code — Snowflake Developer Guide
The guide emphasizes four core principles: think in DataFrames (not pandas), keep computation inside Snowflake's engine, package reusable logic as UDFs, and orchestrate multi-step workflows as stored procedures rather than notebooks. It covers Cortex Code's ability to generate Snowpark Python from natural language prompts, validate and explain generated code, and execute it directly in a Snowsight Python worksheet. The agentic pattern uses Tasks DAGs to chain stored procedures, with Cortex Code monitoring and auto-refining logic when downstream data quality checks fail.
Next-Gen Data Engineering: 6 Snowflake Features Transforming How You Build Pipelines
The six features covered: Dynamic Tables for declarative pipeline definition with automatic refresh; Snowflake Tasks with DAG-based orchestration replacing external schedulers; Streams for incremental change capture; Snowpark for in-warehouse Python/Java/Scala execution; Cortex Code for AI-assisted development and automated optimization suggestions; and Data Metric Functions for built-in data quality monitoring without external tools. The article emphasizes that native orchestration via Tasks eliminates the need for Airflow in many standard pipeline patterns, reducing cost and operational overhead.
Tutorials & How-Tos
Stop Paying for Full Refreshes: How PRIMARY KEY RELY Turns 60M-Row Recomputes Into Surgical Updates
The article walks through adding PRIMARY KEY (id) RELY to an upstream dynamic table, then setting REFRESH_MODE = INCREMENTAL on downstream tables to activate row-level change tracking. The author benchmarks a 60M-row table: full refresh consumed ~12 credits per run; incremental refresh after adding PRIMARY KEY RELY consumed 0.4 credits per run — a 30x reduction. The piece also covers the QUALIFY ROW_NUMBER() = 1 pattern as an alternative for deduplication-based tables that lack natural primary keys, and explains when Snowflake falls back to full refresh automatically.
Snowflake Cost Optimization: 12 Proven Techniques to Cut Your Bill by 40% in 2026
Key techniques include: setting auto-suspend at 60 seconds or less (saves 15–25% within 24 hours); consolidating from multiple departmental warehouses to function-based warehouses (ingestion, processing, analytics); implementing AUTOMATIC_CLUSTERING only on tables with frequent range filters on clustering keys (70–90% query cost reduction reported); monitoring via SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY for credit consumption by query type; using Resource Monitors with credit quotas per warehouse; querying ACCESS_HISTORY to identify tables unused for 90+ days for archival. The article frames 2026 cost optimization as a hybrid of autonomous intelligence (Adaptive Compute) plus manual FinOps discipline.
HOW TO: Extract Structured Data from Documents via Snowflake Document AI (2026 Guide)
The guide covers: staging document files in Snowflake internal or external stages; calling AI_PARSE_DOCUMENT() with a stage reference and document type parameter to extract raw text and layout; then calling AI_EXTRACT() with a JSON schema definition to pull specific fields. Supports PDF, DOCX, PNG, JPG, and TIFF formats. The article details how to define extraction schemas as JSON objects, handle multi-page documents, and use FLATTEN() to normalize extracted arrays. It also covers the fine-tuning path using arctic-extract for domain-specific accuracy improvements. AI_PARSE_DOCUMENT is now available in AWS EU West 2 (London) as of April 2, 2026.
Use Cases & Customer Stories
Secrets of Gen AI Success: Real-World Stories of Production AI Inside Snowflake
The article features five production patterns: Retrieval-Augmented Generation (RAG) on proprietary documents using Cortex Search Service + COMPLETE(); automated report generation using SUMMARIZE() on structured query results; sentiment classification pipelines using CLASSIFY_TEXT() on customer feedback at scale; document digitization using AI_PARSE_DOCUMENT() on legacy PDFs; and medical note generation using COMPLETE() with clinical prompt templates inside a HIPAA-compliant Snowflake environment. A common theme: teams that started with small, well-governed datasets and measured output quality at each stage saw the fastest path to production.
Snowflake Summit 26 Preview: AI Moves From Experimentation to Production, Anthropic Keynotes
Summit 26 agenda: 500+ breakout sessions, 200+ on-site partners, 39 hands-on labs, 20,000+ projected in-person attendees (Snowflake's largest conference to date). Product innovations previewed for Summit include Snowflake Openflow enhancements, Adaptive Compute (automatic warehouse sizing), new agentic products on Snowflake Marketplace, Cortex AISQL, and expanded Snowflake Intelligence capabilities. Accenture and Sanofi are featured enterprise speakers alongside Anthropic's Daniela Amodei. The keynote opens Monday June 1 at 5:00 p.m. PDT.
Beyond Syntax: Optimizing Medallion Architecture with Snowflake Cortex Code — Atrium
The use case covers Bronze to Silver to Gold layer implementation using Cortex Code suggestions in Snowsight notebooks. The central data team used Cortex Code to generate Dynamic Table DDL for Silver layer transformations and Snowpark stored procedures for Gold layer business logic. Domain teams used Cortex Code to create schema-specific materializations without requiring deep Snowpark expertise. AI-assisted query profiling identified full partition scans in Bronze ingestion jobs, reducing Bronze-to-Silver latency by 40%. The article emphasizes that Cortex Code's ability to explain existing SQL — not just generate new code — accelerated onboarding of new engineers to the existing medallion schema.
Ecosystem & Industry
Snowflake vs Databricks in 2026: An Honest Comparison — Where Each Platform Actually Wins
Key 2026 comparison dimensions: SQL analytics and BI concurrency — Snowflake wins on ease of use, auto-scaling, and governance out of the box; ML and data engineering at scale — Databricks retains advantages for Spark-native, streaming-heavy, and model training workloads. Databricks reported $5.4B ARR at 65% YoY growth in February 2026. Snowflake's Cortex AI stack closes the ML gap for teams that want to avoid managing Spark clusters. Cost: Snowflake averages ~$36K/year per standard enterprise deployment vs Databricks ~$28K, though this varies heavily by workload. The analyst consensus: both platforms are converging, and the choice is increasingly about team skills and existing ecosystem integrations.
Snowflake Acquires Observe to Bring AI-Powered Observability Into the Data Cloud
Observe's platform ingests machine-generated data — logs, metrics, traces, and events — and uses AI to detect anomalies, identify root causes, and surface operational insights. Integration with Snowflake's storage and Cortex AI enables correlation between operational events and business data (e.g., linking a spike in error logs to a revenue impact metric). The acquisition expands Snowflake into the IT operations management software market, previously dominated by Splunk, Datadog, and Dynatrace. The deal is consistent with Snowflake's strategy of bringing all enterprise data types — not just analytical — into the AI Data Cloud governance perimeter.
The Unofficial Snowflake Monthly Release Notes: March 2026 — Community Perspective
Highlights from the March 2026 unofficial release notes: Snowflake Postgres GA (with strong recommendation to evaluate for app consolidation), Openflow Oracle CDC GA (flagged as high-impact for Oracle migration workloads), Performance Explorer enhancements in Snowsight (new filter presets, CSV export, side-panel search added April 17), Dynamic Table PRIMARY KEY support entering GA pipeline, and the arctic-extract fine-tuning preview entering expanded availability. The post includes practical "should I care?" ratings for each feature and links to hands-on quickstarts for the highest-rated items.
SQL Tips of the Week
Enable Incremental Refresh on Dynamic Tables with PRIMARY KEY RELY
-- Step 1: Add a primary key with RELY to your upstream base or dynamic table ALTER TABLE YOUR_DATABASE.YOUR_SCHEMA.ORDERS ADD PRIMARY KEY (order_id) RELY; -- Step 2: Create downstream dynamic table with INCREMENTAL refresh -- Snowflake will now use order_id to compute only changed rows CREATE OR REPLACE DYNAMIC TABLE YOUR_DATABASE.YOUR_SCHEMA.ORDERS_SUMMARY TARGET_LAG = '5 minutes' WAREHOUSE = YOUR_WAREHOUSE REFRESH_MODE = INCREMENTAL AS SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_revenue FROM YOUR_DATABASE.YOUR_SCHEMA.ORDERS GROUP BY customer_id; -- Step 3: Verify the refresh mode Snowflake actually selected SHOW DYNAMIC TABLES LIKE 'ORDERS_SUMMARY' IN SCHEMA YOUR_DATABASE.YOUR_SCHEMA; -- Check the "refresh_mode" and "refresh_mode_reason" columns
If Snowflake falls back to full refresh despite RELY, check the refresh_mode_reason column in SHOW DYNAMIC TABLES — it will tell you exactly why (e.g., unsupported SQL construct or missing key). For tables without a natural primary key, the QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) = 1 deduplication pattern also qualifies as a system-derived primary key for change tracking purposes.
Call Cortex COMPLETE() with a System Prompt for Governed LLM Responses
-- Call COMPLETE() with a system prompt to control output format and tone SELECT review_id, review_text, SNOWFLAKE.CORTEX.COMPLETE( 'mistral-large2', [ { 'role': 'system', 'content': 'You are a customer sentiment analyst. Respond ONLY with valid JSON in this format: {"sentiment": "positive|neutral|negative", "score": 0.0-1.0, "key_theme": "one phrase"}. Never include explanation or markdown.' }, { 'role': 'user', 'content': review_text } ], { 'temperature': 0, 'max_tokens': 100 } )::VARIANT AS sentiment_analysis FROM YOUR_DATABASE.YOUR_SCHEMA.CUSTOMER_REVIEWS WHERE processed_flag = FALSE LIMIT 1000; -- Parse the JSON result downstream: -- sentiment_analysis:sentiment::STRING, sentiment_analysis:score::FLOAT
Set temperature: 0 for classification and extraction tasks — you want deterministic outputs, not creative ones. For summarization or generation tasks, use 0.3–0.7. Also: batch your COMPLETE() calls in chunks of 500–1,000 rows rather than running against millions at once — this makes cost monitoring and error handling dramatically easier.
Set a Resource Monitor for AI Credit Budgets
-- Step 1: Create a resource monitor for your AI analytics warehouse CREATE OR REPLACE RESOURCE MONITOR ai_workload_monitor CREDIT_QUOTA = 500 FREQUENCY = MONTHLY START_TIMESTAMP = IMMEDIATELY TRIGGERS ON 75 PERCENT DO NOTIFY ON 90 PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND; -- Step 2: Attach monitor to your AI-specific warehouse ALTER WAREHOUSE YOUR_AI_WAREHOUSE SET RESOURCE_MONITOR = ai_workload_monitor; -- Step 3: Query current credit consumption SELECT name, credit_quota, credits_used, ROUND(credits_used / credit_quota * 100, 1) AS pct_consumed, remaining_credits FROM snowflake.account_usage.resource_monitors WHERE name = 'AI_WORKLOAD_MONITOR'; -- Step 4: See which Cortex functions consumed the most credits this month SELECT function_name, COUNT(*) AS call_count, SUM(credits_used) AS total_credits FROM snowflake.account_usage.metering_history WHERE service_type = 'AI_SERVICES' AND start_time >= DATE_TRUNC('month', CURRENT_DATE) GROUP BY function_name ORDER BY total_credits DESC;
Create separate warehouses for AI workloads vs. BI workloads and apply different resource monitors to each — this gives you clean cost separation in your METERING_HISTORY and makes chargeback conversations with business owners much simpler. For Snowflake Intelligence and Cortex Agent workloads specifically, use the new custom budget feature in Snowsight under Admin > Cost Management to set per-team spend caps with email notifications.
Extract Structured Fields from PDFs Using AI_PARSE_DOCUMENT + AI_EXTRACT
-- Step 1: Stage your PDFs in an internal or external stage -- PUT file:///local/path/invoice.pdf @YOUR_DATABASE.YOUR_SCHEMA.DOC_STAGE; -- Step 2: Parse raw text and layout from the document CREATE OR REPLACE TABLE YOUR_DATABASE.YOUR_SCHEMA.PARSED_INVOICES AS SELECT relative_path AS file_name, SNOWFLAKE.CORTEX.AI_PARSE_DOCUMENT( @YOUR_DATABASE.YOUR_SCHEMA.DOC_STAGE, relative_path, {'mode': 'LAYOUT'} ) AS parsed_doc FROM directory(@YOUR_DATABASE.YOUR_SCHEMA.DOC_STAGE) WHERE relative_path ILIKE '%.pdf'; -- Step 3: Extract structured fields using AI_EXTRACT with a JSON schema SELECT file_name, SNOWFLAKE.CORTEX.AI_EXTRACT( parsed_doc, { 'invoice_number': {'type': 'string', 'description': 'Invoice ID or number'}, 'vendor_name': {'type': 'string', 'description': 'Vendor or supplier name'}, 'invoice_date': {'type': 'string', 'description': 'Date of invoice YYYY-MM-DD'}, 'total_amount': {'type': 'number', 'description': 'Total invoice amount (numeric)'}, 'currency': {'type': 'string', 'description': 'Currency code (USD, EUR, etc.)'} } )::VARIANT AS extracted_fields FROM YOUR_DATABASE.YOUR_SCHEMA.PARSED_INVOICES; -- Step 4: Flatten and use the results as structured data -- extracted_fields:invoice_number::STRING -- extracted_fields:total_amount::FLOAT
Use 'mode': 'LAYOUT' (not 'OCR') when your documents contain tables — LAYOUT mode preserves row-column structure, which dramatically improves extraction accuracy for line-item data. For proprietary document formats where zero-shot extraction gives poor results, the arctic-extract fine-tuning preview lets you train a domain-adapted model using labeled examples from your own documents.
Govern AI Function Access with the AI_FUNCTIONS_USER Role
-- Step 1: Grant base Cortex access to all data users GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE DATA_ANALYST; -- Step 2: Grant AI function-specific access only to approved roles GRANT DATABASE ROLE SNOWFLAKE.AI_FUNCTIONS_USER TO ROLE AI_DEVELOPER; GRANT DATABASE ROLE SNOWFLAKE.AI_FUNCTIONS_USER TO ROLE DATA_SCIENTIST; -- Step 3: Verify which roles have access to AI functions SHOW GRANTS OF DATABASE ROLE SNOWFLAKE.AI_FUNCTIONS_USER; -- Step 4: Review which users are calling AI functions SELECT user_name, query_text, start_time, execution_status, ROUND(total_elapsed_time / 1000, 2) AS elapsed_seconds FROM snowflake.account_usage.query_history WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP()) AND ( query_text ILIKE '%CORTEX.COMPLETE%' OR query_text ILIKE '%AI_EXTRACT%' OR query_text ILIKE '%AI_PARSE_DOCUMENT%' OR query_text ILIKE '%CORTEX.FINETUNE%' ) ORDER BY start_time DESC;
Pair role-based AI function access with a Snowflake Row Access Policy on any table containing PII — this ensures that even users with AI_FUNCTIONS_USER access can't feed protected data into COMPLETE() by accident. Also set up an automated alert: query QUERY_HISTORY weekly for CORTEX.FINETUNE() calls — fine-tuning jobs are high-credit operations that should always be pre-approved by the platform team.
Create a Dynamic Apache Iceberg Table with PARTITION BY for Scale
-- Create a dynamic Iceberg table with optimized partitioning -- GA as of April 13, 2026 — supports PARTITION BY, TARGET_FILE_SIZE, PATH_LAYOUT CREATE OR REPLACE DYNAMIC ICEBERG TABLE YOUR_DATABASE.YOUR_SCHEMA.EVENTS_DAILY_ICEBERG TARGET_LAG = '1 hour' WAREHOUSE = YOUR_WAREHOUSE EXTERNAL_VOLUME = YOUR_EXTERNAL_VOLUME CATALOG = SNOWFLAKE BASE_LOCATION = 'events/daily/' PARTITION BY (event_date, region) TARGET_FILE_SIZE = 128MB PATH_LAYOUT = HIVE REFRESH_MODE = AUTO AS SELECT DATE_TRUNC('day', event_timestamp) AS event_date, region, event_type, user_id, COUNT(*) AS event_count, SUM(revenue) AS total_revenue FROM YOUR_DATABASE.YOUR_SCHEMA.RAW_EVENTS GROUP BY 1, 2, 3, 4; -- Check the physical layout and file statistics SELECT * FROM TABLE( INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY( TABLE_NAME => 'EVENTS_DAILY_ICEBERG' ) ) ORDER BY refresh_start_time DESC LIMIT 10;
Choose PATH_LAYOUT = HIVE if your external Spark or Trino clusters need to discover partitions via the classic region=us-east-1/event_date=2026-04-25/ path convention. Use PATH_LAYOUT = UUID if partitions are exclusively catalog-managed — UUID gives better write performance. Set TARGET_FILE_SIZE to 128MB for analytics (good balance of parallelism and open/close overhead), or 32MB if you're optimizing for streaming query latency over large scans.