dbt + Snowflake Cost Optimization: 7 Tips for Data Engineers

Why dbt Costs Are Hard to Track

dbt runs dozens or hundreds of models per job, each potentially on a different warehouse. Without tagging, it is difficult to tell which model consumed which credits. The result: Snowflake bills grow as dbt adoption scales, but nobody knows what to optimize first.

Tip 1: Use Incremental Models for Large Tables

Running a full table scan on every dbt run is the most common source of wasted compute. For any table over 10 GB, implement incremental materialization with a well-defined unique_key and an efficient incremental predicate, so each run processes only new or changed rows. A well-designed incremental model can reduce per-run compute by 70-90% on mature datasets.
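The logic behind an incremental run can be sketched in a few lines of plain Python. This is an illustration of the idea only, not dbt's implementation: dbt compiles the equivalent logic to SQL in the warehouse. The function name, the `id` key, and the `updated_at` cursor column are all assumptions made for the example.

```python
from datetime import date


def incremental_merge(target, source, unique_key="id", cursor="updated_at"):
    """Upsert only the source rows that are newer than the target's high-water mark.

    `target` is a dict keyed by unique_key; `source` is a list of row dicts.
    Returns the number of rows processed in this run.
    """
    # High-water mark: the latest cursor value already present in the target.
    watermark = max((row[cursor] for row in target.values()), default=None)

    # Incremental predicate: skip rows at or below the watermark.
    new_rows = [r for r in source if watermark is None or r[cursor] > watermark]

    # Merge on unique_key: existing keys are updated, new keys are inserted.
    for row in new_rows:
        target[row[unique_key]] = row
    return len(new_rows)


# Hypothetical data: one row already loaded, two newer rows arriving.
target = {1: {"id": 1, "updated_at": date(2024, 1, 1), "status": "open"}}
source = [
    {"id": 1, "updated_at": date(2024, 1, 1), "status": "open"},
    {"id": 1, "updated_at": date(2024, 1, 2), "status": "closed"},
    {"id": 2, "updated_at": date(2024, 1, 2), "status": "open"},
]
processed = incremental_merge(target, source)
```

On the first run the target is empty, so every row is processed, which mirrors a full refresh. On later runs only rows past the watermark are touched, which is where the savings come from.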

Tip 2: Route Models to Warehouse by Weight

Not all dbt models need the same warehouse size. Use a Small or Medium warehouse for light staging and dimension models. Use Medium or Large for fact table processing. Reserve XL or 2XL for full historical loads or backfill runs only. This keeps lightweight staging models from running on oversized warehouses and inflating credit usage on shared warehouses.
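A rough back-of-the-envelope calculation shows why routing matters. The sketch below uses the hourly credit rates of standard Snowflake warehouses, which double with each size. It deliberately ignores the 60-second minimum charged when a warehouse resumes, idle time before auto-suspend, and the fact that heavy queries often finish faster on a larger warehouse. The function name and the 90-second runtime are assumptions for illustration.

```python
# Credits per hour for standard Snowflake warehouses (each size doubles the rate).
CREDITS_PER_HOUR = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
    "2XLARGE": 32,
}


def estimate_credits(size, runtime_seconds):
    """Estimate credits for one model run from warehouse size and runtime.

    Simplification: billing is treated as purely per-second, with no
    resume minimum and no idle time.
    """
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


# Hypothetical light staging model that takes about 90 seconds on either size.
on_small = estimate_credits("SMALL", 90)
on_xlarge = estimate_credits("XLARGE", 90)
print(f"SMALL: {on_small:.3f} credits, XLARGE: {on_xlarge:.3f} credits")
```

If the runtime really is the same on both sizes, the same model costs eight times as much on the XL warehouse. For light models that do not benefit from extra compute, that difference is pure waste.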

Tip 3: Tag Every Model for Cost Attribution

Add Snowflake query tags to every dbt model using the query_tag config. This makes it possible to filter QUERY_HISTORY by dbt model name, team, or domain, unlocking model-level cost attribution. Without tagging, all dbt queries appear under the warehouse with no model context.
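One common convention is to store the tag as a small JSON string, which makes it easy to parse later. The sketch below builds such a tag and then totals elapsed time per model from rows shaped like QUERY_HISTORY output. The tag keys (`dbt_model`, `team`, `domain`), the function names, and the sample rows are assumptions for illustration. QUERY_TAG and TOTAL_ELAPSED_TIME (in milliseconds) are columns of Snowflake's QUERY_HISTORY view.

```python
import json
from collections import defaultdict


def build_query_tag(model, team, domain):
    """Serialize attribution fields into a JSON string usable as a query tag."""
    return json.dumps({"dbt_model": model, "team": team, "domain": domain}, sort_keys=True)


def elapsed_ms_by_model(rows):
    """Sum TOTAL_ELAPSED_TIME per dbt model; untagged or non-JSON tags are grouped."""
    totals = defaultdict(int)
    for row in rows:
        try:
            tag = json.loads(row["QUERY_TAG"] or "{}")
        except json.JSONDecodeError:
            tag = {}
        # A tag that is valid JSON but not an object carries no model name.
        model = tag.get("dbt_model", "untagged") if isinstance(tag, dict) else "untagged"
        totals[model] += row["TOTAL_ELAPSED_TIME"]
    return dict(totals)


# Hypothetical rows, as they might be exported from QUERY_HISTORY.
rows = [
    {"QUERY_TAG": build_query_tag("fct_orders", "finance", "sales"), "TOTAL_ELAPSED_TIME": 42000},
    {"QUERY_TAG": build_query_tag("fct_orders", "finance", "sales"), "TOTAL_ELAPSED_TIME": 8000},
    {"QUERY_TAG": "", "TOTAL_ELAPSED_TIME": 1500},
]
totals = elapsed_ms_by_model(rows)
```

Elapsed time is only a proxy for cost. To turn it into credits, it still has to be combined with the warehouse size each query ran on.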

Tip 4: Analyze run_results for Expensive Models

dbt generates a run_results.json file after every job run containing execution time per model. Parse this file in your CI/CD pipeline to detect models that suddenly take 5x longer than their baseline, a signal of query regression or data growth. Combine with Snowflake QUERY_HISTORY to correlate runtime with actual credits consumed.
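A minimal sketch of that check is shown below. It relies on the `results` list in run_results.json, where each entry carries a `unique_id` and an `execution_time` in seconds. The baseline dictionary, the function name, and the sample model IDs are assumptions. In practice the baselines would come from wherever you store historical runtimes.

```python
import json


def load_run_results(path="target/run_results.json"):
    """Read a run_results.json file (dbt writes it to the target directory by default)."""
    with open(path) as f:
        return json.load(f)


def find_regressions(run_results, baselines, factor=5.0):
    """Return (unique_id, elapsed, baseline) for models slower than factor x baseline.

    Models without a known baseline are skipped rather than flagged.
    """
    flagged = []
    for result in run_results.get("results", []):
        model_id = result["unique_id"]
        baseline = baselines.get(model_id)
        elapsed = result.get("execution_time", 0.0)
        if baseline and elapsed > factor * baseline:
            flagged.append((model_id, elapsed, baseline))
    return flagged


# Hypothetical run: one model has jumped from about 100s to 600s.
run_results = {
    "results": [
        {"unique_id": "model.shop.fct_orders", "execution_time": 600.0},
        {"unique_id": "model.shop.stg_orders", "execution_time": 12.0},
    ]
}
baselines = {"model.shop.fct_orders": 100.0, "model.shop.stg_orders": 10.0}
regressions = find_regressions(run_results, baselines)
```

In a CI/CD step, a non-empty result could fail the job or post an alert, depending on how strict you want the gate to be.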

Tip 5: Defer Non-Critical Refreshes to Off-Peak Windows

Not every model needs to run during business hours. Classify dbt models by business priority and defer non-critical refreshes to overnight windows when warehouse concurrency pressure is lower. This is especially relevant for staging models that feed intermediate tables but are not directly queried by dashboards.

Tip 6: Use dbt Snapshots Carefully

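dbt snapshots implement SCD Type 2 by comparing source tables against the existing snapshot on every run. On large source tables this comparison is expensive. Review snapshot configurations: prefer the timestamp strategy where the source has a reliable updated_at column, and when using the check strategy, list only the columns that matter in check_cols rather than comparing all columns for changes.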

Tip 7: Monitor dbt Cloud Run Concurrency

dbt Cloud can run scheduled jobs in parallel, which can spike Snowflake warehouse concurrency. If multiple jobs run simultaneously against the same multi-cluster warehouse, you may trigger automatic cluster scale-out, and the extra credits that come with it, without realizing it. Audit your dbt Cloud schedule to avoid concurrent large jobs on shared warehouses.
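A schedule audit can start as a simple overlap check. The sketch below assumes you have exported each job's warehouse and its typical start and end time (here as minutes after midnight). The job names, warehouse names, and times are hypothetical, and nothing here calls the dbt Cloud API.

```python
from itertools import combinations


def overlapping_jobs(jobs):
    """Return pairs of job names whose run windows overlap on the same warehouse."""
    overlaps = []
    for a, b in combinations(jobs, 2):
        same_warehouse = a["warehouse"] == b["warehouse"]
        # Two half-open windows [start, end) overlap when each starts before the other ends.
        if same_warehouse and a["start"] < b["end"] and b["start"] < a["end"]:
            overlaps.append((a["name"], b["name"]))
    return overlaps


# Hypothetical schedule; times are minutes after midnight.
jobs = [
    {"name": "hourly_core", "warehouse": "TRANSFORM_WH", "start": 60, "end": 90},
    {"name": "nightly_backfill", "warehouse": "TRANSFORM_WH", "start": 75, "end": 180},
    {"name": "marketing_marts", "warehouse": "MARKETING_WH", "start": 70, "end": 100},
]
conflicts = overlapping_jobs(jobs)
```

Each conflicting pair is a candidate for staggering or for moving one job to its own warehouse.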

Track dbt credit spend by model with Anavsan

APEX attributes Snowflake credits to individual dbt models using query tagging and run metadata, giving data teams model-level cost accountability without manual instrumentation.


Frequently Asked Questions

dbt can drive significant credit consumption when models run as full table scans, use oversized warehouses, or execute during peak concurrency windows. Incremental models, proper warehouse routing, and query tagging are the most effective controls.

Use incremental materialization for large tables, route models to appropriately sized warehouses, add query tags for attribution, stagger job schedules, and analyze run_results.json to identify expensive models.

Add a query_tag config to dbt models with model name and team identifiers. Then query QUERY_HISTORY filtered by query tag to see the queries each model ran and estimate the credits they consumed. dbt also records per-model execution times in run_results.json.