Why dbt Costs Are Hard to Track
dbt runs dozens or hundreds of models per job, each potentially on a different warehouse. Without tagging, it is impossible to tell which model consumed which credits. The result: Snowflake bills grow as dbt adoption scales, but nobody knows what to optimize first.
Tip 1: Use Incremental Models for Large Tables
Running a full table scan on every dbt run is the most common source of wasted compute. For any table over 10 GB, use incremental materialization with a well-defined unique_key and an efficient incremental predicate. A well-designed incremental model can reduce per-run compute by 70-90% on mature datasets.
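To make the roles of unique_key and the incremental predicate concrete, here is a plain-Python simulation of what one incremental run does conceptually. It is not dbt code: the row shape, the updated_at cursor column, and the function name incremental_merge are assumptions made for illustration only.

```python
from typing import Dict, List

Row = Dict[str, object]


def incremental_merge(target: List[Row], source: List[Row],
                      unique_key: str = "id",
                      cursor_col: str = "updated_at") -> List[Row]:
    """Simulate one incremental run: pick up only new rows, upsert by key."""
    # Incremental predicate: keep only source rows newer than the
    # target's high-water mark (everything, if the target is empty).
    high_water = max((r[cursor_col] for r in target), default=None)
    new_rows = [r for r in source
                if high_water is None or r[cursor_col] > high_water]

    # Upsert: a row with a matching unique_key replaces the old row,
    # any other row is appended.
    merged = {r[unique_key]: r for r in target}
    for r in new_rows:
        merged[r[unique_key]] = r
    return list(merged.values())


# Hypothetical data: id 2 was updated and id 3 is new since the last run.
target = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 12}]
source = target + [{"id": 2, "updated_at": 15}, {"id": 3, "updated_at": 16}]
result = incremental_merge(target, source)
```

Only the two rows past the high-water mark are processed here, which is where the saving on a mature table comes from. In a warehouse the predicate lets the engine skip old data as well, which this in-memory sketch does not model.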
Tip 2: Route Models to Warehouse by Weight
Not all dbt models need the same warehouse size. Use a Small or Medium warehouse for light staging and dimension models. Use Medium or Large for fact table processing. Reserve XL or 2XL for full historical loads or backfill runs only. This keeps lightweight staging models off oversized warehouses, where they would inflate credit usage on shared compute.
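The routing rule above can be written down as a small lookup. This is a sketch under stated assumptions: the warehouse names (TRANSFORM_S and so on) and the layer labels are hypothetical, and the function only decides a name. In a dbt project on Snowflake, the chosen value would typically be supplied through the adapter's snowflake_warehouse model config.

```python
# Hypothetical warehouse names; substitute the ones in your account.
WAREHOUSE_BY_LAYER = {
    "staging": "TRANSFORM_S",    # light staging models
    "dimension": "TRANSFORM_S",  # light dimension models
    "fact": "TRANSFORM_L",       # fact table processing
}
BACKFILL_WAREHOUSE = "TRANSFORM_XL"  # full historical loads and backfills only
DEFAULT_WAREHOUSE = "TRANSFORM_M"    # anything not classified yet


def pick_warehouse(layer: str, full_refresh: bool = False) -> str:
    """Return the warehouse a model should run on, based on its weight."""
    if full_refresh:
        # Backfills get the large warehouse regardless of layer.
        return BACKFILL_WAREHOUSE
    return WAREHOUSE_BY_LAYER.get(layer.lower(), DEFAULT_WAREHOUSE)


staging_wh = pick_warehouse("staging")
backfill_wh = pick_warehouse("fact", full_refresh=True)
```

Keeping the mapping in one place makes it easy to review which layers are allowed on the larger warehouses.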
Tip 3: Tag Every Model for Cost Attribution
Add Snowflake query tags to every dbt model using the query_tag config. This makes it possible to filter QUERY_HISTORY by dbt model name, team, or domain, unlocking model-level cost attribution. Without tagging, all dbt queries appear under the warehouse with no model context.
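One way to make tags filterable is to store them as compact JSON. The sketch below builds and parses such a tag. The field names (dbt_model, team, domain) and both helper functions are assumptions for illustration, and the 2000-character limit reflects my understanding of Snowflake's QUERY_TAG parameter, so verify it against your account's documentation.

```python
import json

MAX_TAG_LEN = 2000  # assumed Snowflake QUERY_TAG length limit


def build_query_tag(model: str, team: str, domain: str) -> str:
    """Serialize attribution fields into a compact, stable JSON string."""
    tag = json.dumps(
        {"dbt_model": model, "team": team, "domain": domain},
        sort_keys=True,
        separators=(",", ":"),
    )
    if len(tag) > MAX_TAG_LEN:
        raise ValueError("query tag exceeds the assumed length limit")
    return tag


def parse_query_tag(tag: str) -> dict:
    """Parse a tag read back from query history; untagged rows yield {}."""
    try:
        parsed = json.loads(tag)
    except ValueError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


tag = build_query_tag("fct_orders", "finance", "sales")
fields = parse_query_tag(tag)
```

A structured tag like this can be grouped by model, team, or domain after parsing, while queries that carry no tag or a free-text tag simply fall into an "unattributed" bucket.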
Tip 4: Analyze run_results for Expensive Models
dbt writes a run_results.json file after every job run that records execution time per model. Parse this file in your CI/CD pipeline to detect models that suddenly take 5x longer than their baseline, which signals a query regression or data growth. Combine this with Snowflake QUERY_HISTORY to correlate runtime with actual credits consumed.
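A minimal sketch of that check follows. It relies on the results, unique_id, and execution_time fields of run_results.json. The baseline dictionary, the model names, and the function names are hypothetical, and how you store baselines (a file, a table, a CI artifact) is left open.

```python
import json
from typing import Dict


def execution_times(run_results: dict) -> Dict[str, float]:
    """Map each model's unique_id to its execution time in seconds."""
    return {
        r["unique_id"]: float(r["execution_time"])
        for r in run_results.get("results", [])
        if r.get("unique_id", "").startswith("model.")
    }


def find_regressions(current: Dict[str, float],
                     baseline: Dict[str, float],
                     factor: float = 5.0) -> Dict[str, float]:
    """Return models whose runtime is at least `factor` times their baseline."""
    return {
        uid: seconds / baseline[uid]
        for uid, seconds in current.items()
        if uid in baseline and baseline[uid] > 0
        and seconds >= factor * baseline[uid]
    }


# In CI this would be read from target/run_results.json; an inline sample
# with made-up models keeps the sketch self-contained.
sample = json.loads("""
{"results": [
  {"unique_id": "model.shop.stg_orders", "execution_time": 4.0},
  {"unique_id": "model.shop.fct_orders", "execution_time": 60.0},
  {"unique_id": "test.shop.not_null_orders_id", "execution_time": 0.5}
]}
""")
baseline = {"model.shop.stg_orders": 3.0, "model.shop.fct_orders": 10.0}
regressions = find_regressions(execution_times(sample), baseline)
```

Models without a baseline are skipped rather than flagged, so new models do not fail the pipeline on their first run.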
Tip 5: Defer Non-Critical Refreshes to Off-Peak Windows
Not every model needs to run during business hours. Classify dbt models by business priority and defer non-critical refreshes to overnight windows when warehouse concurrency pressure is lower. This is especially relevant for staging models that feed intermediate tables but are not directly queried by dashboards.
Tip 6: Use dbt Snapshots Carefully
dbt snapshots implement SCD Type 2 by scanning source tables on every run. On large source tables this scan is expensive. Review snapshot configurations: where the source has a reliable updated_at column, prefer the timestamp strategy; where the check strategy is required, list only the columns that matter in check_cols rather than comparing all columns for changes.
Tip 7: Monitor dbt Cloud Run Concurrency
dbt Cloud schedules jobs in parallel, which can spike Snowflake warehouse concurrency. If multiple jobs run simultaneously against the same warehouse, you may trigger automatic cluster scaling without realizing it. Audit your dbt Cloud schedule to avoid concurrent large jobs on shared warehouses.
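A schedule audit can start as a simple overlap check. The sketch below assumes you have exported each job's warehouse, start time, and typical duration into a list. The JobWindow type, the job names, and the sample values are all hypothetical, and windows that wrap past midnight are not handled.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class JobWindow(NamedTuple):
    name: str
    warehouse: str
    start_min: int     # minutes after midnight
    duration_min: int  # typical runtime in minutes


def overlapping_jobs(jobs: List[JobWindow]) -> List[Tuple[str, str]]:
    """Return pairs of jobs whose windows overlap on the same warehouse."""
    clashes = []
    for a, b in combinations(jobs, 2):
        if a.warehouse != b.warehouse:
            continue
        a_end = a.start_min + a.duration_min
        b_end = b.start_min + b.duration_min
        # Two half-open intervals overlap when each starts before the
        # other one ends.
        if a.start_min < b_end and b.start_min < a_end:
            clashes.append((a.name, b.name))
    return clashes


jobs = [
    JobWindow("hourly_core", "TRANSFORM_L", 120, 45),
    JobWindow("nightly_marts", "TRANSFORM_L", 150, 60),
    JobWindow("nightly_ml", "TRANSFORM_XL", 150, 60),
    JobWindow("late_exports", "TRANSFORM_L", 210, 30),
]
clashes = overlapping_jobs(jobs)
```

Each reported pair is a candidate for staggering or for moving one job to its own warehouse.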
Track dbt credit spend by model with Anavsan
APEX attributes Snowflake credits to individual dbt models using query tagging and run metadata, giving data teams model-level cost accountability without manual instrumentation.