Pro Tips
Top 10 Most Expensive Snowflake Query Patterns & Fixes
Sep 28, 2025
Snowflake's pay-per-second billing for compute resources is a dream for scalability, but it can quickly become a nightmare for your budget if queries aren't optimized. Many data professionals find themselves staring at unexpectedly high bills, unaware of the specific SQL patterns that are secretly draining their credits.
This post will pull back the curtain on the top 10 most expensive Snowflake query patterns we frequently observe. More importantly, we'll equip you with immediate, actionable fixes to help you rein in those costs and boost performance. Understanding these patterns is the first step towards a more cost-efficient and performant data ecosystem.
The Culprits: Expensive Query Patterns & Their Remedies
Here are the common offenders that inflate your Snowflake bill, along with practical solutions.
1. Full Table Scans on Large Tables Without Filters
The Pattern:
SELECT * FROM very_large_table; or SELECT column FROM another_huge_table WHERE 1=1;
Why it's Expensive: Snowflake bills for warehouse compute time, and that time scales with the amount of data scanned. A full table scan on a multi-terabyte table forces the warehouse to read every micro-partition, even if you only need a few rows.
The Fix:
Always Filter: Use WHERE clauses to drastically reduce the data scanned (WHERE date_column = '2023-01-01').
Leverage Clustering Keys: If your tables are clustered, filtering on the clustering key makes scans highly efficient.
Avoid SELECT *: Only select the columns you actually need. A before/after sketch follows this list.
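Here is a minimal before/after sketch; the column names (order_id, customer_id, amount, order_date) are illustrative assumptions, and the benefit assumes order_date lines up reasonably well with how the data is clustered or loaded:
-- Expensive: reads every micro-partition and every column
SELECT * FROM very_large_table;
-- Cheaper: the filter prunes micro-partitions, and naming columns avoids reading the rest
SELECT order_id, customer_id, amount
FROM very_large_table
WHERE order_date = '2023-01-01';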
2. Excessive ORDER BY with LIMIT on Unclustered Data
The Pattern:
SELECT * FROM large_table ORDER BY unclustered_column DESC LIMIT 100;
Why it's Expensive: Snowflake has to sort the entire dataset before it can pick the top 100, especially if the data isn't naturally sorted (clustered) on that column. This is a highly compute-intensive operation.
The Fix:
Cluster Tables: If you ORDER BY the same column frequently, consider clustering your table on it.
Filter First: Apply strong filters before sorting to reduce the dataset size.
Push Down Sorting: If possible, sort in a preceding CTE on a smaller subset, as sketched below.
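A minimal sketch of filtering before sorting; the CTE, the seven-day window, and the column names are illustrative assumptions:
-- Expensive: sorts the whole table just to keep 100 rows
SELECT * FROM large_table ORDER BY unclustered_column DESC LIMIT 100;
-- Cheaper: shrink the input first, then sort the much smaller subset
WITH recent AS (
    SELECT id, unclustered_column, amount
    FROM large_table
    WHERE event_date >= DATEADD('day', -7, CURRENT_DATE)
)
SELECT * FROM recent
ORDER BY unclustered_column DESC
LIMIT 100;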
3. Cross Joins (Cartesian Products)
The Pattern:
SELECT * FROM table_A, table_B; or SELECT * FROM table_A CROSS JOIN table_B; (without an explicit ON clause).
Why it's Expensive: It generates every possible combination of rows between the two tables. If table_A has N rows and table_B has M rows, the result is N*M rows, which explodes data volume and compute.
The Fix:
Always Use ON: Specify join conditions (ON table_A.id = table_B.id) to link related rows, as in the sketch below.
Review WHERE Clauses for Implicit Joins: Ensure your WHERE clause isn't accidentally creating a cross join when you intended an inner join.
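A quick sketch of the difference; the linking column b.a_id and the status filter are illustrative assumptions:
-- Accidental Cartesian product: nothing links the two tables, so every row pairs with every row
SELECT * FROM table_A, table_B WHERE table_A.status = 'active';
-- Intended inner join: the ON clause pairs only the related rows
SELECT a.id, a.status, b.order_total
FROM table_A a
JOIN table_B b ON b.a_id = a.id
WHERE a.status = 'active';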
4. Complex UNION or UNION ALL on Large Datasets
The Pattern: Combining multiple large SELECT statements with UNION or UNION ALL.
Why it's Expensive: Each SELECT statement runs independently, and then the results are merged. UNION also has to remove duplicates across the combined result (effectively a global DISTINCT), adding more compute overhead than UNION ALL.
The Fix:
Use UNION ALL When Possible: If duplicate rows are acceptable, UNION ALL avoids the expensive de-duplication step (see the sketch below).
Optimize Each Subquery: Ensure each SELECT statement is as efficient as possible (filters, clustering).
Materialize Intermediates: For very complex unions, consider creating temporary tables or CTEs to break down the problem.
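A minimal comparison; the orders_2022 and orders_2023 tables are illustrative assumptions:
-- UNION: merges the two results and then removes duplicates across all of them
SELECT customer_id FROM orders_2022
UNION
SELECT customer_id FROM orders_2023;
-- UNION ALL: simply concatenates the results; use it whenever duplicates are acceptable
SELECT customer_id FROM orders_2022
UNION ALL
SELECT customer_id FROM orders_2023;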
5. Unnecessary DISTINCT on Large Columns or Datasets
The Pattern:
SELECT DISTINCT large_text_column FROM huge_table; or COUNT(DISTINCT very_wide_column)
Why it's Expensive: Snowflake has to read all the data and then perform a global sort and comparison to identify unique values across the entire result set.
The Fix:
Only Use DISTINCT When Essential: Are you sure you need distinct values at this stage?
Filter First: Reduce the dataset before applying DISTINCT.
Consider Approximate Functions: For COUNT(DISTINCT), APPROX_COUNT_DISTINCT() is much faster and cheaper if an estimate is acceptable; see the example below.
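A minimal example of swapping in the approximate version; user_id and huge_table are illustrative, and APPROX_COUNT_DISTINCT returns an estimate rather than an exact count:
-- Exact but expensive: every value must be de-duplicated globally
SELECT COUNT(DISTINCT user_id) FROM huge_table;
-- Approximate and far cheaper: a HyperLogLog-based estimate
SELECT APPROX_COUNT_DISTINCT(user_id) FROM huge_table;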
6. Suboptimal Window Functions (e.g., ROW_NUMBER() OVER (ORDER BY ...) without partitions)
The Pattern:
SELECT ..., ROW_NUMBER() OVER (ORDER BY some_column) FROM large_table;
Why it's Expensive: Without a PARTITION BY clause, the window function treats the entire dataset as a single partition. This requires a global sort, which is very costly.
The Fix:
Use PARTITION BY: Define logical partitions (PARTITION BY customer_id) so window functions operate on smaller, more manageable subsets, as sketched below.
Optimize ORDER BY Within Partitions: Ensure the ordering within partitions is efficient.
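A minimal before/after sketch; event_id, event_time, and customer_id are illustrative columns:
-- Expensive: one giant partition, so the whole table is sorted globally
SELECT event_id,
       ROW_NUMBER() OVER (ORDER BY event_time) AS rn
FROM large_table;
-- Cheaper: each customer's rows are numbered independently within a much smaller partition
SELECT event_id,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_time) AS rn
FROM large_table;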
7. LIKE Clause with Leading Wildcards (%)
The Pattern:
WHERE text_column LIKE '%search_term%'
Why it's Expensive: Snowflake can't prune micro-partitions or apply most search optimizations when a wildcard appears at the beginning of the search string, so it is forced into a full scan of the column.
The Fix:
Avoid Leading Wildcards: If possible, use LIKE 'search_term%' for better performance; see the comparison below.
External Search Indexes: For complex text searches, consider an external search service or specialized text search features if available.
Pre-process/Tokenize: For full-text search capabilities, consider pre-processing text into tokens.
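A small comparison; events and text_column are illustrative, and the anchored version only helps when a prefix match actually satisfies your use case:
-- Expensive: the leading % means every value must be inspected
SELECT * FROM events WHERE text_column LIKE '%search_term%';
-- Cheaper when a prefix match is acceptable: the anchored pattern can skip far more data
SELECT * FROM events WHERE text_column LIKE 'search_term%';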
8. Large IN or NOT IN Lists (especially with subqueries)
The Pattern:
SELECT * FROM my_table WHERE id IN (SELECT large_list_of_ids FROM another_table);
Why it's Expensive: The subquery can be inefficient if it returns a massive list, and the outer query then has to compare every row against that large list.
The Fix:
Use EXISTS or JOIN: Often, EXISTS or an INNER JOIN performs better than IN for large sets; see the rewrites below.
Materialize the Subquery: For extremely large and static lists, consider creating a temporary table or CTE from the subquery result.
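Two possible rewrites, sketched under the assumption that another_table.id is the value being matched; check that the join version doesn't multiply rows if another_table can contain duplicate ids:
-- Original pattern: every row of my_table is checked against a potentially huge list
SELECT * FROM my_table
WHERE id IN (SELECT id FROM another_table);
-- Rewrite 1: EXISTS behaves as a semi-join and stops at the first match
SELECT m.*
FROM my_table m
WHERE EXISTS (SELECT 1 FROM another_table a WHERE a.id = m.id);
-- Rewrite 2: an explicit join against a de-duplicated id list
SELECT m.*
FROM my_table m
JOIN (SELECT DISTINCT id FROM another_table) a ON a.id = m.id;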
9. Excessive Use of QUALIFY with Large Result Sets
The Pattern:
SELECT ..., ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS rn FROM large_table QUALIFY rn = 1;
Why it's Expensive: QUALIFY filters rows only after the window function has been computed. If the window function runs over a massive input before the filter is applied, the expensive work has already been done.
The Fix:
Filter Early: Try to reduce the dataset before applying complex window functions and QUALIFY; see the sketch below.
Partition Efficiently: Ensure your PARTITION BY clause is well-defined so individual partitions stay small.
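A sketch of filtering early; customer_id, updated_at, and the 30-day window are illustrative assumptions:
-- Expensive: the window function ranks every row in the table before QUALIFY discards most of them
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM large_table
QUALIFY rn = 1;
-- Cheaper: cut the input down first, then keep the latest row per customer
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM large_table
WHERE updated_at >= DATEADD('day', -30, CURRENT_DATE)
QUALIFY rn = 1;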
10. DATE_TRUNC and Similar Functions on Timestamp Columns (Especially in Filters)
The Pattern:
SELECT * FROM large_table WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01'; or applying DATE_TRUNC('day', timestamp_column) across an unfiltered table.
Why it's Expensive: Wrapping a column in a function inside a WHERE clause can prevent Snowflake from pruning micro-partitions on that column, and running the function over an unfiltered table forces a full scan either way.
The Fix:
Avoid Functions in WHERE: Instead of WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01', use WHERE timestamp_column >= '2023-01-01' AND timestamp_column < '2023-01-02'; the range version is sketched below.
Pre-compute If Frequent: If you frequently need DATE_TRUNC('day', timestamp_column), consider adding a derived DATE column to the table and clustering on it.
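A minimal sketch of the range rewrite; large_table and timestamp_column come from the pattern above, and the half-open bound keeps the filter equivalent to the original day-level comparison:
-- Expensive: the function on the column blocks micro-partition pruning
SELECT *
FROM large_table
WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01';
-- Cheaper: an equivalent half-open range on the bare column lets Snowflake prune partitions
SELECT *
FROM large_table
WHERE timestamp_column >= '2023-01-01'
  AND timestamp_column < '2023-01-02';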
How Anavsan Helps: Proactive Cost Prevention
Manually identifying and fixing these patterns across an entire data warehouse is a monumental task. This is where Anavsan becomes indispensable. Our AI-driven platform actively monitors your Snowflake environment, identifies these expensive query patterns before they cause significant cost overruns, and even suggests or applies optimized SQL.
Instant Query Optimization: Our AI instantly analyzes and rewrites inefficient SQL to reduce compute time and cost.
Cost Anomaly Shield: Proactively flags or pauses queries exhibiting expensive patterns, preventing bill shock.
AI Performance Simulators: Test optimization suggestions to see exact cost and performance benefits before deployment.
With Anavsan, you don't just react to high bills; you prevent them, allowing your data engineers to focus on innovation, not optimization.
FAQ: Connecting the Dots to Automation
Q: Can Anavsan fix my existing inefficient queries automatically?
A: Yes. Anavsan’s ⚡ Instant Query Optimization analyzes your Snowflake query history, identifies instances of these 10 expensive patterns, and provides optimized, cost-efficient code suggestions you can accept with one click, saving hours of manual tuning while implementing Snowflake query optimization best practices.
Q: How does Anavsan prevent these expensive patterns from happening in the future?
A: Our Cost Anomaly Shield provides a proactive guardrail. It monitors real-time query execution and can block or automatically tune costly patterns like unnecessary ORDER BY on large datasets or full table scans, ensuring they never run up your bill. This is a core Snowflake cost reduction strategy.
Q: Which plan is best for a Data Engineer who just wants to optimize their own SQL?
A: The Individual Plan is perfect for Data Engineers. It provides the core features of Instant Query Optimization, Real-Time Query Analysis, and the Cost Anomaly Shield to immediately boost your personal query performance and efficiency, a key component of Snowflake data warehouse optimization.
Optimizing Snowflake queries is an ongoing effort, but by understanding these top 10 expensive patterns, you're well on your way to a more efficient and cost-effective data environment. Implementing these fixes manually will yield significant results, but for continuous, autonomous optimization, tools like Anavsan provide an invaluable safety net, ensuring your Snowflake spend always aligns with your business value.
Stop Paying for Slow Queries Today.
Ready to eliminate the top 10 expensive patterns and deploy autonomous Snowflake cost optimization?
Start Your 15-Day Free Trial and deploy your 24/7 AI optimization partner now.
