Avoiding subqueries in SQL statements

Optimizing a database can be one of the most difficult things to do. There are so many factors that come in to play that it's hard to gauge your results and identify the actual bottleneck. From network and hardware issues to poorly configured server settings, things can get frustrating really fast. There are far too many variables to consider as it is without having to worry about poorly written SQL statements.

Fortunately, inefficient SQL is the easiest performance problem to solve and typically produces the most dramatic results in an application that rely's heavily on a database. There are plenty of ways to write bad SQL but today I want to focus on an issue I come across regularly with colleagues, even some of the more experienced ones.

The dreaded subquery.  Boy can they be misused and expensive. Don't get me wrong, a subquery is an extremely powerful feature of modern databases but more often than not (in my experience) it is used improperly. There are some situations when a subquery is the only answer to the problem and when that is the case I'm real pumped that it exists.

What's a subquery?

A subquery is a select statement within another select statement. For example:

Select * from orders where orders.customerID IN (Select ID from customers where customers.state = 'NY')

The statement inside the ( ) will execute separately from the main query and it's results will be held in memory to be used for the original query. This query will give you all orders whose customers state is NY.

What's bad about subqueries?

Not a lot on their own, but if you're not careful they can cause serious performance problems. Depending on your goal you can quickly get into several nested sub query statements, each of which rely on the furthest inner query to complete first. If any of the queries are poorly written, say missing a limiting WHERE condition, the entire chain collapses. Subqueries do not take advantage of the database engine properly either and typically produce much slower results.

Have no fear, the GROUP BY clause is here!

In the majority of the situations that I have seen, a sub query could be replaced with a JOIN statement or with a GROUP BY clause. I don't want to get into a long discussion about how JOIN operations work but I thought it would be helpful to show an example GROUP BY statement. Most times when a sub query is used it's only because the writer is unaware of the group by clause or how to use it.

Suppose you have a database table holding a bunch of orders and you have a STATE column and an ORDER_AMOUNT column.
You want to know the total order amount per state in 2007.
This could be accomplished using a subquery in the following way:

select STATE as oState, (SELECT SUM(ORDER_AMOUNT) from OrderData
where RECEIVED_DATE between '20070101' and '20080101' and STATE = oState) as Dollars
from OrderData where RECEIVED_DATE between '20070101' and '20080101'  order by oState asc

That code will work, but the query will be inefficient. Running this query on an actual set of data (not the identical query) with around 145,000 rows took 13.09 seconds in MySql according to the query browser.

This query can be re-written using the GROUP BY clause which will take advantage of the database engine to produce the same results. The same query can be written as follows:

SELECT STATE, SUM(ORDER_AMOUNT) as DOLLARS FROM OrderData
where RECEIVED_DATE between '20070101' and '20080101'  GROUP BY STATE HAVING SUM(ORDER_AMOUNT) > 0;

In comparison, this query which produced the same result against the same set of data took .07 seconds!

That is 187 times faster. More than a little bit.

Obviously this is a drastic result, and honestly I didn't plan that or expect it but it sure illustrates my point. It's less than scientific since you don't know my environment, the actual query, or the actual data but I did try my best to make it legit. Making that change in a production environment could save an immense amount of processing time.

What's the point?

Nothing really, I just realized that a lot of people are unaware of the group by clause and all they need is to see it once to change their ways. So if you know a database beginner or a database "expert" that could use a wake up call, pass them this article and help them out.

kick it on DotNetKicks.com


Related posts

Comments

April 15. 2008 11:59 PM

pingback

Pingback from weblogs.asp.net

Interesting Finds: 2008.04.16 - gOODiDEA.NET

weblogs.asp.net

April 16. 2008 09:58 AM

pingback

Pingback from alvinashcraft.com

Dew Drop - April 16, 2008 | Alvin Ashcraft's Morning Dew

alvinashcraft.com

Add comment


(Will show your Gravatar icon)  

  Country flag

[b][/b] - [i][/i] - [u][/u]- [quote][/quote]



Live preview

October 7. 2008 06:41 AM

Search

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008