A question that often comes up is: Once I determine my site has a performance problem, how do I know what to fix? Since there are dozens of potential bottlenecks, how should I best decide where to spend my time?
I was reminded of this when I came across this post on performance tuning of SQL queries by Sam Saffron at Stack Overflow.
Sam talks in a lot of detail about the tools, queries, and specific optimizations he used to achieve a 10x improvement in baseline performance for some of their pages. That was useful, but not as interesting to me as his general methodology.
First of all, they were logging each query’s execution time through their own database connection class. So from the very beginning it was really easy to see how much of the page’s total load time was spent in database queries vs all the other factors (processing data, rendering, network latency, etc). Turns out in their case, as with many applications, the database access was the biggest chunk.
Once he realized this, he didn’t immediately start tweaking SQL. He looked at a month’s worth of production logs to see the impact in aggregate. Since this was one page in the system, it made sense to see the total impact over a period of time, to avoid wasting his time on a single page if there are bigger fish to fry.
Turns out over the course of 1 month, that slow page really did have a signifiant impact on site averages. So he proceeded to tune it. As an added bonus, some of the improvements he made had positive ripples through the rest of the application.
So, a couple little morals of the story:
- Measure everything! Track performance for all the key layers of your application, as well as total response times.
- Do a little cost-benefit analysis in your head before diving down rabbit holes. Before tweaking things, first determine where your time is best spent. Attack the worst bottlenecks first.
Of course, baseline performance is one thing but performance often changes unpredictably under heavy load. Most bottlenecks are non-linear. Something might do just fine for a while, and suddenly become exponentially slower once a certain load threshold is reached.