Mining data on SQL query performance
We do not know if this is just us or if everyone feels the same, but these days there is not much
talk about SQL queries, and even less about the art of writing them. Unless, that is, you are stuck with
some maintenance project from the past decade, or you are in a data warehousing
or data lake type of engagement where SQL is as common as saying hello to anyone walking by you
in the corridors.
When we were ruminating on these thoughts, a colleague gave a counterpoint. NoSQL databases and
many niche storage engines have arrived, and even in the roles where you deal with storing data in raw
form, SQL is no longer Hello; for many it is more like Bonjour. For many of us in the business of the new oil,
i.e., data, we often miss out on the important data that is available right in front of us. We are talking about
the data generated by query plans when SQL queries run: both the plan that is estimated before the query
is executed and the actual plan used to run it.
Come on, think about this: many go out of their way to mine data from the web. It is ironic that
there are tutorials out there teaching how to scrape data from a website. We have come to terms with
the fact that scraping is not sustainable and probably not the right way to carry on
into the future. It is like oil, which was initially a revolution that enabled the world to transition from the
Dark Ages to modern times, but is now next to abuse, an abuse of the environment. Scraping data feels
similar to us. In the modern age of technology and software, we guess we should be building the web for
machines and bots as we did for humans in the last century.
Okay, enough on that.
We are not here to quibble about the practice of scraping. We raised it only to point out that we go to such
an extent while missing out on tapping into the data we already have and have been sitting on. Many legacy
applications (even the ones built a decade ago) run on SQL-based databases. If we are such data
heads, why are we not capitalising on that data? The trigger for us to go down this path was one
dreaded SQL query performance analysis. This was needed on a planning application that we built some
time ago. The analysis required us to drill down into the query plan, so we started to investigate whether
that data is stored somewhere in a structured manner. Come on, we are talking about Microsoft SQL
Server; it has been the workhorse for enterprise data storage for decades now.
The discovery
You might question our knowledge of the internals of Microsoft SQL Server, but this is something
we feel should have been promoted prominently instead of being left in dusty API documentation. Alright,
we are talking about internals like sys.dm_exec_cached_plans, which is a dynamic management view.
Through this view, SQL Server exposes the compiled query plans it caches for later reuse by the query
optimiser. The view by itself does not offer much. The companions that take it further are the management
views sys.dm_exec_sql_text and sys.dm_exec_query_plan. Combining them, we can get the cached query
plan as XML.
SELECT
    cp.objtype, cp.cacheobjtype, st.text AS QueryText, qp.query_plan AS QueryPlanInXml
FROM
    sys.dm_exec_cached_plans cp
CROSS APPLY
    sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY
    sys.dm_exec_query_plan(cp.plan_handle) qp;
The QueryPlanInXml column will contain the XML, which we can copy into our own storage. The frequency
of such copies, and whether you need them at all, will depend on what you plan to do with the data later.
For us it was fun, so we might be tempted to leave it there.
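As a minimal sketch of what copying into our own storage could look like, one might snapshot the cache into an ordinary table. The table name QueryPlanArchive and its columns are our own invention, not anything the server provides:

```sql
-- Hypothetical archive table; adjust columns and types to taste.
CREATE TABLE dbo.QueryPlanArchive
(
    CapturedAt     DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    ObjType        NVARCHAR(50),
    QueryText      NVARCHAR(MAX),
    QueryPlanInXml XML
);

-- Snapshot the current plan cache into the archive.
INSERT INTO dbo.QueryPlanArchive (ObjType, QueryText, QueryPlanInXml)
SELECT cp.objtype, st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp;
```

Each run appends a timestamped snapshot, so the ephemeral cache contents survive restarts and evictions.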
The hike
But think about it: the cache is ephemeral. If we keep a copy of the plans, we can determine the hot paths
along the data access plane. By hot paths, we mean the indexes that are frequently accessed. Not just the
indexes but also the steps that a query translates to in the plan. These steps can give insights not just into
the query but also into the underlying data structures. We did not stop here. We took this to the next level,
where we wanted to process and extract key data from the XML. We could do that in a programming
language outside of SQL, but then we would introduce multiple processing elements. To avoid that, we
tapped into SQL Server's native capability to process XML content.
DECLARE @query_plan XML;

-- Grab one cached plan to demonstrate; in practice you would filter by query text.
SET @query_plan = ( SELECT TOP 1
        qp.query_plan
    FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    WHERE qp.query_plan IS NOT NULL
);

-- OPENXML works on a prepared document handle, and the showplan XML
-- lives in its own namespace, which we declare up front.
DECLARE @idoc INT;
EXEC sp_xml_preparedocument @idoc OUTPUT, @query_plan,
    '<root xmlns:sp="http://schemas.microsoft.com/sqlserver/2004/07/showplan"/>';

SELECT *
FROM OPENXML (@idoc,
    '/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:RelOp', 2)
WITH
(
    PhysicalOperation varchar(50) '@PhysicalOp',
    LogicalOperation  varchar(50) '@LogicalOp',
    EstimatedRows     float       '@EstimateRows',
    EstimatedIO       float       '@EstimateIO',
    AverageRowSize    int         '@AvgRowSize'
);

EXEC sp_xml_removedocument @idoc;
In the above query, to demonstrate the possibilities, we picked one cached plan and
fetched the interesting items from it. We built all this on query features that have existed
in SQL for ages, like OPENXML. Seasoned SQL developers might not find this strange, but for folks like us
who are occasional developers, CROSS APPLY came as news. For developers like us, its role is
similar to that of an INNER JOIN, except that instead of expecting a table or view on the other side, it can
work with in-memory objects like table-valued functions and transient data structures.
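A tiny example makes the CROSS APPLY behaviour concrete. The temp table and its data here are made up for illustration; STRING_SPLIT is just a convenient built-in table-valued function to apply per row:

```sql
-- Hypothetical table: each row carries a comma-separated list of tags.
CREATE TABLE #articles (id INT, tags VARCHAR(100));
INSERT INTO #articles VALUES (1, 'sql,performance'), (2, 'xml');

-- CROSS APPLY evaluates the table-valued function once per outer row,
-- joining each row to the rows that function returns for it.
SELECT a.id, t.value AS tag
FROM #articles a
CROSS APPLY STRING_SPLIT(a.tags, ',') t;
```

Row 1 expands into two result rows and row 2 into one, which is exactly what an INNER JOIN cannot do against a function that needs a per-row argument.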
To turn this idea into telemetry, we would first plug the SELECT into an INSERT INTO. We would
also have to put it on a schedule, probably as a SQL Server Agent job. From this trek, we can all agree that
there is innovation here: much of the design and architecture that used to rest on a hunch can now be
powered by data.
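A rough sketch of such a schedule, using SQL Server Agent's stored procedures in msdb. The job name, the target database, and the dbo.QueryPlanArchive table are all our own hypothetical names; treat the schedule parameters as a starting point, not gospel:

```sql
USE msdb;
GO
-- Hypothetical Agent job that snapshots the plan cache every hour.
EXEC dbo.sp_add_job @job_name = N'CaptureQueryPlans';

EXEC dbo.sp_add_jobstep
    @job_name  = N'CaptureQueryPlans',
    @step_name = N'Snapshot plan cache',
    @subsystem = N'TSQL',
    @database_name = N'OurTelemetryDb',   -- assumed database holding the archive
    @command   = N'INSERT INTO dbo.QueryPlanArchive (ObjType, QueryText, QueryPlanInXml)
                   SELECT cp.objtype, st.text, qp.query_plan
                   FROM sys.dm_exec_cached_plans cp
                   CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
                   CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp;';

EXEC dbo.sp_add_jobschedule
    @job_name = N'CaptureQueryPlans',
    @name     = N'Hourly',
    @freq_type = 4,            -- daily
    @freq_interval = 1,
    @freq_subday_type = 8,     -- repeat on an hourly unit
    @freq_subday_interval = 1; -- every 1 hour

EXEC dbo.sp_add_jobserver @job_name = N'CaptureQueryPlans';
```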