Query Store: time for some spring cleaning

Query Store contains a lot of information on the activity on your databases, and that can take some space (for more details, check my previous post ). We can limit the amount of data stored by adjusting its basic settings , and what data is stored by adjusting the capture mode . Once the data is stored, there are two different processes to clean up the stored data:

  • Date-based cleanup: uses the retention settings configured to decide what information will be deleted on a regular basis.
  • Size-based cleanup: if the current usage of Query Store space exceeds the 90% of its max size, a SQL internal process analyzes the information stored for all queries and starts deleting them to clear disk space. The extended event query_store_size_retention_cleanup_finished captures details on its outcome. There are no details on how this calculation is performed, but seems to be strictly CPU-based:

This automatic size-based cleanup can have a huge impact on your servers, as explained in this post entry from my colleague Mark Wilkinson titled “The perfect storm“. The summary: size-based cleanup caused a 70% CPU peak on a 8 cores server and took more than one hour to complete, having an impact on all application connecting to the server. So you can either increase the maximum size of your Query Store so the date-based cleanup will kick in before the size-based one does, or…

Manually cleaning your Query Store

The only option available to clean your Query Store seems to be clearing it completely, but the documentation presents another solution to do some fine cleanup, using some stored procedures

  • sp_query_store_remove_plan: removes a plan and related entries (runtime stats and wait stats are plan-specific and will be removed as well)
  • sp_query_store_remove_query: removes a query from Query Store (along with its plans and their runtime & wait stats).
  • sp_query_store_reset_exec_stats: deletes the statistics (runtime & waits) for a specific plan.

However, that puts the burden on the administrator, who has to decide what plans & queries can be safely removed and will have a definitive impact on the Query Store space usage. Would not make much sense to delete details from a query that will be executed right afterwards, since the plan & query details would have to be stored once again, being those the components that take the most space.

With these 3 stored procedures there are a number of custom cleanup process that can be implemented:

-Delete queries associated with a particular object, such a stored procedure used by administrators or auditors with no significant impact on the database.

-Delete all plans associated to a query that has a forced plan in place, and therefore will no longer be used by the query.

The Query Store custom cleanup script

For the issue explained in Mark’s post, I wrote an SP to clean certain queries from Query Store:

  • Any query flagged as internal, since they only contain queries such as index & statistics maintenance.
  • Any query referencing to a object no longer present in the database (“orphan” queries).
  • Ad hoc queries that had not been executed recently (less than X times in the past Y hours).

When used against the same SQL instance that had the 70% CPU peak due to the automated cleanup, it ran on 10 different database simultaneously with no noticeable impact, and reduced the space used by our databases’ Query Store from 9-10 GBs each to 6-8 GBs. With the automated size cleanup kicking once its utilization exceeded 90%, this SP was configured to run on databases that had exceeded a 80% of space usage and we haven’t had any issue since then.

There are some other functionalities in my TO DO list, such as thresholds to clean up queries (based on the ones used by the Custom Capture mode, but allowing for more parameters to configure it).

Introducing QDS Toolbox : QDSCacheCleanup

An improved version of the original procedure has been published in ChannelAdvisor’s its public GitHub repository as part of the QDS Toolbox, which I’ll be exploring in following posts. This particular component of the QDS Toolbox (QDSCacheCleanup) is designed to be deployed either on each of your databases or in a single database (they like the one you have deployed your maintenance procedures like OLA’s scripts) to perform the cleanup operation and log (if desired) the operations performed. You can install it and run it on test mode to analyze which queries would be deleted depending on the parameters you sent before deploying that on production.

The reports generated are available in multiple formats:

As tables

A summary of the estimated cleanup results
A summary of the estimated cleanup results
All the queries targeted for deletion based on the input parameters
All the queries targeted for deletion based on the input parameters

In text format (for summaries only)

**********************************
*       Stale queries found      *
**********************************
# of Queries : 5
# of Plans : 5
KBs of query texts : 0
KBs of execution plans : 390
KBs of runtime stats : 3
KBs of wait stats : 0
 
 
**********************************
*      Orphan queries found      *
**********************************
# of Queries : 59
# of Plans : 60
KBs of query texts : 65
KBs of execution plans : 3435
KBs of runtime stats : 51
KBs of wait stats : 4

Stored in SQL tables

Summary all executions, allowing you to compare the results depending on the different parameters selected
Summary all executions, allowing you to compare the results depending on the different parameters selected
List of all the queries marked for deletion, and the reason for them to be marked along with the parameters that helped make that decision
List of all the queries marked for deletion, and the reason for them to be marked along with the parameters that helped make that decision

The parameters are stored in an XML format to ease the process of adding/removing parameters in future updates of this script

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <CleanupParameters>
    <CleanAdhocStale>0</CleanAdhocStale>
    <CleanStale>1</CleanStale>
    <Retention>24</Retention>
    <MinExecutionCount>2</MinExecutionCount>
    <CleanOrphan>1</CleanOrphan>
    <CleanInternal>1</CleanInternal>
  </CleanupParameters>
</Root>

Notes on this post

The captures were extracted from a SQL 2017 instance. The original cleanup SP has been deployed on hundreds of SQL 2017 instances and several SQL 2019 instance, and the version mentioned on in this post and published on GitHub has been developed and tested on both SQL 2017 and 2019 instances.

Notes and captures of the QDS Toolbox components may vary from the current version release, so please refer to the documentation available in GitHub and the comments in the procedures’ code