Read/Write Ratio in a SQL Server database

Repost (with some cleaning up since the book referenced was posted 18 years ago Read/Write Ratio versus Read/Write Ratio?.And I kind of hate that title now that I read it again.

When I wrote this post, I had a very strong belief that my OLTP databases are heavy writes and OLAP heavy reads. But in many cases this isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete and display data. Even inserts have reads to look for duplicates, for example. OLAP systems should likely be read more times than write, but loading data can be a lot of data moved around and reads maybe less if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

The calculations

Using these dynamic management views, I calculated 3 values.

Ratios based on numbers of file writes
Ratios based on magnitude of data moved around
Index usage/modifications.

I think this still is a good solution just to get a view of what is actually happening with your databases. It is probably not the perfect way to get all the details you need, but it is an interesting bit of data to this day.

My findings on my corporate servers were actually pretty interesting then and now. I found that while in operations my read/write ratio could be 50/50 at times, when looking at magnitude of data moved in and out of the server the ratio could be 95/5 in favor of data read.

Note too that sys.dm_io_virtual_file_stats could include things like backups and sys.dm_db_index_usage_stats could include maintenance too so it is highly expected that to get the most useful numbers that you employ some technique to capture numbers periodically and you exclude certain times of day when you might be doing non-standard work.

What this means to you may be different to every company. For example, you may only want to know the read/write characteristics during steady working hours, and not overnight when nightly processes are being run. Or you might be tuning the backup window and doing backups are you main concern.

Not sure how or if this will work with in-memory structures or filestream filegroups.

The code, I want to see the code

Note, the following code is available on my GitHub repo.

			
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --don't lock or honor locks
--uses a like comparison to only include databases you desire (could easily be 
--rewritten to group by database)
DECLARE @databaseName        SYSNAME,
        @excludeLogFilesFlag bit;
SET @databaseName = '%'; --'%' gives all databases together 
SET @excludeLogFilesFlag = 1; --excludes log files by default because they would not be 
                              --considered in the sys.dm_db_index_usage_stats numbers
SET NOCOUNT ON;
SELECT  'Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats';
        --magnitude of data read
SELECT  FORMAT(CAST(SUM(divfs.num_of_bytes_read) AS decimal) / 
            --magnitude of all bytes read or written
            (CAST(SUM(divfs.num_of_bytes_written) AS decimal) + 
                     CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfReads,
        --magnitude of data written
        FORMAT(CAST(SUM(divfs.num_of_bytes_written) AS decimal) / 
            --magnitude of all bytes read or written
                        (CAST(SUM(divfs.num_of_bytes_written) AS decimal) + 
                        CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfWrites,
        --if your totals come close to that format, let me know. That would be awesome!
        FORMAT(SUM(divfs.num_of_bytes_read + divfs.num_of_bytes_written),
                    '###,###,###,###,###,###,###,###,###') AS TotalBytesReadAndWriten,
        FORMAT(SUM(divfs.num_of_bytes_read),'###,###,###,###,###,###,###,###,###') AS num_of_bytes_read,
        FORMAT(SUM(divfs.num_of_bytes_written),
                          '###,###,###,###,###,###,###,###,###') as num_of_bytes_written
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs --params database/file, but used filter 
                                                          --so could group multiple databases
        JOIN sys.master_files mf
            ON mf.database_id = divfs.database_id
                AND mf.file_id = divfs.file_id
WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
    AND (mf.type_desc <> 'LOG' OR   @excludeLogFilesFlag = 0);
SELECT  'Ratio bases on count of operations-sys.dm_io_virtual_file_stats';
        --count of read operations
SELECT  FORMAT(CAST(SUM(divfs.num_of_reads) AS decimal) / 
                        --total read or write operations
                        (CAST(SUM(divfs.num_of_writes) AS decimal) + 
                        CAST(SUM(divfs.num_of_reads) AS decimal)),'##.####') AS RatioOfReads,
        --count of write operations
        FORMAT(CAST(SUM(divfs.num_of_writes) AS decimal) / 
                      --total read or write operations
                      (CAST(SUM(divfs.num_of_reads) AS decimal) + 
                       CAST(SUM(divfs.num_of_writes) AS decimal)),'##.####') AS RatioOfWrites,
        FORMAT(SUM(divfs.num_of_reads + divfs.num_of_writes),
                                  '###,###,###,###,###,###,###,###,###') AS TotalReadWriteCount,
        FORMAT(SUM(divfs.num_of_reads),'###,###,###,###,###,###,###,###,###') AS num_of_reads,
        FORMAT(SUM(divfs.num_of_writes),'###,###,###,###,###,###,###,###,###') AS num_of_writes
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
        JOIN sys.master_files mf
            ON mf.database_id = divfs.database_id
                AND mf.file_id = divfs.file_id
WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
    AND (mf.type_desc <> 'LOG' OR   @excludeLogFilesFlag = 0);
SELECT  'Ratio bases on count of operations - sys.dm_db_index_usage_stats';
        --number of read operations on the index.NULL of there are no read or write operations
SELECT  FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0)
                then NULL
             --Read operations
             else (CAST(SUM(user_seeks + user_scans + user_lookups) AS DECIMAL) / 
                       --all operations
                       CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL))
        end,'##.####') AS RatioOfReads,
         --number of read operations on the index.NULL of there are no read or write operations
        FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0)
             then NULL
             --write operations
             else (CAST(SUM(user_updates) AS DECIMAL) / 
                      --all operations
                       CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL))
        end,'##.####') AS RatioOfWrites,
        FORMAT(SUM(user_updates + user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalOperations,
        FORMAT(SUM(user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalReadOperations,
        FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as TotalWriteOperations
        ,FORMAT(SUM(user_seeks),'###,###,###,###,###,###,###,###,###') as user_seeks,
        FORMAT(SUM(user_scans),'###,###,###,###,###,###,###,###,###') as user_scans,
        FORMAT(SUM(user_lookups),'###,###,###,###,###,###,###,###,###') as user_lookups,
        FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as user_updates
FROM    sys.dm_db_index_usage_stats AS ddius --gives you a look at how it is used
WHERE   DB_NAME(database_id) LIKE @databaseName
  AND  index_id <> 0; --ignore heaps
SELECT  'Ratio bases on count of operations - note that this isn''t exactly a 1-1 ratio of operations'
UNION ALL
SELECT  'because an index scan could be a lot more than an index write while a seek could be less than a write';

		

The test

AS a test (which was not in the original blog, I created a new database to check the data after a few simple operations.

			
--Uncomment if you need to reset
--USE Master;
--ALTER DATABASE TestReadWrite SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
--DROP DATABASE TestReadWrite;
CREATE DATABASE TestReadWrite;
GO
USE TestReadWrite;

		

I ran the previous script for just this new database. Result:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalBytesReadAndWriten	num_of_bytes_read	num_of_bytes_written
.8769	.1231	2,195,456	1,925,120	270,336

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalReadWriteCount	num_of_reads	num_of_writes
.541	.459	61	33	28

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReads	RatioOfWrites	TotalOperations	TotalReadOperations
NULL	NULL	NULL	NULL

TotalWriteOperations	user_seeks	user_scans	user_lookups	user_updates
NULL	NULL	NULL	NULL	NULL

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

You can see some activity in there, but not a lot. All NULL values for index usage, of course.

So then I wrote a set of data with 10000 rows, and a decent amount of space used:

			
SET NOCOUNT ON; 
CREATE TABLE CauseWrites
(
    ID int IDENTITY(1,1) PRIMARY KEY,
    Data char(1000)
);
GO
INSERT INTO CauseWrites(Data) SELECT REPLICATE('*', 1000);
GO 10000

		

Ran the script again, and you can see:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalBytesReadAndWriten	num_of_bytes_read	num_of_bytes_written
.2761	.7239	8,396,800	2,318,336	6,078,464

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalReadWriteCount	num_of_reads	num_of_writes
.3171	.6829	123	39	84

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReads	RatioOfWrites	TotalOperations	TotalReadOperations
0	1	10,000	0

TotalWriteOperations	user_seeks	user_scans	user_lookups	user_updates
10,000	0	0	0	10,000

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

Because an index scan could be a lot more than an index write while a seek could be less than a write. You can see that the update ratio on the index is 100% writes to the clustered index. In the next step, I will do some queries (with a seek and a scan) and show the difference in the ratios.

			
CREATE TABLE #HoldResultsSoNoOutput (id int, data char(1000));
GO 
INSERT INTO #HoldResultsSoNoOutput
SELECT id, data
FROM   CauseWrites
WHERE  CAST(RAND() * 1000 AS INT) + 1 = id;
GO 1000
SELECT COUNT(*) 
FROM #HoldResultsSoNoOutput;

		

Ran the ratio script again, and you can see:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalBytesReadAndWriten	num_of_bytes_read	num_of_bytes_written
.298	.702	8,658,944	2,580,480	6,078,464

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReads	RatioOfWrites	TotalReadWriteCount	num_of_reads	num_of_writes
.3386	.6614	127	43	84

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReads	RatioOfWrites	TotalOperations	TotalReadOperations
.0909	.9091	11,000	1,000

TotalWriteOperations	user_seeks	user_scans	user_lookups	user_updates
10,000	1,000	0	0	10,000

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

Note that you don’t see a lot of reads because once the data is cached, you won’t need to read in a lot of data. And more data fits in a page than just 1 row. Reads captured in the indexes may not be read from disk.

In conclusion

Read write ratio is an interesting metric when looking at performance. Even though you are probably more likely to write data that read it in most systems (considering if it is in memory it isn’t counted in the stats), but you will likely notice that you do a lot more reading from files than you think. Unless you create a new database on a server with no other activity. Then you may not see a lot of reads no matter how many you make.

Drsql's Database Musings