Read/Write Ratio in a SQL Server database

Repost (with some cleaning up since the book referenced was posted 18 years ago Read/Write Ratio versus Read/Write Ratio?.And I kind of hate that title now that I read it again.

When I wrote this post, I had a very strong belief that my OLTP databases are heavy writes and OLAP heavy reads. But in many cases this isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete and display data. Even inserts have reads to look for duplicates, for example. OLAP systems should likely be read more times than write, but loading data can be a lot of data moved around and reads maybe less if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

The calculations

Using these dynamic management views, I calculated 3 values.

  • Ratios based on numbers of file writes
  • Ratios based on magnitude of data moved around
  • Index usage/modifications.

I think this still is a good solution just to get a view of what is actually happening with your databases. It is probably not the perfect way to get all the details you need, but it is an interesting bit of data to this day.

My findings on my corporate servers were actually pretty interesting then and now. I found that while in operations my read/write ratio could be 50/50 at times, when looking at magnitude of data moved in and out of the server the ratio could be 95/5 in favor of data read.

Note too that sys.dm_io_virtual_file_stats could include things like backups and sys.dm_db_index_usage_stats could include maintenance too so it is highly expected that to get the most useful numbers that you employ some technique to capture numbers periodically and you exclude certain times of day when you might be doing non-standard work.

What this means to you may be different to every company. For example, you may only want to know the read/write characteristics during steady working hours, and not overnight when nightly processes are being run. Or you might be tuning the backup window and doing backups are you main concern.

Not sure how or if this will work with in-memory structures or filestream filegroups.

The code, I want to see the code

Note, the following code is available on my GitHub repo.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --don't lock or honor locks
--uses a like comparison to only include databases you desire (could easily be
--rewritten to group by database)
DECLARE @databaseName SYSNAME,
@excludeLogFilesFlag bit;
SET @databaseName = '%'; --'%' gives all databases together
SET @excludeLogFilesFlag = 1; --excludes log files by default because they would not be
--considered in the sys.dm_db_index_usage_stats numbers
SET NOCOUNT ON;
SELECT 'Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats';
--magnitude of data read
SELECT FORMAT(CAST(SUM(divfs.num_of_bytes_read) AS decimal) /
--magnitude of all bytes read or written
(CAST(SUM(divfs.num_of_bytes_written) AS decimal) +
CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfReads,
--magnitude of data written
FORMAT(CAST(SUM(divfs.num_of_bytes_written) AS decimal) /
--magnitude of all bytes read or written
(CAST(SUM(divfs.num_of_bytes_written) AS decimal) +
CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfWrites,
--if your totals come close to that format, let me know. That would be awesome!
FORMAT(SUM(divfs.num_of_bytes_read + divfs.num_of_bytes_written),
'###,###,###,###,###,###,###,###,###') AS TotalBytesReadAndWriten,
FORMAT(SUM(divfs.num_of_bytes_read),'###,###,###,###,###,###,###,###,###') AS num_of_bytes_read,
FORMAT(SUM(divfs.num_of_bytes_written),
'###,###,###,###,###,###,###,###,###') as num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs --params database/file, but used filter
--so could group multiple databases
JOIN sys.master_files mf
ON mf.database_id = divfs.database_id
AND mf.file_id = divfs.file_id
WHERE DB_NAME(divfs.database_id) LIKE @databaseName
AND (mf.type_desc <> 'LOG' OR @excludeLogFilesFlag = 0);
SELECT 'Ratio bases on count of operations-sys.dm_io_virtual_file_stats';
--count of read operations
SELECT FORMAT(CAST(SUM(divfs.num_of_reads) AS decimal) /
--total read or write operations
(CAST(SUM(divfs.num_of_writes) AS decimal) +
CAST(SUM(divfs.num_of_reads) AS decimal)),'##.####') AS RatioOfReads,
--count of write operations
FORMAT(CAST(SUM(divfs.num_of_writes) AS decimal) /
--total read or write operations
(CAST(SUM(divfs.num_of_reads) AS decimal) +
CAST(SUM(divfs.num_of_writes) AS decimal)),'##.####') AS RatioOfWrites,
FORMAT(SUM(divfs.num_of_reads + divfs.num_of_writes),
'###,###,###,###,###,###,###,###,###') AS TotalReadWriteCount,
FORMAT(SUM(divfs.num_of_reads),'###,###,###,###,###,###,###,###,###') AS num_of_reads,
FORMAT(SUM(divfs.num_of_writes),'###,###,###,###,###,###,###,###,###') AS num_of_writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
JOIN sys.master_files mf
ON mf.database_id = divfs.database_id
AND mf.file_id = divfs.file_id
WHERE DB_NAME(divfs.database_id) LIKE @databaseName
AND (mf.type_desc <> 'LOG' OR @excludeLogFilesFlag = 0);
SELECT 'Ratio bases on count of operations - sys.dm_db_index_usage_stats';
--number of read operations on the index.NULL of there are no read or write operations
SELECT FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0)
then NULL
--Read operations
else (CAST(SUM(user_seeks + user_scans + user_lookups) AS DECIMAL) /
--all operations
CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL))
end,'##.####') AS RatioOfReads,
--number of read operations on the index.NULL of there are no read or write operations
FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0)
then NULL
--write operations
else (CAST(SUM(user_updates) AS DECIMAL) /
--all operations
CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL))
end,'##.####') AS RatioOfWrites,
FORMAT(SUM(user_updates + user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalOperations,
FORMAT(SUM(user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalReadOperations,
FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as TotalWriteOperations
,FORMAT(SUM(user_seeks),'###,###,###,###,###,###,###,###,###') as user_seeks,
FORMAT(SUM(user_scans),'###,###,###,###,###,###,###,###,###') as user_scans,
FORMAT(SUM(user_lookups),'###,###,###,###,###,###,###,###,###') as user_lookups,
FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as user_updates
FROM sys.dm_db_index_usage_stats AS ddius --gives you a look at how it is used
WHERE DB_NAME(database_id) LIKE @databaseName
AND index_id <> 0; --ignore heaps
SELECT 'Ratio bases on count of operations - note that this isn''t exactly a 1-1 ratio of operations'
UNION ALL
SELECT 'because an index scan could be a lot more than an index write while a seek could be less than a write';

The test

AS a test (which was not in the original blog, I created a new database to check the data after a few simple operations.

--Uncomment if you need to reset
--USE Master;
--ALTER DATABASE TestReadWrite SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
--DROP DATABASE TestReadWrite;
CREATE DATABASE TestReadWrite;
GO
USE TestReadWrite;

I ran the previous script for just this new database. Result:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalBytesReadAndWritennum_of_bytes_readnum_of_bytes_written
.8769.12312,195,4561,925,120270,336

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalReadWriteCountnum_of_readsnum_of_writes
.541.459613328

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReadsRatioOfWritesTotalOperationsTotalReadOperations
NULLNULLNULLNULL
TotalWriteOperationsuser_seeksuser_scansuser_lookupsuser_updates
NULLNULLNULLNULLNULL

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

You can see some activity in there, but not a lot. All NULL values for index usage, of course.

So then I wrote a set of data with 10000 rows, and a decent amount of space used:

SET NOCOUNT ON;
CREATE TABLE CauseWrites
(
ID int IDENTITY(1,1) PRIMARY KEY,
Data char(1000)
);
GO
INSERT INTO CauseWrites(Data) SELECT REPLICATE('*', 1000);
GO 10000

Ran the script again, and you can see:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalBytesReadAndWritennum_of_bytes_readnum_of_bytes_written
.2761.72398,396,8002,318,3366,078,464

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalReadWriteCountnum_of_readsnum_of_writes
.3171.68291233984

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReadsRatioOfWritesTotalOperationsTotalReadOperations
0110,0000
TotalWriteOperationsuser_seeksuser_scansuser_lookupsuser_updates
10,00000010,000

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

Because an index scan could be a lot more than an index write while a seek could be less than a write. You can see that the update ratio on the index is 100% writes to the clustered index. In the next step, I will do some queries (with a seek and a scan) and show the difference in the ratios.

CREATE TABLE #HoldResultsSoNoOutput (id int, data char(1000));
GO
INSERT INTO #HoldResultsSoNoOutput
SELECT id, data
FROM CauseWrites
WHERE CAST(RAND() * 1000 AS INT) + 1 = id;
GO 1000
SELECT COUNT(*)
FROM #HoldResultsSoNoOutput;

Ran the ratio script again, and you can see:

Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalBytesReadAndWritennum_of_bytes_readnum_of_bytes_written
.298.7028,658,9442,580,4806,078,464

Ratio bases on count of operations-sys.dm_io_virtual_file_stats

RatioOfReadsRatioOfWritesTotalReadWriteCountnum_of_readsnum_of_writes
.3386.66141274384

Ratio bases on count of operations - sys.dm_db_index_usage_stats

RatioOfReadsRatioOfWritesTotalOperationsTotalReadOperations
.0909.909111,0001,000
TotalWriteOperationsuser_seeksuser_scansuser_lookupsuser_updates
10,0001,0000010,000

Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write

Note that you don’t see a lot of reads because once the data is cached, you won’t need to read in a lot of data. And more data fits in a page than just 1 row. Reads captured in the indexes may not be read from disk.

In conclusion

Read write ratio is an interesting metric when looking at performance. Even though you are probably more likely to write data that read it in most systems (considering if it is in memory it isn’t counted in the stats), but you will likely notice that you do a lot more reading from files than you think. Unless you create a new database on a server with no other activity. Then you may not see a lot of reads no matter how many you make.

One response to “Read/Write Ratio in a SQL Server database”

  1. […] Louis Davidson resurrects an old article: […]

Leave a Reply to Read-Write Ratios in SQL Server Databases – Curated SQLCancel reply

I’m Louis

I have been at this database thing for a very long time, with no plans to stop.

Series: SQL Techniques You Should Know

Recents

Discover more from Drsql's Database Musings

Subscribe now to keep reading and get access to the full archive.

Continue reading