Repost (with some cleaning up since the book referenced was posted 18 years ago Read/Write Ratio versus Read/Write Ratio?.And I kind of hate that title now that I read it again.
When I wrote this post, I had a very strong belief that my OLTP databases are heavy writes and OLAP heavy reads. But in many cases this isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete and display data. Even inserts have reads to look for duplicates, for example. OLAP systems should likely be read more times than write, but loading data can be a lot of data moved around and reads maybe less if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)
I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.
The calculations
Using these dynamic management views, I calculated 3 values.
- Ratios based on numbers of file writes
- Ratios based on magnitude of data moved around
- Index usage/modifications.
I think this still is a good solution just to get a view of what is actually happening with your databases. It is probably not the perfect way to get all the details you need, but it is an interesting bit of data to this day.
My findings on my corporate servers were actually pretty interesting then and now. I found that while in operations my read/write ratio could be 50/50 at times, when looking at magnitude of data moved in and out of the server the ratio could be 95/5 in favor of data read.
Note too that sys.dm_io_virtual_file_stats could include things like backups and sys.dm_db_index_usage_stats could include maintenance too so it is highly expected that to get the most useful numbers that you employ some technique to capture numbers periodically and you exclude certain times of day when you might be doing non-standard work.
What this means to you may be different to every company. For example, you may only want to know the read/write characteristics during steady working hours, and not overnight when nightly processes are being run. Or you might be tuning the backup window and doing backups are you main concern.
Not sure how or if this will work with in-memory structures or filestream filegroups.
The code, I want to see the code
Note, the following code is available on my GitHub repo.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --don't lock or honor locks--uses a like comparison to only include databases you desire (could easily be --rewritten to group by database)DECLARE @databaseName SYSNAME, @excludeLogFilesFlag bit;SET @databaseName = '%'; --'%' gives all databases together SET @excludeLogFilesFlag = 1; --excludes log files by default because they would not be --considered in the sys.dm_db_index_usage_stats numbersSET NOCOUNT ON;SELECT 'Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats'; --magnitude of data readSELECT FORMAT(CAST(SUM(divfs.num_of_bytes_read) AS decimal) / --magnitude of all bytes read or written (CAST(SUM(divfs.num_of_bytes_written) AS decimal) + CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfReads, --magnitude of data written FORMAT(CAST(SUM(divfs.num_of_bytes_written) AS decimal) / --magnitude of all bytes read or written (CAST(SUM(divfs.num_of_bytes_written) AS decimal) + CAST(SUM(divfs.num_of_bytes_read) AS decimal)),'##.####') AS RatioOfWrites, --if your totals come close to that format, let me know. That would be awesome! FORMAT(SUM(divfs.num_of_bytes_read + divfs.num_of_bytes_written), '###,###,###,###,###,###,###,###,###') AS TotalBytesReadAndWriten, FORMAT(SUM(divfs.num_of_bytes_read),'###,###,###,###,###,###,###,###,###') AS num_of_bytes_read, FORMAT(SUM(divfs.num_of_bytes_written), '###,###,###,###,###,###,###,###,###') as num_of_bytes_writtenFROM sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs --params database/file, but used filter --so could group multiple databases JOIN sys.master_files mf ON mf.database_id = divfs.database_id AND mf.file_id = divfs.file_idWHERE DB_NAME(divfs.database_id) LIKE @databaseName AND (mf.type_desc <> 'LOG' OR @excludeLogFilesFlag = 0);SELECT 'Ratio bases on count of operations-sys.dm_io_virtual_file_stats'; --count of read operationsSELECT FORMAT(CAST(SUM(divfs.num_of_reads) AS decimal) / --total read or write operations (CAST(SUM(divfs.num_of_writes) AS decimal) + CAST(SUM(divfs.num_of_reads) AS decimal)),'##.####') AS RatioOfReads, --count of write operations FORMAT(CAST(SUM(divfs.num_of_writes) AS decimal) / --total read or write operations (CAST(SUM(divfs.num_of_reads) AS decimal) + CAST(SUM(divfs.num_of_writes) AS decimal)),'##.####') AS RatioOfWrites, FORMAT(SUM(divfs.num_of_reads + divfs.num_of_writes), '###,###,###,###,###,###,###,###,###') AS TotalReadWriteCount, FORMAT(SUM(divfs.num_of_reads),'###,###,###,###,###,###,###,###,###') AS num_of_reads, FORMAT(SUM(divfs.num_of_writes),'###,###,###,###,###,###,###,###,###') AS num_of_writesFROM sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs JOIN sys.master_files mf ON mf.database_id = divfs.database_id AND mf.file_id = divfs.file_idWHERE DB_NAME(divfs.database_id) LIKE @databaseName AND (mf.type_desc <> 'LOG' OR @excludeLogFilesFlag = 0);SELECT 'Ratio bases on count of operations - sys.dm_db_index_usage_stats'; --number of read operations on the index.NULL of there are no read or write operationsSELECT FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0) then NULL --Read operations else (CAST(SUM(user_seeks + user_scans + user_lookups) AS DECIMAL) / --all operations CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL)) end,'##.####') AS RatioOfReads, --number of read operations on the index.NULL of there are no read or write operations FORMAT(case when (SUM(user_updates + user_seeks + user_scans + user_lookups) = 0) then NULL --write operations else (CAST(SUM(user_updates) AS DECIMAL) / --all operations CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL)) end,'##.####') AS RatioOfWrites, FORMAT(SUM(user_updates + user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalOperations, FORMAT(SUM(user_seeks + user_scans + user_lookups),'###,###,###,###,###,###,###,###,###') as TotalReadOperations, FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as TotalWriteOperations ,FORMAT(SUM(user_seeks),'###,###,###,###,###,###,###,###,###') as user_seeks, FORMAT(SUM(user_scans),'###,###,###,###,###,###,###,###,###') as user_scans, FORMAT(SUM(user_lookups),'###,###,###,###,###,###,###,###,###') as user_lookups, FORMAT(SUM(user_updates),'###,###,###,###,###,###,###,###,###') as user_updatesFROM sys.dm_db_index_usage_stats AS ddius --gives you a look at how it is usedWHERE DB_NAME(database_id) LIKE @databaseName AND index_id <> 0; --ignore heapsSELECT 'Ratio bases on count of operations - note that this isn''t exactly a 1-1 ratio of operations'UNION ALLSELECT 'because an index scan could be a lot more than an index write while a seek could be less than a write';
The test
AS a test (which was not in the original blog, I created a new database to check the data after a few simple operations.
--Uncomment if you need to reset--USE Master;--ALTER DATABASE TestReadWrite SET SINGLE_USER WITH ROLLBACK IMMEDIATE;--DROP DATABASE TestReadWrite;CREATE DATABASE TestReadWrite;GOUSE TestReadWrite;
I ran the previous script for just this new database. Result:
Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalBytesReadAndWriten | num_of_bytes_read | num_of_bytes_written |
|---|---|---|---|---|
| .8769 | .1231 | 2,195,456 | 1,925,120 | 270,336 |
Ratio bases on count of operations-sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalReadWriteCount | num_of_reads | num_of_writes |
|---|---|---|---|---|
| .541 | .459 | 61 | 33 | 28 |
Ratio bases on count of operations - sys.dm_db_index_usage_stats
| RatioOfReads | RatioOfWrites | TotalOperations | TotalReadOperations |
|---|---|---|---|
| NULL | NULL | NULL | NULL |
| TotalWriteOperations | user_seeks | user_scans | user_lookups | user_updates |
|---|---|---|---|---|
| NULL | NULL | NULL | NULL | NULL |
Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write
You can see some activity in there, but not a lot. All NULL values for index usage, of course.
So then I wrote a set of data with 10000 rows, and a decent amount of space used:
SET NOCOUNT ON; CREATE TABLE CauseWrites( ID int IDENTITY(1,1) PRIMARY KEY, Data char(1000));GOINSERT INTO CauseWrites(Data) SELECT REPLICATE('*', 1000);GO 10000
Ran the script again, and you can see:
Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalBytesReadAndWriten | num_of_bytes_read | num_of_bytes_written |
|---|---|---|---|---|
| .2761 | .7239 | 8,396,800 | 2,318,336 | 6,078,464 |
Ratio bases on count of operations-sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalReadWriteCount | num_of_reads | num_of_writes |
|---|---|---|---|---|
| .3171 | .6829 | 123 | 39 | 84 |
Ratio bases on count of operations - sys.dm_db_index_usage_stats
| RatioOfReads | RatioOfWrites | TotalOperations | TotalReadOperations |
|---|---|---|---|
| 0 | 1 | 10,000 | 0 |
| TotalWriteOperations | user_seeks | user_scans | user_lookups | user_updates |
|---|---|---|---|---|
| 10,000 | 0 | 0 | 0 | 10,000 |
Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write
Because an index scan could be a lot more than an index write while a seek could be less than a write. You can see that the update ratio on the index is 100% writes to the clustered index. In the next step, I will do some queries (with a seek and a scan) and show the difference in the ratios.
CREATE TABLE #HoldResultsSoNoOutput (id int, data char(1000));GO INSERT INTO #HoldResultsSoNoOutputSELECT id, dataFROM CauseWritesWHERE CAST(RAND() * 1000 AS INT) + 1 = id;GO 1000SELECT COUNT(*) FROM #HoldResultsSoNoOutput;
Ran the ratio script again, and you can see:
Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalBytesReadAndWriten | num_of_bytes_read | num_of_bytes_written |
|---|---|---|---|---|
| .298 | .702 | 8,658,944 | 2,580,480 | 6,078,464 |
Ratio bases on count of operations-sys.dm_io_virtual_file_stats
| RatioOfReads | RatioOfWrites | TotalReadWriteCount | num_of_reads | num_of_writes |
|---|---|---|---|---|
| .3386 | .6614 | 127 | 43 | 84 |
Ratio bases on count of operations - sys.dm_db_index_usage_stats
| RatioOfReads | RatioOfWrites | TotalOperations | TotalReadOperations |
|---|---|---|---|
| .0909 | .9091 | 11,000 | 1,000 |
| TotalWriteOperations | user_seeks | user_scans | user_lookups | user_updates |
|---|---|---|---|---|
| 10,000 | 1,000 | 0 | 0 | 10,000 |
Ratio bases on count of operations - note that this isn't exactly a 1-1 ratio of operations because an index scan could be a lot more than an index write while a seek could be less than a write
Note that you don’t see a lot of reads because once the data is cached, you won’t need to read in a lot of data. And more data fits in a page than just 1 row. Reads captured in the indexes may not be read from disk.
In conclusion
Read write ratio is an interesting metric when looking at performance. Even though you are probably more likely to write data that read it in most systems (considering if it is in memory it isn’t counted in the stats), but you will likely notice that you do a lot more reading from files than you think. Unless you create a new database on a server with no other activity. Then you may not see a lot of reads no matter how many you make.



Leave a Reply to Read-Write Ratios in SQL Server Databases – Curated SQLCancel reply