Windows Server VMware Memory Oversubscription

VM Basics…

Virtual servers all sit on a host server, which is a physical machine. These physical servers are usually massive beasts, anywhere from 128GB of RAM and 32 cores upwards. Speccing out these monster servers allows you to run multiple “Guest” servers on the host, which makes it much easier to scale resources out to each guest operating system.

VMware Memory Ballooning

Well… infrastructure engineers like to get value out of their resources, and when a server is allocated 10GB of RAM they assume it isn’t going to use that allocation constantly. So they allow the VMs to share resources, letting the host reclaim memory from guests when it isn’t being used and is required elsewhere.

Okay so what is Memory Oversubscription?

It’s the process of allocating more memory to your guest operating systems than you physically have on the host. Say we had a server with 100GB of memory and 10 guests, each allocated 10GB: we’ve used our 100GB. But that memory won’t all be used at the same time, so at host level your infrastructure engineers might see that although 100% of the RAM has been allocated, only 50% of it is actually being used.

Value for money

This is where infrastructure guys like to make their money back. They will see it’s safe to add a couple more guests to the host, so our 100GB server might go up to 12 guests with 10GB each. That’s a 120GB allocation, but it’s fine, because still only 60% of the RAM is actually being used, right?…

How does this affect SQL Server?

SQL Server is pretty selfish. It doesn’t like to give memory back, but it will, trimming its memory allocation down to the minimum working set to satisfy the host’s request; it’s got no choice in the matter, after all. This is where you will start to see performance issues: plans will start spilling to tempdb, the working set will page out to disk (if you haven’t configured Lock Pages in Memory, LPIM), and it all gets rather horrible.
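
If you’re not sure whether Lock Pages in Memory is actually in play on an instance, a quick way to check (on SQL Server 2016 SP1 and later, which expose the memory model directly) is:

-- CONVENTIONAL = LPIM not in use
-- LOCK_PAGES   = Lock Pages in Memory is enabled
-- LARGE_PAGES  = locked pages using large-page allocations
SELECT sql_memory_model_desc
FROM sys.dm_os_sys_info;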

So how do we avoid it?

This one is completely down to infrastructure: avoid oversubscribing resources!

Memory Grant Feedback (Intelligent Query Processing) – SQL Server 2019

Row Mode Memory Grant Feedback is a new feature introduced in SQL Server 2019. It expands on Batch Mode Memory Grant Feedback, which was introduced in SQL Server 2017.

Okay so what is it?

After a query executes, the query’s post-execution plan includes the ideal memory grant size needed to have all rows fit in memory. Performance suffers when memory grants are incorrectly sized: excessive grants waste memory and reduce concurrency, while insufficient grants cause expensive spills to disk.

With Memory Grant Feedback, consecutive executions of a query will result in SQL Server adjusting the memory grant to better allocate its resources, reducing excessive grants or preventing spills to disk.

There are two new query plan attributes shown in actual post-execution plans: IsMemoryGrantFeedbackAdjusted and LastRequestedMemory.

LastRequestedMemory shows the granted memory in kilobytes (KB) from the prior query execution. The IsMemoryGrantFeedbackAdjusted attribute lets you check the state of memory grant feedback for the statement within an actual query execution plan. The values surfaced in this attribute are as follows:

  • No: First Execution - Memory grant feedback does not adjust memory for the first compile and associated execution.
  • No: Accurate Grant - If there is no spill to disk and the statement uses at least 50% of the granted memory, memory grant feedback is not triggered.
  • No: Feedback disabled - If memory grant feedback is continually triggered and fluctuates between memory-increase and memory-decrease operations, we will disable memory grant feedback for the statement.
  • Yes: Adjusting - Memory grant feedback has been applied and may be further adjusted for the next execution.
  • Yes: Stable - Memory grant feedback has been applied and the granted memory is now stable, meaning that what was granted for the previous execution is what was granted for the current execution.
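
As a side note, if you want to watch memory grants as they happen rather than digging through plans, sys.dm_exec_query_memory_grants shows what each active request asked for, was granted, and has actually used; a quick sketch:

SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;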

This sounds like a great feature!

It is, for the most part. There are some things to be careful with: with parameterised queries the result set can change dramatically between executions, resulting in quite large discrepancies in the memory grants required. As you can see above, “If memory grant feedback is continually triggered and fluctuates between memory-increase and memory-decrease operations, we will disable memory grant feedback for the statement”. This can cause some performance issues while the database engine is figuring this out, so there will be some executions where the memory grant is either too large or there are large spills to disk.

You also need to be careful about clearing your plan cache; once you clear it, SQL Server forgets all the hard work it’s done for that parameterised query, resulting in poor performance for a short time.

Can’t we just disable it and save ourselves the trouble?

You can, but it’s my view that this feature, for the moment, is giving us more help than hindrance. I haven’t seen any cases yet where disabling this option would be recommended for a server… I guess there are some edge cases where absolutely every query you have is parameterised and your result sets fluctuate constantly; then, yes, this feature wouldn’t be useful! But then you would be back to regular parameter sniffing issues.

Row mode memory grant feedback can be disabled at the database or statement scope while still maintaining database compatibility level 150 and higher. To disable row mode memory grant feedback for all query executions originating from the database, execute the following within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;

To re-enable row mode memory grant feedback for all query executions originating from the database, execute the following within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON;
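
If you only have one problem query, you can also turn the feature off at statement scope with a query hint rather than for the whole database; a minimal sketch (the dbo.Orders table is purely illustrative):

SELECT OrderID, OrderDate
FROM dbo.Orders                -- hypothetical table, for illustration only
WHERE CustomerID = 42
OPTION (USE HINT('DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK'));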

High Virtual Log File (VLF) Count in SQL Server

So what is a VLF?

The SQL Server Database Engine divides each physical log file internally into a number of virtual log files (VLFs). Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file.

Virtual log file (VLF) creation follows this method:

  • If the next growth is less than 1/8 of current log physical size, then create 1 VLF that covers the growth size (Starting with SQL Server 2014 (12.x))
  • If the next growth is more than 1/8 of the current log size, then use the pre-2014 method:
    • If growth is less than 64MB, create 4 VLFs that cover the growth size (e.g. for 1 MB growth, create four 256KB VLFs)
    • If growth is from 64MB up to 1GB, create 8 VLFs that cover the growth size (e.g. for 512MB growth, create eight 64MB VLFs)
    • If growth is larger than 1GB, create 16 VLFs that cover the growth size (e.g. for 8 GB growth, create sixteen 512MB VLFs)
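
To see the algorithm in action, here’s a rough sketch you could run on a sandbox instance (the VLFDemo database, its default logical log file name VLFDemo_log, and the assumption that the new log starts at the default 8MB are all illustrative). Growing the log by roughly 512MB in a single operation should add eight VLFs:

CREATE DATABASE VLFDemo;
GO
-- VLF count of the freshly created (small) log
SELECT COUNT(*) AS vlf_count FROM sys.dm_db_log_info(DB_ID('VLFDemo'));
-- Grow the log in one operation; a 64MB-1GB growth should create 8 new VLFs
ALTER DATABASE VLFDemo MODIFY FILE (NAME = VLFDemo_log, SIZE = 520MB);
SELECT COUNT(*) AS vlf_count FROM sys.dm_db_log_info(DB_ID('VLFDemo'));
GO
DROP DATABASE VLFDemo;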

Right? so how could this be an issue?

When your log grows in small increments, it’s going to create VLFs for each growth event. If you’ve grown your log by 1MB a thousand times, you’re going to have around a thousand additional VLFs. Having too many VLFs will slow down database backup and recovery tasks, and in extreme cases it can even affect DML (insert/update/delete) performance.

So what’s ‘high’?

Well, this depends on the transaction log size. 100-300 VLFs isn’t unreasonable, and I’ve never seen performance issues at those counts; it’s when the VLF count starts going over 1000 that you should look at sorting the issue out.

How can I tell how many VLFs I have?

Either of these two scripts should give you the information:

DBCC LOGINFO ('<DatabaseName>');

SELECT s.[name], s.database_id,
       COUNT(l.database_id) AS [VLF Count],
       SUM(l.vlf_size_mb) AS [VLF Size (MB)],
       SUM(CAST(l.vlf_active AS INT)) AS [Active VLF],
       SUM(CAST(l.vlf_active AS INT) * l.vlf_size_mb) AS [Active VLF Size (MB)],
       COUNT(l.database_id) - SUM(CAST(l.vlf_active AS INT)) AS [In-active VLF],
       SUM(l.vlf_size_mb) - SUM(CAST(l.vlf_active AS INT) * l.vlf_size_mb) AS [In-active VLF Size (MB)]
FROM sys.databases s
CROSS APPLY sys.dm_db_log_info(s.database_id) l
GROUP BY s.[name], s.database_id
ORDER BY [VLF Count] DESC;
GO

I’ve got 300 million VLFs, how do I fix it?

Okay, there might be some exaggeration there, but you get the idea.

Shrinking the transaction log is the only way to reduce the number of VLFs, but you don’t want the issue to occur again, so you will need to regrow the log back to its normal operating size after the shrink has finished. Growing the log to the correct size in one operation ensures SQL Server allocates the correct number of VLFs for that size of log file (based on the algorithm above).

Remember

Shrinking a log file is a blocking operation!

Oh! And set some reasonable growth parameters while you’re at it… this will ensure the problem is kept at bay.
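
Putting that together, here’s a rough sketch of the shrink/regrow cycle (the database name YourDatabase, the logical log file name YourDatabase_log, and the 8GB/1GB sizes are placeholders; use whatever suits your system):

USE YourDatabase;
GO
-- 1. Shrink the log to release the excess VLFs (remember: blocking!)
DBCC SHRINKFILE (YourDatabase_log, 1);
GO
-- 2. Regrow the log back to its normal operating size in a single operation,
--    and set a sensible fixed autogrowth increment at the same time
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = YourDatabase_log, SIZE = 8GB, FILEGROWTH = 1024MB);
GO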

SQL Server Configuration: MODEL Database

The “Model” database is the template database SQL Server uses when a new user database is created; any database settings and file sizes are copied from the Model database to the new blank database. Obviously, if you’re doing a restore it’s going to restore with the settings the database was backed up with, so this only applies to new databases.

What can be changed?

Any database-level configuration option can be changed on the Model database, and these will be carried over to your new blank databases: things like Read Committed Snapshot Isolation, collation, compatibility level, etc…

This means if you know what settings databases will need on the new instance, you can stay ahead of the curve and set them on Model. I would advise leaving most settings as standard unless you know something is going to be specifically required for future databases.

What should I change then?

Growth! Auto-growth settings are my biggest annoyance when it comes to databases. By default (prior to SQL Server 2016), auto-growth is set to 1MB for data files and 10% for log files, which means small databases will have lots of growth events. When you’re looking through growth history and seeing hundreds, if not thousands, of 1MB growths it makes it hard to see what happened. It’s also going to increase your Virtual Log File (VLF) count by a dramatic amount (which will cause performance issues).

It could also be worth setting the Initial Size of the data & log files to something sensible….
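
You can check what Model is currently set to with a quick query against its file metadata (sizes are converted from 8KB pages to MB below):

SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM model.sys.database_files;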

Recommendations?

Again, it depends on your instance size. If it’s a small instance I would suggest setting growth to 256MB or 512MB; basically you want to avoid lots of growth events while balancing that against the data file taking “too much space”. If it’s a 100MB database, it would be a little excessive to set auto-growth to 1GB.

ALTER DATABASE model
MODIFY FILE (NAME = modeldev, FILEGROWTH = 1024MB);
ALTER DATABASE model
MODIFY FILE (NAME = modellog, FILEGROWTH = 1024MB);

Anything larger, I would suggest 1GB growth is fine. This can always be changed on the actual database once it’s in use; after all, remember this is just configuring the template database.

The main benefit of configuring the Model database this way is that if lots of databases have been created, all with bad auto-growth settings, you’re going to have to go back and correct them all at some point, setting up their growth parameters and shrinking their logs to reduce VLFs… it’s just time-consuming work that can be avoided!

Forgotten Maintenance – SP_CYCLE_ERRORLOG

DBAs tend to get so caught up in the technical performance configuration of SQL Server (setting MaxDOP, configuring TempDB, installing index maintenance and integrity checks) that they sometimes forget the basics: cycling the error log!

Why Bother?

Yeah, sure, it’s not going to give users a performance boost, but when you’re investigating an outage and want to look in the error log for any issues, do you really want to open one with millions of entries? I’ve lost count of the times I’ve had to sit there for 10 minutes waiting for the error log to open because it hasn’t been cycled in 6+ months.

Oh So Simple

And it’s so easy to implement error log maintenance. The error log is only cycled on two events:

  • When SQL Server is restarted.
  • When you execute sp_cycle_errorlog

Outside of this, without intervention it just grows, and if you’re logging user login activity it grows fast and big.

So How?

So first we want to put a limit on how many error logs we keep; this is basic housekeeping:

  • Expand the “Management” folder.
  • Right click on “SQL Server Logs”
  • Select “Configure”
  • Check the box “Limit the number of error log files before they are recycled”
  • Pick some value to put in the “Maximum number of error log files” box
  • Click “OK”
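
If you’d rather script that setting than click through SSMS, the GUI is just writing a registry value under the instance’s key; something like the following should do the same thing (the 26 is just an example, and as always, be careful when scripting registry changes):

USE [master];
GO
-- Limit the number of error log files kept before they are recycled (here: 26)
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'NumErrorLogs',
     REG_DWORD,
     26;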

Then we want to set up a SQL Agent job to cycle the error log. All it has to do is run EXEC sp_cycle_errorlog on a schedule which suits you; obviously, if you have a particularly noisy log you might want to schedule it more often.
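
A minimal sketch of that job, created through the msdb job procedures (the job name and the weekly Sunday-at-midnight schedule are just examples):

USE msdb;
GO
EXEC dbo.sp_add_job
     @job_name = N'DBA - Cycle Error Log';

EXEC dbo.sp_add_jobstep
     @job_name      = N'DBA - Cycle Error Log',
     @step_name     = N'Cycle error log',
     @subsystem     = N'TSQL',
     @command       = N'EXEC sp_cycle_errorlog;',
     @database_name = N'master';

EXEC dbo.sp_add_jobschedule
     @job_name = N'DBA - Cycle Error Log',
     @name     = N'Weekly - Sunday 00:00',
     @freq_type = 8,               -- weekly
     @freq_interval = 1,           -- Sunday
     @freq_recurrence_factor = 1,  -- every week
     @active_start_time = 0;       -- midnight

EXEC dbo.sp_add_jobserver
     @job_name = N'DBA - Cycle Error Log';
GO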

Generally I limit my SQL Servers to keep 26 logs and cycle every week, which gives us about six months of error log. You might want to schedule your cycle monthly or even daily (which is totally fine), just make sure you increase or decrease the maximum number of log files to suit. You want enough logs to diagnose any historic problems, but not so many that they use unnecessary disk space.

SQL Server Configuration: Fill-Factor

Fill-factor determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth. This only takes effect when an index is created or rebuilt.

For example: setting fill-factor to 90 means that 10 percent of each leaf-level page will be left empty. This provides space for index expansion as data is added. The empty space is reserved between the index rows rather than at the end of the index.

SPACE

The space implication of leaving this white space is in your indexes: for a 100MB index you would need 10% more space, making the index 110MB. That doesn’t sound too bad, does it? What if your index was 5GB? That’s now 5.5GB for one index… now imagine that for all the indexes on your database…

PAGE SPLITS

Leaving this free space at the leaf level has the potential to reduce index fragmentation and page splits. Page splits occur when there is no room within an index page for new data to be written; the database engine moves approximately half of the rows on the page to a new page to make room for the new data. This is a fairly resource-intensive operation which can cause an unnecessary increase in disk I/O. Avoiding (too many) page splits should be the main reason you’re setting a fill-factor.

READ PERFORMANCE

While configuring the fill-factor of an index may reduce the occurrence of page splits, it will also decrease read performance, as the database engine will have more index pages to search through. Microsoft clearly states: “a fill-factor value of 50 can cause database read performance to decrease by two times”.

CONFIGURATION

Configuring a correct fill-factor shouldn’t be done on a mass scale, and in my opinion it shouldn’t be done at instance level. It’s very likely that it will hurt your performance more than it helps.

Workloads should be taken into account, and you should definitely be checking your performance counters to see how SQL Server is coping with the current workload. If a large number of page splits are occurring, it’s worth drilling down to index-level analysis to see where the splits are happening and setting the fill-factor on that individual index.
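
As a rough starting point, the page split counter is exposed through a DMV, and the fill-factor can then be set on just the offending index as part of a rebuild (the index and table names below are hypothetical):

-- Cumulative page splits since the instance last started
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec';

-- Once you've identified the hot index, set its fill-factor during a rebuild
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);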

BUT I READ ONLINE IT CAN REDUCE PAGE SPLITS

It’s not uncommon for me to look at an instance that has been under the care of another DBA and find a default fill-factor has been set (usually 80!). Other than the wasted space within the indexes, it could decrease your read performance by 20%!

STOP IT!

Find where your pain is, and target your pain relief there. This blanket setting is just going to cause you other issues further down the line.

SQL Server Configuration: Optimise For Ad Hoc Workloads

Optimise for Ad Hoc Workloads changes the way SQL Server retains query plans in the plan cache. When this option is turned on, the Database Engine stores a small compiled plan stub in the plan cache when a batch is compiled for the first time, instead of the full compiled plan. This reduces the size of the plan cache and stops reusable plans being forced out of the cache when the cache threshold is reached. It can also help to alleviate memory pressure, as the plan cache requires less memory.

There are only extreme edge cases where enabling this option would have a negative performance impact, such as lots of queries compiling twice over a short period of time. In 99.9% of cases, SQL Server will benefit more from the memory you’re saving by freeing up this cache space than it loses from the few CPU cycles spent compiling a plan for the second time.

You can use this query to find all the single-use plans in your SQL Server’s plan cache:

Query

SELECT objtype,
       cacheobjtype,
       AVG(usecounts) AS Avg_UseCount,
       SUM(refcounts) AS AllRefObjects,
       SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS Size_MB
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
  AND usecounts = 1
GROUP BY objtype, cacheobjtype;

Turning on the Optimize for Ad Hoc Workloads setting only affects new plans; plans that are already in the plan cache are unaffected. You would need to clear the plan cache or restart SQL Server to flush out all your single-use plans.

I always turn Optimize for Ad Hoc Workloads on by default.
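
For reference, enabling it is a quick sp_configure change (it’s an advanced option, so ‘show advanced options’ needs to be on first):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;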