SQL Server AAG – The Importance of WSFC Quorum Voting.

SQL Server Always On uses Windows Server Failover Clustering (WSFC) technology. WSFC uses a quorum-based approach (a quorum being the minimum number of members that must be present to make the proceedings valid) to monitor overall cluster health and maximize node-level fault tolerance.

Each node in a WSFC cluster participates in periodic heartbeat communication to share the node’s health status with the other nodes. Unresponsive nodes are considered to be in a failed state. A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. 
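As a rough illustration of the majority rule described above, the check boils down to a single comparison. This is a minimal sketch, not WSFC code; the function name `has_quorum` and its parameters are hypothetical and exist only for this example.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Return True when strictly more than half of the configured votes are present."""
    if total_votes <= 0 or not 0 <= votes_present <= total_votes:
        raise ValueError("votes_present must be between 0 and total_votes")
    # Integer comparison avoids rounding issues: present > total / 2.
    return 2 * votes_present > total_votes


# Three voters (for example two nodes plus a witness): two votes are enough.
print(has_quorum(2, 3))  # True
# Two voters: losing either one leaves exactly half, which is not a majority.
print(has_quorum(1, 2))  # False
```

Note that exactly half of the votes is not enough, which is why an even vote count is awkward.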

If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online.

So what voting modes can we use?

Quorum mode is configured at the WSFC cluster level. The Failover Cluster Manager utility will recommend a quorum mode based on the number of nodes in the cluster.

  • Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
  • Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than one-half of the possible votes must be affirmative for the cluster to be healthy. As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
  • Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than one-half of the possible votes must be affirmative for the cluster to be healthy.
  • Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.
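To make the difference between the modes concrete, the sketch below counts the possible votes for each mode and how many are needed for a majority. It assumes every node keeps its default single vote, and it assumes that in Disk Only mode the disk witness is the only voter. `QuorumMode`, `total_votes` and `votes_needed` are hypothetical names for this illustration, not part of any WSFC API.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "Disk Only"


def total_votes(mode: QuorumMode, node_count: int) -> int:
    """Count the possible votes, assuming one vote per node (the default)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if mode is QuorumMode.DISK_ONLY:
        return 1  # assumption: only the disk witness carries a vote
    if mode is QuorumMode.NODE_MAJORITY:
        return node_count
    return node_count + 1  # the nodes plus one witness (file share or disk)


def votes_needed(total: int) -> int:
    """Smallest number of votes that is more than one-half of the total."""
    return total // 2 + 1


# A 2 node cluster under each mode.
for mode in QuorumMode:
    total = total_votes(mode, node_count=2)
    print(f"{mode.value}: {total} possible votes, {votes_needed(total)} needed")
```

For two nodes, Node Majority needs both votes, while either witness mode needs two out of three.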

So technically anything could have a vote? A disk, a file share, nodes…

Yes. By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node has a single vote in determining the overall cluster health, and each node will continuously attempt to establish a quorum.

So in a 2 node cluster could we just have 1 node that votes and the other isn’t allowed?

No. An individual node in a WSFC cluster cannot definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, or appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of that node.

So what’s the risk of only having 2 voting nodes?

When more than one voting node is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another. This scenario is more likely when performing a forced failover, which explicitly subdivides the quorum node set.

If split-brain occurs, you’re more than likely going to see problems at the AAG level. You might see that more than one node thinks it’s the primary. This can cause databases to become out of sync or inaccessible, and transaction log build-up might occur. Eventually, if not right away, you’re going to see outages.
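The effect of a network split on a 2 node cluster can be sketched with the same majority rule. The example below is a simplified model, not how WSFC is implemented: `partitions_with_quorum` and the node names are hypothetical, and every voter is assumed to carry exactly one vote. Under a strict majority vote, at most one side of a split can hold quorum; it is forcing quorum on a side that lacks it that opens the door to two sides running independently.

```python
def partitions_with_quorum(partitions, total_votes):
    """Return the partitions that hold a strict majority of all votes.

    Each partition is a list of voters that can still reach one another.
    """
    return [p for p in partitions if 2 * len(p) > total_votes]


# Two nodes plus a file share witness. The link between the nodes fails
# and only NodeA can still reach the witness.
with_witness = [["NodeA", "FileShareWitness"], ["NodeB"]]
print(partitions_with_quorum(with_witness, total_votes=3))
# [['NodeA', 'FileShareWitness']]

# Two voting nodes and no witness: neither side has more than half.
without_witness = [["NodeA"], ["NodeB"]]
print(partitions_with_quorum(without_witness, total_votes=2))
# []
```

With only two votes, any split leaves the cluster without quorum, which is exactly the situation where a forced failover becomes tempting.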

So what’s the recommended way to achieve quorum in a 2 node cluster?

The easiest way is a file share witness, which is easily configurable. Other options are:

  • Add a 3rd node (which would be a relative waste of resources)
  • A shared disk (requires setting up a clustered disk)
  • A cloud witness (possible security risks, requires cloud access, may incur additional costs)

Note: you’re always looking to achieve an odd number of votes. If you have more than 2 nodes in your cluster, you can start looking at setting the node voting weight, and you may be able to achieve a reliable quorum this way.
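The arithmetic behind the odd-number advice can be sketched as follows. This is an illustration only: `tolerated_vote_losses` and the node names are hypothetical, and a weight of 0 stands for a node whose vote has been removed.

```python
def tolerated_vote_losses(total_votes: int) -> int:
    """Votes that can be lost while a strict majority still remains."""
    if total_votes < 1:
        raise ValueError("total_votes must be at least 1")
    majority = total_votes // 2 + 1
    return total_votes - majority


# An even total tolerates no more losses than the odd total just below it.
for total in range(2, 6):
    print(f"{total} votes: can lose {tolerated_vote_losses(total)}")
# 2 votes: can lose 0
# 3 votes: can lose 1
# 4 votes: can lose 1
# 5 votes: can lose 2

# Hypothetical 4 node cluster where one node's voting weight is set to 0,
# which brings the total to an odd number of votes.
node_weights = {"Node1": 1, "Node2": 1, "Node3": 1, "Node4": 0}
total = sum(node_weights.values())
print(total, tolerated_vote_losses(total))  # 3 1
```

Going from 3 votes to 4 buys no extra tolerance, so removing one node's vote in a 4 node cluster costs nothing in terms of how many votes can be lost.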