SQL Server Always On延迟观测和预警
2021/12/11 2:21:51
本文主要是介绍SQL Server Always On延迟观测和预警,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
延迟是AlwaysON的最大敌人之一。对AlwaysON而言,其首要目标就尽量减少(无法避免)主副本、辅助副本的数据延迟,实现主副本、辅助副本的“数据同步”。只有主副本、辅助副本的同步延迟越小越高,只读访问的实性性才会越高,数据库的RTO(Estimating Failover Time )和RPO(Estimating Potential Data Loss)也才会越小。
但延迟可能存在于AlwaysON同步的各个环节中,因此,在分析现延迟情况时,应该首先理解AlwaysON的同步过程,然后切分到每个过程中进行监控和分析。
AlwaysON同步的6大步骤
在我的上篇文章《AlwaysON的同步原理及同步模式》中,曾介绍过AlwaysON的同步过程。归结起来,主要包括如下六个步骤:
① log flush(primary)
② log capture(primary)
③ send(primary and secondary)
④ log receive and cache(secondary)
⑤ log hardened(secondary)
⑥ redo(secondary)
前两个步骤发生在主副本,最后三个步骤发生在辅助副本,中间的第三个步骤发生主副本和辅助副本之间。
另外,如果是同步提交模式,还需要增加一个步骤:辅助副本在步骤5之后,会发送一个(日志硬化)确认信息给主副本,然后才能进入redo阶段。
通过扩展事件跟踪,我们可以知道日志块移动的每个步骤,并且可以确切地知道事务延迟来自何处。
通常,延迟来自三个部分:
- 主库日志固化的持续时间:它等于Log_flush_start(步骤2)和Log_flush_complete(步骤3)的时间之和。
- 从库日志固化的持续时间:它等于Log_flush_start(步骤10)和Log_flush_complete(步骤11)的时间之和。
- 网络传送的持续时间 :primary:hadr_log_block_send_complete-> secondary:hadr_transport_receive_log_block_message(步骤6-7)和(secondary:hadr_lsn_send_complete-> primary:hadr_receive_harden_lsn_message(步骤12-13)的时间之和
创建扩展事件:
/* Note: this trace could generate very large amount of data very quickly, depends on the actual transaction rate. On a busy server it can grow several GB per minute, so do not run the script too long to avoid the impact to the production server. */ CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER ADD EVENT sqlserver.file_write_completed, ADD EVENT sqlserver.file_write_enqueued, ADD EVENT sqlserver.hadr_apply_log_block, ADD EVENT sqlserver.hadr_apply_vlfheader, ADD EVENT sqlserver.hadr_capture_compressed_log_cache, ADD EVENT sqlserver.hadr_capture_filestream_wait, ADD EVENT sqlserver.hadr_capture_log_block, ADD EVENT sqlserver.hadr_capture_vlfheader, ADD EVENT sqlserver.hadr_db_commit_mgr_harden, ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting, ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden, ADD EVENT sqlserver.hadr_filestream_processed_block, ADD EVENT sqlserver.hadr_log_block_compression, ADD EVENT sqlserver.hadr_log_block_decompression, ADD EVENT sqlserver.hadr_log_block_group_commit , ADD EVENT sqlserver.hadr_log_block_send_complete, ADD EVENT sqlserver.hadr_lsn_send_complete, ADD EVENT sqlserver.hadr_receive_harden_lsn_message, ADD EVENT sqlserver.hadr_send_harden_lsn_message, ADD EVENT sqlserver.hadr_transport_flow_control_action, ADD EVENT sqlserver.hadr_transport_receive_log_block_message, ADD EVENT sqlserver.log_block_pushed_to_logpool, ADD EVENT sqlserver.log_flush_complete , ADD EVENT sqlserver.log_flush_start, ADD EVENT sqlserver.recovery_unit_harden_log_timestamps ADD TARGET package0.event_file(SET filename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4)) WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=ON ) GO
Always ON预警
IF EXISTS(SELECT 1 FROM sys.objects o WHERE o.[object_id]=OBJECT_ID('[dbo].[Proc_DBA_AlwaysonWarning]') AND o.[type] IN(N'P',N'PC')) DROP PROC [dbo].[Proc_DBA_AlwaysonWarning] GO -- ============================================= -- Description: alwayson预警 -- Remark : 所有参数均为产生预警的参数 -- ============================================= CREATE PROCEDURE dbo.Proc_DBA_AlwaysonWarning @syncMode BIT=NULL, --"同步模式" 是否为同步提交,是1否0,如为NULL则不处理。默认为 NULL (不处理是否异步) @syncStateIsFinished BIT=0, --"同步状态" 是否为 "SYNCHRONIZED",是1否0,如为NULL则不处理。默认为 0 (如同步状态为未完成则预警) @syncHealth BIT=0, --"同步健康状态" 是否为健康, 是1否0,如为NULL则不处理。默认为 0 否 (如健康状态为不健康则预警) @redoDelaySeconds INT=600, --"Redo延迟(秒)" > 多少则预警。默认为 600 (s) @logDelaySeconds INT=600, --"Log传送延迟(秒)" > 多少则预警。默认为 600 (s) @redoWaitQueueKB BIGINT=10240, --"Redo等待队列(KB)" > 多少则预警。默认为 10240 (10MB) @logWaitQueueKB BIGINT=524288 --"Log传送等待队列(KB)" > 多少则预警。默认为 524288 (512MB) AS BEGIN SET NOCOUNT ON; ;WITH t AS ( SELECT ar.replica_server_name AS [副本名称] , ar.availability_mode_desc as [同步模式], DB_NAME(dbr.database_id) AS [数据库名称] , dbr.database_state_desc AS [数据库状态], dbr.synchronization_state_desc AS [同步状态], dbr.synchronization_health_desc AS [同步健康状态], ISNULL(CASE dbr.redo_rate WHEN 0 THEN -1 ELSE CAST(dbr.redo_queue_size AS FLOAT) / dbr.redo_rate END, -1) AS [Redo延迟(秒)] , ISNULL(CASE dbr.log_send_rate WHEN 0 THEN -1 ELSE CAST(dbr.log_send_queue_size AS FLOAT) / dbr.log_send_rate END, -1) AS [Log传送延迟(秒)] , dbr.redo_queue_size AS [Redo等待队列(KB)] , dbr.redo_rate AS [Redo速率(KB/S)] , dbr.log_send_queue_size AS [Log传送等待队列(KB)] , dbr.log_send_rate AS [Log传送速率(KB/S)] FROM [master].sys.availability_replicas AS AR INNER JOIN [master].sys.dm_hadr_database_replica_states AS dbr ON ar.replica_id = dbr.replica_id WHERE dbr.redo_queue_size IS NOT NULL ) /* @syncMode BIT=NULL, --"同步模式" 是否为同步提交,是1否0,如为NULL则不处理。默认为 NULL (不处理是否异步) @syncStateIsFinished BIT=0, --"同步状态" 是否为 "SYNCHRONIZED",是1否0,如为NULL则不处理。默认为 0 (如同步状态为未完成则预警) @syncHealth BIT=0, --"同步健康状态" 是否为健康, 是1否0,如为NULL则不处理。默认为 0 否 (如健康状态为不健康则预警) @redoDelaySeconds INT=60, --"Redo延迟(秒)" > 多少则预警。默认为 60 (s) @logDelaySeconds INT=600, --"Log传送延迟(秒)" > 多少则预警。默认为 600 (s) @redoWaitQueueKB BIGINT=10240, --"Redo等待队列(KB)" > 多少则预警。默认为 10240 (10MB) @logWaitQueueKB BIGINT=524288, --"Log传送等待队列(KB)" > 多少则预警。默认为 524288 (512MB) */ SELECT CASE WHEN ( (@syncMode=0 AND [同步模式]!='SYNCHRONOUS_COMMIT') or ( @syncMode=1 AND [同步模式]='SYNCHRONOUS_COMMIT' ) ) OR ( (@syncStateIsFinished=0 AND [同步状态]!='SYNCHRONIZED') or ( @syncStateIsFinished=1 AND [同步状态]='SYNCHRONIZED' ) ) OR ( (@syncHealth=0 AND [同步健康状态]!='HEALTHY') or ( @syncHealth=1 AND [同步健康状态]='HEALTHY' ) ) OR ( [Redo延迟(秒)] > @redoDelaySeconds ) OR ( [Log传送延迟(秒)] > @logDelaySeconds ) OR ( [Redo等待队列(KB)] > @redoWaitQueueKB ) OR ( [Log传送等待队列(KB)] > @logWaitQueueKB ) THEN 1 ELSE 0 END AS Warning, [副本名称], [同步模式], [数据库名称], [数据库状态], [同步状态], [同步健康状态], [Redo延迟(秒)], [Log传送延迟(秒)], [Redo等待队列(KB)], [Redo速率(KB/S)], [Log传送等待队列(KB)], [Log传送速率(KB/S)] FROM t END GO EXEC sys.sp_addextendedproperty @name=N'Version', @value=N'2.0' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'PROCEDURE',@level1name=N'Proc_DBA_AlwaysonWarning'
这篇关于SQL Server Always On延迟观测和预警的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2024-01-08Docker下的SqlServer发布订阅启用
- 2023-06-05Docker安装MS SQL Server并使用Navicat远程连接
- 2023-05-25深入浅出 SQL Server CDC 数据同步
- 2023-05-12通过空间占用和执行计划了解SQL Server的行存储索引
- 2023-04-24以SQLserver为例的Dapper详细讲解
- 2022-11-30SQL server高级函数查询
- 2022-11-26SQL SERVER数据库服务器CPU不能全部利用原因分析
- 2022-11-21SQL Server 时间算差值/常用函数
- 2022-11-20调试Archery连接SQL Server提示驱动错误
- 2022-10-22SQL Server 完整、差异备份+完整、差异还原(详细讲解,规避错误)