MySQL MHA高可用

2021/10/5 19:11:45

编程Tag： mysql root master mha 可用 oct 2021 192.168

本文主要是介绍MySQL MHA高可用，对大家解决编程问题具有一定的参考价值，需要的程序猿们随着小编来一起学习吧！

MHA高可用架构介绍
MHA高可用架构原理
MHA高可用环境搭建
- MHA基础架构搭建
- - 环境规划
  - 环境准备
  - - 配置一主双从环境
    - 配置关键程序软链接
    - 配置各节点互信
  - 安装软件
  - 配置文件
  - 状态检查
  - 开启MHA
  - 查看MHA
- MHA环境故障模拟及修复
- - 故障模拟
  - 故障修复
- MHA的 VIP 功能
- MHA Send_Report
- MHA Binlog Server
管理员在高可用架构维护的职责

MHA高可用架构介绍

MHA（Master High Availability）是一款开源的 MySQL 的高可用解决方案，基于perl语言实现，由日本DeNA公司开发。它为 MySQL 主从复制架构提供了 Automating Master Failover 功能。在MySQL故障切换过程中，MHA能做到0-30s之内自动完成数据库的故障切换操作，并且在进行故障切换过程中，MHA能最大程度保证数据库的一致性，以达到真正意义上的高可用。

MHA 高可用方案由两个部分构成：

MHA Manager（管理节点）：Manager可以独立部署在一台独立的机器上管理多个Master-Slave集群，也可以部署在一台Slave上。当Master出现故障时，它可以自动将最新数据的Slave提升为Master，然后将所有其他的Slave重新指向新的Master。整个故障切换对应用程序是完全透明的。

Manager工具包主要包括以下几个工具：
masterha_manger             启动mamager的脚本
masterha_stop	            关闭manager的脚本
masterha_check_ssh          检查MHA的SSH配置状况
masterha_check_repl         检查MySQL主从复制状况  
masterha_check_status       检测当前MHA运行状态 
masterha_master_monitor     检测master是否宕机 
masterha_master_switch      控制故障转移（自动或者手动）
masterha_conf_host          添加或删除配置的server信息

MHA Node（数据节点）：所有节点都要安装

Node工具包主要包括以下几个工具（这些工具通常由MHA Manager的脚本触发，无需人为操作）：
save_binary_logs            保存和复制master的二进制日志 
apply_diff_relay_logs       识别差异的中继日志事件并将其差异的事件应用于其他的Slave
purge_relay_logs            清除中继日志（不会阻塞SQL线程）

MHA适用场景：目前MHA主要支持一主多从的架构，要搭建MHA，要求一个复制集群必须最少有3台数据库服务器（一主二从，一台充当Master、一台充当备用Master、一台充当从库）。出于成本考虑，淘宝在此基础上进行了改造，目前淘宝开发的TMHA已经支持一主一从。

MHA高可用架构原理

MHA监控节点 (通过配置文件获取所有节点信息)
系统、网络、SSH连接性、主从状态（重点是主库）
如果主库宕机，MHA处理过程如下：
2.1 选主：
    (1) 如果判断从库(position或者GTID)，数据有差异，最接近于Master的slave，成为备选主。
    (2) 如果判断从库(position或者GTID)，数据一致，按照配置文件顺序选主。
    (3) 如果设定有权重(candidate_master=1)，按照权重强制指定备选主。
         a. 默认情况下如果一个slave落后master 100M的relay logs的话，即使有权重，也会失效。
         b. 如果check_repl_delay=0的话，即使落后很多日志，也强制选择其为备选主。
2.2 数据补偿
    (1) 当SSH能连接时，从库对比主库GTID 或者position号，立即将二进制日志保存至各个从节点并且应用(save_binary_logs脚本 )
    (2) 当SSH不能连接时，MHA无法保存二进制文件，只能进行故障转移但是丢失最新数据，可以对比从库之间的relaylog的差异(apply_diff_relay_logs脚本)
2.3 Failover
将故障节点踢出集群，备选主进行身份切换，并对外提供服务；其余从库和新主库确认新的主从关系。

其他功能：

应用透明：使用VIP机制实现应用透明
故障切换通知(send_report)
二次数据补偿(binlog_server)
自动修复主库并加入集群（待开发）

MHA高可用环境搭建

MHA基础架构搭建

环境规划

环境准备

配置一主双从环境

主库：192.168.1.5

db01 [(none)]>show processlist ;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
|  4 | root | localhost         | NULL | Query            |    0 | starting                                                      | show processlist |
|  5 | repl | 192.168.1.6:41668 | NULL | Binlog Dump GTID |   31 | Master has sent all binlog to slave; waiting for more updates | NULL             |
|  6 | repl | 192.168.1.7:38244 | NULL | Binlog Dump GTID |   27 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

从库：192.168.1.6

db02 [(none)]>show slave status\G ;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.5
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 194
               Relay_Log_File: slave01-relay-bin.000002
                Relay_Log_Pos: 367
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

从库：192.168.1.7

db03 [(none)]>show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.5
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 194
               Relay_Log_File: slave02-relay-bin.000002
                Relay_Log_Pos: 367
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

配置关键程序软链接

由于MHA软件脚本中写的是命令的绝对路径/usr/bin/mysqlbinlog与/usr/bin/mysql，因此为了使用MHA，就要为mysqlbinlog和mysql可执行文件建立软链接。

[root@localhost ~]# ln -s /data/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
[root@localhost ~]# ln -s /data/mysql/bin/mysql          /usr/bin/mysql

配置各节点互信

各节点生成密钥

[root@master ~]# ssh-keygen -t rsa
[root@slave01 ~]# ssh-keygen -t rsa
[root@slave02 ~]# ssh-keygen -t rsa

将各节点的公钥文件汇总到一个总的认证文件authorized_keys中

[root@slave01 ~]# scp authorized_keys 192.168.1.5:/root/a1_keys
[root@slave02 ~]# scp authorized_keys 192.168.1.5:/root/a2_keys
[root@master ~]# cat /root/a1_keys >> authorized_keys 
[root@master ~]# cat /root/a2_keys >> authorized_keys

分发汇总的公钥文件

[root@master ~]# cp authorized_keys .ssh/
[root@master ~]# scp authorized_keys 192.168.1.6:/root/.ssh/
[root@master ~]# scp authorized_keys 192.168.1.7:/root/.ssh/

测试

[root@master ~]# ssh 192.168.1.5 date
Mon Oct  4 20:09:48 CST 2021
[root@master ~]# ssh 192.168.1.6 date
Mon Oct  4 20:09:51 CST 2021
[root@master ~]# ssh 192.168.1.7 date
Mon Oct  4 20:09:53 CST 2021

[root@slave01 ~]# ssh 192.168.1.5 date
Mon Oct  4 20:10:13 CST 2021
[root@slave01 ~]# ssh 192.168.1.6 date
Mon Oct  4 20:10:16 CST 2021
[root@slave01 ~]# ssh 192.168.1.7 date
Mon Oct  4 20:10:18 CST 2021

[root@slave02 ~]# ssh 192.168.1.5 date
Mon Oct  4 20:10:43 CST 2021
[root@slave02 ~]# ssh 192.168.1.6 date
Mon Oct  4 20:10:44 CST 2021
[root@slave02 ~]# ssh 192.168.1.7 date
Mon Oct  4 20:10:47 CST 2021

安装软件

下载MHA软件

[root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm

各节点安装node依赖包

[root@slave01 ~]# yum install -y perl-DBD-MySQL

各节点安装node软件

[root@master ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
[root@slave01 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
[root@slave02 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

db03节点安装manager依赖包

[root@slave02 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

db03节点安装manager软件

[root@slave02 ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

创建MHA专用监控管理用户（主库）

db01 [(none)]>grant all privileges on *.* to mha@'192.168.1.%' identified by 'mha';

配置文件

准备manager配置文件

# 创建配置文件目录
[root@slave02 ~]# mkdir -p /etc/mha
# 创建日志目录
[root@slave02 ~]# mkdir -p /var/log/mha/mysql
# 编辑MHA配置文件
[root@slave02 ~]# vim /etc/mha/mysql.cnf
[server default]                         # 总配置信息
manager_log=/var/log/mha/mysql/manager   # 日志信息
manager_workdir=/var/log/mha/mysql       # 存放日志的目录
master_binlog_dir=/data/binlog           # 主库二进制日志的位置
user=mha                                 # 专用mha监控用户   
password=mha                             # mha密码
ping_interval=2                          # 探测节点状态的时间间隔
repl_user=repl                           # 主从复制用户  （构建新的主从环境change master to需要主从复制用户和密码）
repl_password=123                        # 主从复制密码
ssh_user=root                            # ssh连接用户    

[server1]                                # 节点1
hostname=192.168.1.5
port=3306                                  

[server2]                                # 节点2
hostname=192.168.1.6
port=3306

[server3]                                # 节点3
hostname=192.168.1.7
port=3306

主库宕机谁来接管？

所有从节点日志一致，则默认以配置文件的顺序选择新master；
从节点日志不一致，自动选择最接近主库的从库；
若某节点设置权重（candidate_master=1），权重节点优先选择。但如果该节点日志落后主库100M的话，也不会被选择；
通过设置check_repl_delay=0，关闭日志量的检查，强制选择候选节点。

额外参数：
ping_interval=1：设置监控主库发送ping包的时间间隔，尝试三次没有回应的时候自动进行failover

candidate_master=1：设为候选master，设置该参数后发送主从切换会将此库提升为主库，即使该库不是集群中事件最新的slave

check_repl_delay=0：默认情况下如果一个slave落后master 100M的relaylog的话，MHA不会选择该slave作为一个新的master（即使设置了candidate_master=1）;通过设置check_repl_delay=0，MHA触发切换在选择一个新的master 的时候会忽略复制延时

candidate_master=1 + check_repl_delay=0可以保证设置candidate_master=1一定是新的master

应用场景：两地三中心、Keepalived实现VIP

状态检查

互信状态检查

[root@slave02 ~]# masterha_check_ssh  --conf=/etc/mha/mysql.cnf 
Mon Oct  4 21:02:33 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  4 21:02:33 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
Mon Oct  4 21:02:33 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
Mon Oct  4 21:02:33 2021 - [info] Starting SSH connection tests..
Mon Oct  4 21:02:40 2021 - [debug] 
Mon Oct  4 21:02:33 2021 - [debug]  Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.6(192.168.1.6:22)..
Mon Oct  4 21:02:39 2021 - [debug]   ok.
Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.7(192.168.1.7:22)..
Mon Oct  4 21:02:40 2021 - [debug]   ok.
Mon Oct  4 21:02:40 2021 - [debug] 
Mon Oct  4 21:02:33 2021 - [debug]  Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.5(192.168.1.5:22)..
Mon Oct  4 21:02:39 2021 - [debug]   ok.
Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.7(192.168.1.7:22)..
Mon Oct  4 21:02:40 2021 - [debug]   ok.
Mon Oct  4 21:02:40 2021 - [debug] 
Mon Oct  4 21:02:34 2021 - [debug]  Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.5(192.168.1.5:22)..
Mon Oct  4 21:02:39 2021 - [debug]   ok.
Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.6(192.168.1.6:22)..
Mon Oct  4 21:02:40 2021 - [debug]   ok.
Mon Oct  4 21:02:40 2021 - [info] All SSH connection tests passed successfully.

主从状态检查

[root@slave02 ~]# masterha_check_repl  --conf=/etc/mha/mysql.cnf
Mon Oct  4 21:04:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  4 21:04:31 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
Mon Oct  4 21:04:31 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
Mon Oct  4 21:04:31 2021 - [info] MHA::MasterMonitor version 0.58.
Mon Oct  4 21:04:34 2021 - [info] GTID failover mode = 1
Mon Oct  4 21:04:34 2021 - [info] Dead Servers:
Mon Oct  4 21:04:34 2021 - [info] Alive Servers:
Mon Oct  4 21:04:34 2021 - [info]   192.168.1.5(192.168.1.5:3306)
Mon Oct  4 21:04:34 2021 - [info]   192.168.1.6(192.168.1.6:3306)
Mon Oct  4 21:04:34 2021 - [info]   192.168.1.7(192.168.1.7:3306)
Mon Oct  4 21:04:34 2021 - [info] Alive Slaves:
Mon Oct  4 21:04:34 2021 - [info]   192.168.1.6(192.168.1.6:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Mon Oct  4 21:04:34 2021 - [info]     GTID ON
Mon Oct  4 21:04:34 2021 - [info]     Replicating from 192.168.1.5(192.168.1.5:3306)
Mon Oct  4 21:04:34 2021 - [info]   192.168.1.7(192.168.1.7:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Mon Oct  4 21:04:34 2021 - [info]     GTID ON
Mon Oct  4 21:04:34 2021 - [info]     Replicating from 192.168.1.5(192.168.1.5:3306)
Mon Oct  4 21:04:34 2021 - [info] Current Alive Master: 192.168.1.5(192.168.1.5:3306)
Mon Oct  4 21:04:34 2021 - [info] Checking slave configurations..
Mon Oct  4 21:04:34 2021 - [info]  read_only=1 is not set on slave 192.168.1.6(192.168.1.6:3306).
Mon Oct  4 21:04:34 2021 - [info]  read_only=1 is not set on slave 192.168.1.7(192.168.1.7:3306).
Mon Oct  4 21:04:34 2021 - [info] Checking replication filtering settings..
Mon Oct  4 21:04:34 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Oct  4 21:04:34 2021 - [info]  Replication filtering check ok.
Mon Oct  4 21:04:34 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct  4 21:04:34 2021 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct  4 21:04:34 2021 - [info] HealthCheck: SSH to 192.168.1.5 is reachable.
Mon Oct  4 21:04:34 2021 - [info] 
192.168.1.5(192.168.1.5:3306) (current master)
 +--192.168.1.6(192.168.1.6:3306)
 +--192.168.1.7(192.168.1.7:3306)

Mon Oct  4 21:04:34 2021 - [info] Checking replication health on 192.168.1.6..
Mon Oct  4 21:04:34 2021 - [info]  ok.
Mon Oct  4 21:04:34 2021 - [info] Checking replication health on 192.168.1.7..
Mon Oct  4 21:04:34 2021 - [info]  ok.
Mon Oct  4 21:04:34 2021 - [warning] master_ip_failover_script is not defined.
Mon Oct  4 21:04:34 2021 - [warning] shutdown_script is not defined.
Mon Oct  4 21:04:34 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

开启MHA

db03开启MHA

[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
[1] 3035

查看MHA

[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:3035) is running(0:PING_OK), master:192.168.1.5

[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.5 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 5     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.6 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 6     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.7 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 7     |
+---------------+-------

MHA环境故障模拟及修复

故障模拟

模拟master宕机

[root@master ~]# systemctl stop mysqld

查看db02主从信息

db02 [(none)]>show slave status \G 
Empty set (0.00 sec)

db02 [(none)]>show processlist;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 35 | root | localhost         | NULL | Query            |    1 | starting                                                      | show processlist |
| 44 | repl | 192.168.1.7:53166 | NULL | Binlog Dump GTID |  150 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
2 rows in set (0.12 sec)

查看MHA manager主机，manager停止

[root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
mysql is stopped(2:NOT_RUNNING).

查看MHA manager日志信息

[root@slave02 ~]# tail -f /var/log/mha/mysql/manager
...
----- Failover Report -----

mysql: MySQL Master failover 192.168.1.5(192.168.1.5:3306) to 192.168.1.6(192.168.1.6:3306) succeeded

Master 192.168.1.5(192.168.1.5:3306) is down!

Check MHA Manager logs at slave02:/var/log/mha/mysql/manager for details.

Started automated(non-interactive) failover.
Selected 192.168.1.6(192.168.1.6:3306) as a new master.
192.168.1.6(192.168.1.6:3306): OK: Applying all logs succeeded.
192.168.1.7(192.168.1.7:3306): OK: Slave started, replicating from 192.168.1.6(192.168.1.6:3306)
192.168.1.6(192.168.1.6:3306): Resetting slave info succeeded.
Master failover to 192.168.1.6(192.168.1.6:3306) completed successfully.

查看MHA manager配置文件，可以发现server1已经没了

[root@slave02 ~]# vim /etc/mha/mysql.cnf 
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha

[server2]
hostname=192.168.1.6
port=3306

[server3]
hostname=192.168.1.7
port=3306

可见MHA已经自动处理了故障（但是只是一次性处理）.

故障修复

修复db01

[root@master ~]# systemctl start mysqld

修复主从
查看manager日志中change master to命令

[root@slave02 ~]# vim /var/log/mha/mysql/manager
CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';

在这里插入图片描述
在db01中执行该命令，并启动slave

db01 [(none)]>CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
Query OK, 0 rows affected, 2 warnings (0.16 sec)

db01 [(none)]>start slave;
Query OK, 0 rows affected (0.14 sec)

修改db03上的manager配置文件，将db01加入

[root@slave02 ~]# vim /etc/mha/mysql.cnf 
[server1]
hostname=192.168.1.5
port=3306

启动MHA

[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
[1] 7635
[root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:7635) is running(0:PING_OK), master:192.168.1.6

MHA的 VIP 功能

将脚本文件放置在/usr/local/bin/目录下，并增加执行权限（master_ip_failover事先准备）

[root@slave02 ~]# mv master_ip_failover /usr/local/bin/
[root@slave02 ~]# chmod +x /usr/local/bin/master_ip_failover

master_ip_failover脚本如下：

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
my $vip = '192.168.1.100/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}


sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master 
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

修改master_ip_failover配置信息

[root@slave02 ~]# vim /usr/local/bin/master_ip_failover 
my $vip = '192.168.1.10/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

编辑manager配置文件

[root@slave02 ~]# vim /etc/mha/mysql.cnf
master_ip_failover_script=/usr/local/bin/master_ip_failover

在主库手工生成第一个vip地址
```
[root@slave01 ~]# ifconfig ens33:1 192.168.1.10/24
```
注意：第一次配置VIP时，需要在主库手工生成VIP，因为只有在故障切换时才会自动生成VIP。

重启MHA

[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
Stopped mysql successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
[1] 8498
[root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:8498) is running(0:PING_OK), master:192.168.1.6

测试VIP功能：将此时的主库db02关闭mysqld服务模拟故障，预想结果应是db01接管master，且vip跳转到db01

# 模拟故障，关闭db02 mysqld服务
[root@slave01 ~]# systemctl stop mysqld

# 查看db01是否有vip信息
[root@master ~]# ifconfig
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 00:0c:29:df:f3:fd  txqueuelen 1000  (Ethernet)

# 查看db03的从库状态
db03 [(none)]>show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.5
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 194
               Relay_Log_File: slave02-relay-bin.000002
                Relay_Log_Pos: 367
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

MHA Send_Report

MHA是一次性的高可用解决方案，因此出现了宕机需要得到及时的通知。MHA提供了report_script参数进行邮件告警。

准备邮件脚本send_report放置在/usr/local/bin/目录下，并增加执行权限

[root@slave02 ~]# mv send_report /usr/local/bin/
[root@slave02 ~]# chmod +x /usr/local/bin/send_report

编辑manager配置文件，调用邮件脚本

[root@slave02 ~]# vim /etc/mha/mysql.cnf
report_script=/usr/local/bin/send_report

重启MHA

[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &

关闭主库，模拟故障，查看警告邮件
```
[root@master ~]# systemctl stop mysqld
```

MHA Binlog Server

额外配置一台机器，记录主库所有二进制日志文件，防止SSH无法连接的情况。
注意：额外机器必须与原库版本一致，本测试采用db03机器。

编辑mha配置文件，增加binlog信息

[root@slave02 ~]# vim /etc/mha/mysql.cnf
[binlog1]
no_master=1
hostname=192.168.1.7
master_binlog_dir=/binlog

创建必要目录并授权

[root@slave02 ~]# mkdir /binlog
[root@slave02 ~]# chown -R mysql.mysql /binlog/

拉取主库binlog日志

[root@slave02 ~]# cd /binlog/      # 必须进入创建好的目录
[root@slave02 ~]# mysqlbinlog  -R --host=192.168.1.5 --user=mha --password=mha --raw  --stop-never mysql-bin.000006 &
# 注意：拉取日志的起点应该按照该目前从库已经获得到的二进制日志点为起点

重启MHA

[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &

故障处理：主库宕机，binlog server自动停止，manager也自动停止
处理思路：

重新获取新主库的binlog到binlog server中
重新配置manager配置文件中binlog server信息
启动MHA

管理员在高可用架构维护的职责

搭建：MHA+VIP+SendReport+BinlogServer
监控及故障处理
高可用架构的优化
优化核心：尽可能降低主从的延时，让MHA花在数据补偿上的时间尽量减少；
尽量使用MySQL 5.7 版本，开启GTID模式，开启从库SQL并发复制。

这篇关于MySQL MHA高可用的文章就介绍到这儿，希望我们推荐的文章对大家有所帮助，也希望大家多多支持为之网！

MySQL MHA高可用

目录

MHA高可用架构介绍

MHA高可用架构原理

MHA高可用环境搭建

MHA基础架构搭建

环境规划

环境准备

配置一主双从环境

配置关键程序软链接

配置各节点互信

安装软件

配置文件

状态检查

开启MHA

查看MHA

MHA环境故障模拟及修复

故障模拟

故障修复

MHA的 VIP 功能

MHA Send_Report

MHA Binlog Server

管理员在高可用架构维护的职责

相关编程文章