AWS EC2搭建MHA+VIP+MySQL5.7

前言

随着云服务的迅速发展，越来越多企业选择将服务托管在云服务中，在数据库领域，AWS RDS、Aliyun RDS等都是不错的选择，默认已经做了高可用，基础运维，可以为企业节省不少的运维成本。由于RDS物理数据、Root权限等其他对象对用户不开放，难免在自动化运维中有些壁垒。本文将围绕在AWS EC2结合MHA做MySQL的高可用，EC2不支持VIP，可通过绑定私有IP的方式来实现。

MHA简介

MHA是由日本MySQL专家youshimaton(现就职于Facebook公司)用Perl写的一套MySQL故障切换方案，以保障数据库的高可用性。在MySQL故障切换过程中，MHA能做到在0~30s之内实现主MySQL故障转移。
该软件由两部分组成：MHA Manager（管理节点）和MHA Node（数据节点）。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群，也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上，MHA Manager会定时探测集群中的master节点，当master出现故障时，它可以自动将最新数据的slave提升为新的master，然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

准备

操作系统：Amazon Linux
master:172.31.13.126
slave:172.31.9.182
vip:172.31.0.200
其中，使用两台机器，分别部署主库和从库，MHA默认部署在从库上，下方中的VIP通指私有IP。

系统初始化修改

修改主机名

主库修改主机名

1
2
3

[root@master ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master

从库修改主机名

1
2
3

[root@slave ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave

修改时区

Amazon Linux默认安装好后英国时间，需要在主库和从库修改时区

1 2	[root@master ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime [root@master ~]# /usr/sbin/ntpdate 0.centos.pool.ntp.org && /sbin/hwclock -w &>/dev/null

设置防火墙

主库设置，允许从库访问

1
2
3

[root@master ~]# /etc/init.d/iptables status|grep 172.31.9.182
15   ACCEPT     all  --  172.31.9.182         0.0.0.0/0           
[root@master ~]#

其中，172.31.9.182是允许从库的访问规则

从库设置，允许主库访问

1
2
3

[root@slave ~]# /etc/init.d/iptables status|grep 172.31.13.126 
15   ACCEPT     all  --  172.31.13.126        0.0.0.0/0           
[root@slave ~]#

其中，172.31.13.126是允许主库的访问规则

关闭SELINUX

由于使用的操作系统为Amazon Linux，通过查看无selinux设置。

建立SSH无密码登录

主库设置

修改服务器/etc/ssh/ssh_config文件，把参数GSSAPIAuthentication修改为no。
修改服务器/etc/ssh/sshd_config文件，把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,重启ssh服务
修改服务器/etc/hosts.allow文件，允许从库连接

[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/ssh_config |grep GSSAPIAuthentication
        GSSAPIAuthentication no
[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/sshd_config |grep -E 'PasswordAuthentication|PermitRootLogin'
PermitRootLogin yes
PermitRootLogin forced-commands-only
PasswordAuthentication yes
[root@master ~]# /etc/init.d/sshd restart
停止 sshd：                                                [确定]
正在启动 sshd：                                            [确定]

使用命令ssh-keygen生成公钥，发送到从库，同时尝试在主库无密码形式登录从库，而且也要保证本机无密码登录本机

[root@master ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
43:a3:80:f1:8c:d4:5e:8d:3c:99:e5:39:39:e3:89:9a root@master
The key's randomart image is:
+--[ RSA 2048]----+
|  o. . *.        |
| . *. B..o       |
|  o.+. .X        |
|    .. = *       |
|      o S        |
|     o   .       |
|    E            |
|                 |
|                 |
+-----------------+
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.31.9.182
The authenticity of host '172.31.9.182 (172.31.9.182)' can't be established.
ECDSA key fingerprint is 72:71:66:dc:6c:b0:31:e7:6c:77:4c:8d:32:69:e0:88.
Are you sure you want to continue connecting (yes/no)? yes
root@172.31.9.182's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@172.31.9.182'"
and check to make sure that only the key(s) you wanted were added.
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.31.13.126
The authenticity of host '172.31.13.126 (172.31.13.126)' can't be established.
ECDSA key fingerprint is b6:ef:9f:f0:5e:fd:8f:49:ef:be:79:fb:44:ea:63:08.
Are you sure you want to continue connecting (yes/no)? yes
root@172.31.13.126's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@172.31.13.126'"
and check to make sure that only the key(s) you wanted were added.

[root@master ~]# ssh 'root@172.31.9.182'
Last login: Thu Jul 20 04:51:07 2017

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/
[root@slave ~]# exit
logout
Connection to 172.31.9.182 closed.
[root@master ~]# ssh 'root@172.31.13.126'
Last login: Thu Jul 20 04:37:13 2017

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/
[root@master ~]# exit
logout
Connection to 172.31.13.126 closed.

从库设置

和主库的配置相同
修改服务器/etc/ssh/ssh_config文件，把参数GSSAPIAuthentication修改为no。
修改服务器/etc/ssh/sshd_config文件，把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,重启ssh服务
修改服务器/etc/hosts.allow文件，允许从库连接

[root@slave ~]# sed '/^#/d;/^$/d' /etc/ssh/ssh_config |grep GSSAPIAuthentication
        GSSAPIAuthentication no
[root@slave ~]# sed '/^#/d;/^$/d' /etc/ssh/sshd_config |grep -E 'PasswordAuthentication|PermitRootLogin'
PermitRootLogin yes
PermitRootLogin forced-commands-only
PasswordAuthentication yes
[root@slave ~]# /etc/init.d/sshd restart
停止 sshd：                                                [确定]
正在启动 sshd：                                            [确定]

使用命令ssh-keygen生成公钥，发送到主库，同时尝试在从库无密码形式登录主库，而且也要保证本机无密码登录本机

[root@slave ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
7d:57:15:e0:20:6f:6f:45:00:76:c4:ba:66:32:fd:b3 root@slave
The key's randomart image is:
+--[ RSA 2048]----+
|         . ++=ooo|
|          + +.. .|
|           o.. ..|
|         .... .. |
|        S o oo.  |
|         o *..   |
|          = .    |
|             o   |
|             Eo  |
+-----------------+
[root@slave ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
da:ea:e3:1c:fb:4b:f5:84:da:a3:47:ea:2c:fa:3c:a6 root@slave
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|           .     |
|        S o .    |
|       o +.o     |
|      o +oo .    |
|     o=*....     |
|    EBO**o       |
+-----------------+
[root@slave ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.31.13.126
The authenticity of host '172.31.13.126 (172.31.13.126)' can't be established.
ECDSA key fingerprint is b6:ef:9f:f0:5e:fd:8f:49:ef:be:79:fb:44:ea:63:08.
Are you sure you want to continue connecting (yes/no)? yes
root@172.31.13.126's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@172.31.13.126'"
and check to make sure that only the key(s) you wanted were added.

[root@slave ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.31.9.182
The authenticity of host '172.31.9.182 (172.31.9.182)' can't be established.
ECDSA key fingerprint is 72:71:66:dc:6c:b0:31:e7:6c:77:4c:8d:32:69:e0:88.
Are you sure you want to continue connecting (yes/no)? yes
root@172.31.9.182's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@172.31.9.182'"
and check to make sure that only the key(s) you wanted were added.

[root@slave ~]# ssh 'root@172.31.13.126'
Last login: Thu Jul 20 05:36:57 2017 from 172.31.13.126

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/
[root@master ~]# exit
logout
Connection to 172.31.13.126 closed.
[root@slave ~]# ssh 'root@172.31.9.182' 
Last login: Thu Jul 20 05:36:46 2017 from 172.31.13.126

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/
[root@slave ~]# exit
logout
Connection to 172.31.9.182 closed.

MySQL安装

使用的是自己打包的RPM包，分别登录主库和从库，直接rpm -ivh percona-server-5.7.18-15.x86_64.rpm 即可，也可使用其他方式安装MySQL

[root@master ~]# mv /home/zhouting/percona-server-5.7.18-15.x86_64.rpm  /usr/src/
[root@master ~]# rpm -ivh /usr/src/percona-server-5.7.18-15.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:percona-server-5.7.18-15         ################################# [100%]
error reading information on service /etc/rc.d/init.d/mysqld: No such file or directory
 ERROR! MySQL (Percona Server) PID file could not be found!
Starting MySQL (Percona Server).. SUCCESS! 
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
Warning: Since password will be sent to server in plain text, use ssl connection to ensure password safety.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.

查看是否安装成功，分别登录主库和从库查看

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 16:57:43 > select version();
+---------------+
| version()     |
+---------------+
| 5.7.18-15-log |
+---------------+
1 row in set (0.00 sec)

MySQL [(none)] 16:57:47 >

MySQL互为主备

重要参数配置

主库必须包含以下参数

[root@master ~]# cat /etc/my.cnf 
[mysqld]
server-id = 1
autocommit = 1
auto_increment_increment = 1
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates

从库必须包含以下参数，注意slave需要动态设置read_only=1，

[root@slave ~]# cat /etc/my.cnf 
[mysqld]
server-id = 2
autocommit = 1
auto_increment_increment = 2
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
relay_log_purge=0
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password: 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
[root@slave ~]# mysql -uroot -p -e "set global read_only=1"
Enter password: 
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password: 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | ON    |
+---------------+-------+
[root@slave ~]#

复制帐号建立

由于MySQL版本使用的是5.7，创建用户的方法以之前有些不同

主库创建

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:16:52 > create user 'slave01'@'172.31.9.182' identified by 'slave123456';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 17:17:21 > grant replication slave,replication client on *.* to 'slave01'@'172.31.9.182';
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 17:17:34 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 17:17:38 > exit

从库创建

[root@slave ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:19:24 > create user 'slave01'@'172.31.13.126' identified by 'slave123456';
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 17:19:49 > grant replication slave,replication client on *.* to 'slave01'@'172.31.13.126';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 17:20:02 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 17:20:05 > exit

复制帐号验证

主库登录从库

[root@master ~]# mysql -h172.31.9.182 -uslave01 -pslave123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:22:15 > exit

从库登录主库

[root@slave ~]# mysql -h172.31.13.126 -uslave01 -pslave123456             
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:22:57 > exit

复制关系配置

主库配置

登录主库，设置复制关系，来源于从库的复制

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:32:19 > change master to master_host='172.31.9.182',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)

MySQL [(none)] 17:33:03 > start slave;
Query OK, 0 rows affected (0.03 sec)

MySQL [(none)] 17:33:47 > show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.31.9.182
                  Master_User: slave01
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 810
               Relay_Log_File: master-relay-bin.000002
                Relay_Log_Pos: 1023
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 810
              Relay_Log_Space: 1231
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: a1c1d189-6b96-11e7-9825-02a13635a5ca
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: a1c1d189-6b96-11e7-9825-02a13635a5ca:1-3
            Executed_Gtid_Set: 9a7b78fc-6b96-11e7-9856-0232d9c5deea:1-17,
a1c1d189-6b96-11e7-9825-02a13635a5ca:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified

MySQL [(none)] 17:33:54 >

从库配置

登录从库，设置复制关系，来源于主库的复制

[root@slave ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:34:41 > change master to master_host='172.31.13.126',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.03 sec)

MySQL [(none)] 17:35:14 > start slave;
Query OK, 0 rows affected (0.03 sec)

MySQL [(none)] 17:35:15 > show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.31.0.139
                  Master_User: slave01
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 4455
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 4059
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 4455
              Relay_Log_Space: 4266
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 9a7b78fc-6b96-11e7-9856-0232d9c5deea
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 9a7b78fc-6b96-11e7-9856-0232d9c5deea:1-17
            Executed_Gtid_Set: 9a7b78fc-6b96-11e7-9856-0232d9c5deea:1-17,
a1c1d189-6b96-11e7-9825-02a13635a5ca:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified

MySQL [(none)] 17:35:18 >

复制测试

主库登录测试

登录主库，插入测试数据

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:35:48 > create database if not exists test11;use test11;                                                           
Query OK, 1 row affected (0.00 sec)

Database changed
MySQL [test11] 17:35:52 > create table if not exists t11(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));
Query OK, 0 rows affected (0.02 sec)

MySQL [test11] 17:35:55 > insert into t11 values(null,'master11');
Query OK, 1 row affected (0.01 sec)

MySQL [test11] 17:35:58 > select * from t11;
+----+----------+
| id | name     |
+----+----------+
|  1 | master11 |
+----+----------+
1 row in set (0.00 sec)

MySQL [test11] 17:36:02 >

登录从库，查看测试数据，此时数据已经复制过来

[root@slave ~]# mysql -uroot -p 
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:36:25 > select * from test11.t11;
+----+----------+
| id | name     |
+----+----------+
|  1 | master11 |
+----+----------+
1 row in set (0.00 sec)

MySQL [(none)] 17:36:30 >

从库登录测试

登录从库，插入测试数据

[root@slave ~]# mysql -uroot -p 
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 32
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:37:59 > create database if not exists test11;use test11;
Query OK, 1 row affected, 1 warning (0.01 sec)

Database changed
MySQL [test11] 17:38:03 > create table if not exists t22(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));     
Query OK, 0 rows affected (0.03 sec)

MySQL [test11] 17:38:18 > insert into t22 values(null,'slave11'); 
Query OK, 1 row affected (0.00 sec)

MySQL [test11] 17:38:30 > select * from t22;
+----+---------+
| id | name    |
+----+---------+
|  2 | slave11 |
+----+---------+
1 row in set (0.00 sec)

MySQL [test11] 17:38:35 >

登录主库，查看测试数据，此时数据已经复制过来

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 38
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 17:39:09 > select * from test11.t22;
+----+---------+
| id | name    |
+----+---------+
|  2 | slave11 |
+----+---------+
1 row in set (0.00 sec)

MySQL [(none)] 17:39:15 >

MHA帐号建立

主库创建

登录主库，创建允许主库和从库连接的MHA用户信息，再分别尝试使用MHA用户登录(省略)

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 41
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 09:36:55 > create user 'mha01'@'172.31.9.182' identified by 'mha123456';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 09:37:14 > grant all privileges on *.* to 'mha01'@'172.31.9.182';
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 09:38:02 > create user 'mha01'@'172.31.13.126' identified by 'mha123456';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 09:38:14 > grant all privileges on *.* to 'mha01'@'172.31.13.126';
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 09:38:18 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

从库创建

登录从库，由于复制已经把在主库创建的MHA用户信息复制了过来，尝试登录即可(省略)

MHA安装

安装依赖

分别在主库和从库上安装，执行命令：
yum -y install gcc gcc-c++ make openssl-devel perl perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes perl-Module-Install.noarch mailx jwhois

主库安装mha4mysql-node

MHA管理端安装在从库上，主库上只需要mha4mysql-node即可，由于MySQL使用5.7版本，MHA也使用最新的0.57版本，采用编译安装方式。下载地址：https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw

[root@master ~]# cd /usr/src/
[root@master src]# tar -zxf mha4mysql-node-0.57.tar.gz
[root@master mha4mysql-node-0.57]# perl Makefile.PL             #编译过程省略
[root@master mha4mysql-node-0.57]# make
[root@master mha4mysql-node-0.57]# make install
[root@master mha4mysql-node-0.57]# ll /usr/local/bin/ -t        #查看安装，有以下文件则mha4mysql-node安装成功
total 1760
-r-xr-xr-x 1 root root  16381 Jul 21 09:41 apply_diff_relay_logs
-r-xr-xr-x 1 root root   8261 Jul 21 09:41 purge_relay_logs
-r-xr-xr-x 1 root root   7525 Jul 21 09:41 save_binary_logs
-r-xr-xr-x 1 root root   4807 Jul 21 09:41 filter_mysqlbinlog

从库安装mha4mysql-manager和mha4mysql-node

MHA管理端安装在从库，从库需要安装manager端和node端

[root@slave ~]# cd /usr/src/
[root@slave src]# tar -zxf mha4mysql-manager-0.57.tar.gz 
[root@slave src]# cd mha4mysql-manager-0.57
[root@slave mha4mysql-manager-0.57]# perl Makefile.PL           #编译过程省略
[root@slave mha4mysql-manager-0.57]# make
[root@slave mha4mysql-manager-0.57]# make install 
[root@slave mha4mysql-manager-0.57]# ll /usr/local/bin/ -t      #查看安装，有以下文件则mha4mysql-manager安装成功
total 1756
-r-xr-xr-x 1 root root   1779 Jul 21 09:48 masterha_check_ssh
-r-xr-xr-x 1 root root   2517 Jul 21 09:48 masterha_manager
-r-xr-xr-x 1 root root   2373 Jul 21 09:48 masterha_master_switch
-r-xr-xr-x 1 root root   5171 Jul 21 09:48 masterha_secondary_check
-r-xr-xr-x 1 root root   1995 Jul 21 09:48 masterha_check_repl
-r-xr-xr-x 1 root root   1865 Jul 21 09:48 masterha_check_status
-r-xr-xr-x 1 root root   3201 Jul 21 09:48 masterha_conf_host
-r-xr-xr-x 1 root root   2165 Jul 21 09:48 masterha_master_monitor
-r-xr-xr-x 1 root root   1739 Jul 21 09:48 masterha_stop
[root@slave mha4mysql-manager-0.57]# cd ..
[root@slave src]# tar -zxf mha4mysql-node-0.57.tar.gz 
[root@slave src]# cd mha4mysql-node-0.57
[root@slave mha4mysql-node-0.57]# perl Makefile.PL              #编译过程省略        
[root@slave mha4mysql-node-0.57]# make
[root@slave mha4mysql-node-0.57]# make install
[root@slave mha4mysql-node-0.57]# ll /usr/local/bin/ -t         #查看安装，有以下文件则mha4mysql-node安装成功
total 1800
-r-xr-xr-x 1 root root  16381 Jul 21 09:54 apply_diff_relay_logs
-r-xr-xr-x 1 root root   8261 Jul 21 09:54 purge_relay_logs
-r-xr-xr-x 1 root root   7525 Jul 21 09:54 save_binary_logs
-r-xr-xr-x 1 root root   4807 Jul 21 09:54 filter_mysqlbinlog

MHA目录结构说明

MHAManager

mhamanager工具包主要包括以下工具

[root@slave  ~]# ll /usr/local/bin/
总用量 84
-r-xr-xr-x  1 root root  1995 7月  19 11:49 masterha_check_repl         #检查MySQL复制情况
-r-xr-xr-x  1 root root  1779 7月  19 11:49 masterha_check_ssh          #检查MHA的SSH配置情况
-r-xr-xr-x  1 root root  1865 7月  19 11:49 masterha_check_status       #检测当前MHA运行状态
-r-xr-xr-x  1 root root  3201 7月  19 11:49 masterha_conf_host          #添加或删除配置的server信息
-r-xr-xr-x  1 root root  2517 7月  19 11:49 masterha_manager            #启动MHA
-r-xr-xr-x  1 root root  2165 7月  19 11:49 masterha_master_monitor     #检测Master是否宕机
-r-xr-xr-x  1 root root  2373 7月  19 11:49 masterha_master_switch      #控制故障转移，自动或者手动
-r-xr-xr-x  1 root root  5172 7月  19 11:49 masterha_secondary_check    #通过其他路由检测Master是否真的宕机
-r-xr-xr-x  1 root root  1739 7月  19 11:49 masterha_stop               #停止MHA

MHANode

mhanode工具包主要包括以下工具

[root@master  ~]# ll /usr/local/bin/
总用量 44
-r-xr-xr-x  1 root root 16371 7月  19 11:41 apply_diff_relay_logs       #识别差异日志的中继日志，并将其差异事件应用于其他Slave
-r-xr-xr-x  1 root root  4807 7月  19 11:41 filter_mysqlbinlog          #去除不必要的Rollback事件
-r-xr-xr-x  1 root root  8263 7月  19 11:41 purge_relay_logs            #删除无用的Relay log，避免延时
-r-xr-xr-x  1 root root  7525 7月  19 11:41 save_binary_logs            #保存和复制down掉的主服务器二进制日志

自定义扩展脚本说明

secondary_check_script #通过多条网络路由检测master的可用性
master_ip_failover_script #自动failover时候的切换脚本，可将vip信息写入此脚本中
shutdown_script #强制关闭master节点执行脚本
report_script #发送报告
init_conf_load_script #加载初始配置参数，如不想在配置中写明文密码
master_ip_online_change_script #手动failover时候的切换脚本

MHA扩展脚本

由于使用的AWS EC2，不支持VIP，可通过辅助IP的形式实现，这里使用脚本来辅助实现HA。

aws_vip_change.sh

[root@slave masterha]# cat /usr/local/bin/aws_vip_change.sh 
#!/bin/bash

if [ $# -lt 2 ]; then
        echo "Illegal param count.count = $#"
        exit 1
fi

# High Availability IP variables
# Other node's IP to ping and VIP to swap if other node goes down
VIP_ENI_ID=eni-7a5c7722

NEW_MASTER_IP=$1
OLD_MASTER_IP=$2

echo "NEW_MASTER_IP=${NEW_MASTER_IP},OLD_MASTER_IP=${OLD_MASTER_IP}"

# Specify the EC2 region that this will be running in   #以下为当前用户能够执行aws cli命令的key信息
REGION=cn-north-1
AWSAccessKeyId=AKIAPUXGZV6F6EKCOK7Q
AWSSecretKey=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       

# Run aws-apitools-common.sh to set up default environment variables and to
# leverage AWS security credentials provided by EC2 roles
. /etc/profile.d/aws-apitools-common.sh

ACCESS_USER_OPTION="--aws-access-key ${AWSAccessKeyId} --aws-secret-key ${AWSSecretKey}"

ssh -t -t root@${OLD_MASTER_IP} /sbin/ifconfig eth4 down    #其中eth4网卡名要与下文中，新申请ENI并绑定实例ID对应的网卡名相同

VIP_ATTACHMENT_ID=`ec2-describe-network-interface-attribute ${VIP_ENI_ID} ${ACCESS_USER_OPTION} --region ${REGION} -a | grep ATTACHMENT -m 1 | awk '{print $3;}'`
echo "VIP_ATTACHMENT_ID=${VIP_ATTACHMENT_ID}"

/opt/aws/bin/ec2-detach-network-interface ${VIP_ATTACHMENT_ID} ${ACCESS_USER_OPTION} --force --region ${REGION}
sleep 10

INSTANCE_ID=`ssh root@${NEW_MASTER_IP} /usr/bin/curl --silent http://169.254.169.254/latest/meta-data/instance-id`  #其中169.254.169.254从本地链路中获取实例ID
echo "INSTANCE_ID=${INSTANCE_ID}"

/opt/aws/bin/ec2-attach-network-interface ${VIP_ENI_ID} -i ${INSTANCE_ID} -d 1 ${ACCESS_USER_OPTION} --region ${REGION}
[root@slave masterha]#

master_ip_failover

[root@slave masterha]# cat /usr/local/bin/master_ip_failover 
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    my $exit_code = 10;
    my @vip_change_cmd = ("/usr/local/bin/aws_vip_change.sh",$new_master_host,$orig_master_host);
    system @vip_change_cmd;
    $exit_code = 0;
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@slave masterha]#

send_report

[root@slave masterha]# cat /usr/local/bin/send_report 
#!/usr/bin/perl
 
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.126.com';
my $mail_from='xxx@126.com';
my $mail_user='xxx@126.com';
my $mail_pass='xxx';
#my $mail_to=['xxx','xxx'];
my $mail_to='xxx';
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
 
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /var/log/masterha/app1/manager.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };
 
    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}
 
# Do whatever you want here
 
exit 0;

主库手动绑定ENI和实例关系

在主库上先申请新的ENI，注意申请时先绑定的group，以满足其他EC2能够连接些ENI。
申请成功后，根据新的ENI和当前主实例的实例ID绑定，最终确认新的VIP是否绑定上。


[root@master ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:E6:0B:68:5C:1C  
          inet addr:172.31.13.126  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e6:bff:fe68:5c1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:4063 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3292 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:691839 (675.6 KiB)  TX bytes:390499 (381.3 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:79 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:16736 (16.3 KiB)  TX bytes:16736 (16.3 KiB)

[root@master ~]# aws ec2 create-network-interface --subnet-id subnet-33859151 --private-ip-address 172.31.0.200 --groups sg-e50a1c87 sg-6b1f0809
{
    "NetworkInterface": {
        "Status": "pending", 
        "MacAddress": "02:12:25:83:a0:22", 
        "SourceDestCheck": true, 
        "VpcId": "vpc-b18195d3", 
        "Description": "", 
        "NetworkInterfaceId": "eni-7a5c7722",   #新申请的ENI名字
        "PrivateIpAddresses": [
            {
                "PrivateDnsName": "ip-172-31-0-200.cn-north-1.compute.internal", 
                "Primary": true, 
                "PrivateIpAddress": "172.31.0.200"  #新ENI绑定的私有IP
            }
        ], 
        "RequesterManaged": false, 
        "Groups": [
            {
                "GroupName": "For Mysql", 
                "GroupId": "sg-e50a1c87"
            }, 
            {
                "GroupName": "OnlySSH-Allow", 
                "GroupId": "sg-6b1f0809"
            }
        ], 
        "PrivateDnsName": "ip-172-31-0-200.cn-north-1.compute.internal", 
        "AvailabilityZone": "cn-north-1a", 
        "RequesterId": "AIDAPVDKK6NLF6SZ553KS", 
        "SubnetId": "subnet-33859151", 
        "OwnerId": "981100955930", 
        "TagSet": [], 
        "PrivateIpAddress": "172.31.0.200"
    }
}
[root@master ~]# aws ec2 attach-network-interface --network-interface-id eni-7a5c7722 --instance-id i-0a259e4950b0b3cf9 --device-index 1    #将ENI和实例绑定
{
    "AttachmentId": "eni-attach-2b2c9345"
}
}
[root@master ~]# ifconfig   #查看ip：172.31.0.200已绑定成功，同时要在其他EC2确定此安全组是否能够满足需求，否则还需要重新调整。
eth0      Link encap:Ethernet  HWaddr 02:E6:0B:68:5C:1C  
          inet addr:172.31.13.126  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e6:bff:fe68:5c1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:5435 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4190 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:810999 (791.9 KiB)  TX bytes:501986 (490.2 KiB)

eth4      Link encap:Ethernet  HWaddr 02:12:25:83:A0:22  
          inet addr:172.31.0.200  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::12:25ff:fe83:a022/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1218 (1.1 KiB)  TX bytes:2664 (2.6 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:79 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:16736 (16.3 KiB)  TX bytes:16736 (16.3 KiB)

[root@master ~]#

从库配置MHAManager

创建配置目录/etc/masterha/，同时创建一个项目上的配置文件，仅配置自动FailOver部分，只需要master_ip_failover和report_script脚本，其他脚本暂时不定义。

[root@slave ~]# mkdir -p /etc/masterha/
[root@slave ~]# cd /etc/masterha/
[root@slave masterha]# 
[root@slave masterha]# cat app1.cnf 
[server default]
manager_workdir=/var/log/masterha/app1                              #manager工作目录
manager_log=/var/log/masterha/app1/manager.log                      #manager日志
master_binlog_dir=/hwdata/data/percona                              #mysql数据目录
password=mha123456                                                  #mha连接密码
user=mha01                                                          #mha连接用户
ping_interval=1                                                     #监控主库，发送ping包的时间间隔，默认是3秒，尝试三次没有回应的时候自动进行failover
remote_workdir=/tmp                                                 #远端mysql在发生切换时binlog的保存位置
repl_password=slave123456                                           #复制密码
repl_user=slave01                                                   #复制用户
ssh_user=root                                                       #ssh用户
master_ip_failover_script=/usr/local/bin/master_ip_failover         #自动failover切换脚本
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change      #手动failover切换脚本
report_script=/usr/local/bin/send_report                            #报告发送脚本(不是必须)
#shutdown_script=                                                   #故障发生后关闭主机的脚本(不是必须)
secondary_check_script = masterha_secondary_check -s 172.31.13.126 -s 172.31.9.182 --user=root hostname=172.31.13.126  --master_ip=172.31.13.126 --master_port=3306 #通过第三方机器确认目标主库是否存活，这里的ip可换成其他机器，用于检测

[server1]
hostname=172.31.13.126                                              #主库地址
port=3306

[server2]
hostname=172.31.9.182                                               #从库地址
port=3306
candidate_master=1                                                  #候选master

测试MHAManager SSH

[root@slave masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Mon Jul 24 13:58:04 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 24 13:58:04 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jul 24 13:58:04 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jul 24 13:58:04 2017 - [info] Starting SSH connection tests..
Mon Jul 24 13:58:04 2017 - [debug] 
Mon Jul 24 13:58:04 2017 - [debug]  Connecting via SSH from root@172.31.13.126(172.31.13.126:22) to root@172.31.9.182(172.31.9.182:22)..
Mon Jul 24 13:58:04 2017 - [debug]   ok.
Mon Jul 24 13:58:05 2017 - [debug] 
Mon Jul 24 13:58:04 2017 - [debug]  Connecting via SSH from root@172.31.9.182(172.31.9.182:22) to root@172.31.13.126(172.31.13.126:22)..
Mon Jul 24 13:58:04 2017 - [debug]   ok.
Mon Jul 24 13:58:05 2017 - [info] All SSH connection tests passed successfully.
[root@slave masterha]#

测试MHAManager 复制

[root@slave masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Mon Jul 24 13:58:17 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 24 13:58:17 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jul 24 13:58:17 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jul 24 13:58:17 2017 - [info] MHA::MasterMonitor version 0.57.
Mon Jul 24 13:58:18 2017 - [info] Multi-master configuration is detected. Current primary(writable) master is 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 13:58:18 2017 - [info] Master configurations are as below: 
Master 172.31.13.126(172.31.13.126:3306), replicating from 172.31.9.182(172.31.9.182:3306)
Master 172.31.9.182(172.31.9.182:3306), replicating from 172.31.13.126(172.31.13.126:3306), read-only

Mon Jul 24 13:58:18 2017 - [info] GTID failover mode = 1
Mon Jul 24 13:58:18 2017 - [info] Dead Servers:
Mon Jul 24 13:58:18 2017 - [info] Alive Servers:
Mon Jul 24 13:58:18 2017 - [info]   172.31.13.126(172.31.13.126:3306)
Mon Jul 24 13:58:18 2017 - [info]   172.31.9.182(172.31.9.182:3306)
Mon Jul 24 13:58:18 2017 - [info] Alive Slaves:
Mon Jul 24 13:58:18 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 13:58:18 2017 - [info]     GTID ON
Mon Jul 24 13:58:18 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 13:58:18 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 13:58:18 2017 - [info] Current Alive Master: 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 13:58:18 2017 - [info] Checking slave configurations..
Mon Jul 24 13:58:18 2017 - [info] Checking replication filtering settings..
Mon Jul 24 13:58:18 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jul 24 13:58:18 2017 - [info]  Replication filtering check ok.
Mon Jul 24 13:58:18 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jul 24 13:58:18 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 24 13:58:18 2017 - [info] HealthCheck: SSH to 172.31.13.126 is reachable.
Mon Jul 24 13:58:18 2017 - [info] 
172.31.13.126(172.31.13.126:3306) (current master)
 +--172.31.9.182(172.31.9.182:3306)

Mon Jul 24 13:58:18 2017 - [info] Checking replication health on 172.31.9.182..
Mon Jul 24 13:58:18 2017 - [info]  ok.
Mon Jul 24 13:58:18 2017 - [info] Checking master_ip_failover_script status:
Mon Jul 24 13:58:18 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.13.126 --orig_master_ip=172.31.13.126 --orig_master_port=3306 
Mon Jul 24 13:58:18 2017 - [info]  OK.
Mon Jul 24 13:58:18 2017 - [warning] shutdown_script is not defined.
Mon Jul 24 13:58:18 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
[root@slave masterha]#

启动MHAManager

启动进程

[root@slave masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
[1] 4439
[root@slave masterha]# nohup: ignoring input and appending output to ‘nohup.out’

[root@slave masterha]#

查看启动日志

[root@slave ~]# cat /var/log/masterha/app1/manager.log 
Mon Jul 24 14:04:10 2017 - [info] MHA::MasterMonitor version 0.57.  #检查版本
Mon Jul 24 14:04:11 2017 - [info] Multi-master configuration is detected. Current primary(writable) master is 172.31.13.126(172.31.13.126:3306) #获取当前主节点ip
Mon Jul 24 14:04:11 2017 - [info] Master configurations are as below:   #获取复制结构
Master 172.31.13.126(172.31.13.126:3306), replicating from 172.31.9.182(172.31.9.182:3306)
Master 172.31.9.182(172.31.9.182:3306), replicating from 172.31.13.126(172.31.13.126:3306), read-only

Mon Jul 24 14:04:11 2017 - [info] GTID failover mode = 1    #GTID模式
Mon Jul 24 14:04:11 2017 - [info] Dead Servers:
Mon Jul 24 14:04:11 2017 - [info] Alive Servers:            #当前在线实例列表
Mon Jul 24 14:04:11 2017 - [info]   172.31.13.126(172.31.13.126:3306)
Mon Jul 24 14:04:11 2017 - [info]   172.31.9.182(172.31.9.182:3306)
Mon Jul 24 14:04:11 2017 - [info] Alive Slaves:             #当前在线slave列表及版本信息
Mon Jul 24 14:04:11 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 14:04:11 2017 - [info]     GTID ON
Mon Jul 24 14:04:11 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 14:04:11 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 14:04:11 2017 - [info] Current Alive Master: 172.31.13.126(172.31.13.126:3306)   #当前主实例信息
Mon Jul 24 14:04:11 2017 - [info] Checking slave configurations..
Mon Jul 24 14:04:11 2017 - [info] Checking replication filtering settings..
Mon Jul 24 14:04:11 2017 - [info]  binlog_do_db= , binlog_ignore_db=                        #复制过滤
Mon Jul 24 14:04:11 2017 - [info]  Replication filtering check ok.
Mon Jul 24 14:04:11 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jul 24 14:04:11 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 24 14:04:11 2017 - [info] HealthCheck: SSH to 172.31.13.126 is reachable.           #ssh可用
Mon Jul 24 14:04:11 2017 - [info] 
172.31.13.126(172.31.13.126:3306) (current master)                                          #再次确认当前复制架构
 +--172.31.9.182(172.31.9.182:3306)

Mon Jul 24 14:04:11 2017 - [info] Checking master_ip_failover_script status:
Mon Jul 24 14:04:11 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.13.126 --orig_master_ip=172.31.13.126 --orig_master_port=3306 
Mon Jul 24 14:04:11 2017 - [info]  OK.
Mon Jul 24 14:04:11 2017 - [warning] shutdown_script is not defined.                        #shutdown_script脚本未定义    
Mon Jul 24 14:04:11 2017 - [info] Set master ping interval 1 seconds.
Mon Jul 24 14:04:11 2017 - [info] Set secondary check script: masterha_secondary_check -s 172.31.13.126 -s 172.31.9.182 --user=root hostname=172.31.13.126  --master_ip=172.31.13.126 --master_port=3306                                                #第三方检查登录
Mon Jul 24 14:04:11 2017 - [info] Starting ping health check on 172.31.13.126(172.31.13.126:3306)..
Mon Jul 24 14:04:11 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. #启动成功，等待failover
[root@slave masterha]# ps aux|grep master
root      4439  0.0  0.2 287004 23636 pts/0    S    14:04   0:00 perl /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.cnf #进程存在
root      5008  0.0  0.0 110456  2092 pts/0    S+   14:13   0:00 grep --color=auto master
[root@slave masterha]#

模拟主库故障

确认当前VIP状态

在主库上执行停止服务，观察MHAManager日志，并确认VIP是否切换到从库上

[root@master ~]# ifconfig       #确认当前VIP：172.31.0.200在主库上
eth0      Link encap:Ethernet  HWaddr 02:E6:0B:68:5C:1C  
          inet addr:172.31.13.126  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e6:bff:fe68:5c1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:9629 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8119 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1202679 (1.1 MiB)  TX bytes:1113439 (1.0 MiB)

eth4      Link encap:Ethernet  HWaddr 02:12:25:83:A0:22  
          inet addr:172.31.0.200  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::12:25ff:fe83:a022/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:562 (562.0 b)  TX bytes:600 (600.0 b)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:90 errors:0 dropped:0 overruns:0 frame:0
          TX packets:90 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:17324 (16.9 KiB)  TX bytes:17324 (16.9 KiB)

[root@master ~]#

确认MHAManger进程状态

[root@slave masterha]# ps aux|grep master
root     11333  0.4  0.2 287004 23604 pts/0    S    15:36   0:00 perl /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.cnf
root     11377  0.0  0.0 110456  2252 pts/0    S+   15:37   0:00 grep --color=auto master
[root@slave masterha]# cd /var/log/masterha/app1/
[root@slave app1]# ll
total 8
-rw-r--r-- 1 root root   36 Jul 24 15:37 app1.master_status.health
-rw-r--r-- 1 root root 2833 Jul 24 15:36 manager.log
[root@slave app1]# cat manager.log |tail -2
Mon Jul 24 15:36:54 2017 - [info] Starting ping health check on 172.31.13.126(172.31.13.126:3306)..
Mon Jul 24 15:36:54 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..     #正在等待主库FailOver
[root@slave app1]# ifconfig     #当前从库没有存在VIP：172.31.0.200
eth0      Link encap:Ethernet  HWaddr 02:0E:FB:F1:E9:4E  
          inet addr:172.31.9.182  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e:fbff:fef1:e94e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:16565 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17269 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2084631 (1.9 MiB)  TX bytes:2304176 (2.1 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2556 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2556 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:605471 (591.2 KiB)  TX bytes:605471 (591.2 KiB)

[root@slave app1]#

停止主库服务，触发主库FailOver

[root@master ~]# /etc/init.d/mysqld stop                    #服务停止
Shutting down MySQL (Percona Server)............ SUCCESS! 
[root@master ~]# ifconfig                                   #VIP已经不存在
eth0      Link encap:Ethernet  HWaddr 02:E6:0B:68:5C:1C  
          inet addr:172.31.13.126  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e6:bff:fe68:5c1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:10242 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8535 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1256340 (1.1 MiB)  TX bytes:1175340 (1.1 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:92 errors:0 dropped:0 overruns:0 frame:0
          TX packets:92 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:17424 (17.0 KiB)  TX bytes:17424 (17.0 KiB)

[root@master ~]#

观察FailOver日志

登录从库，查看MHAManager日志

[root@slave app1]# cat manager.log 
Mon Jul 24 15:36:52 2017 - [info] MHA::MasterMonitor version 0.57.
Mon Jul 24 15:36:54 2017 - [info] Multi-master configuration is detected. Current primary(writable) master is 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:36:54 2017 - [info] Master configurations are as below: 
Master 172.31.13.126(172.31.13.126:3306), replicating from 172.31.9.182(172.31.9.182:3306)
Master 172.31.9.182(172.31.9.182:3306), replicating from 172.31.13.126(172.31.13.126:3306), read-only

Mon Jul 24 15:36:54 2017 - [info] GTID failover mode = 1
Mon Jul 24 15:36:54 2017 - [info] Dead Servers:
Mon Jul 24 15:36:54 2017 - [info] Alive Servers:
Mon Jul 24 15:36:54 2017 - [info]   172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:36:54 2017 - [info]   172.31.9.182(172.31.9.182:3306)
Mon Jul 24 15:36:54 2017 - [info] Alive Slaves:
Mon Jul 24 15:36:54 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:36:54 2017 - [info]     GTID ON
Mon Jul 24 15:36:54 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:36:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:36:54 2017 - [info] Current Alive Master: 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:36:54 2017 - [info] Checking slave configurations..
Mon Jul 24 15:36:54 2017 - [info] Checking replication filtering settings..
Mon Jul 24 15:36:54 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jul 24 15:36:54 2017 - [info]  Replication filtering check ok.
Mon Jul 24 15:36:54 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jul 24 15:36:54 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 24 15:36:54 2017 - [info] HealthCheck: SSH to 172.31.13.126 is reachable.
Mon Jul 24 15:36:54 2017 - [info] 
172.31.13.126(172.31.13.126:3306) (current master)
 +--172.31.9.182(172.31.9.182:3306)

Mon Jul 24 15:36:54 2017 - [info] Checking master_ip_failover_script status:
Mon Jul 24 15:36:54 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.13.126 --orig_master_ip=172.31.13.126 --orig_master_port=3306 
Mon Jul 24 15:36:54 2017 - [info]  OK.
Mon Jul 24 15:36:54 2017 - [warning] shutdown_script is not defined.
Mon Jul 24 15:36:54 2017 - [info] Set master ping interval 1 seconds.
Mon Jul 24 15:36:54 2017 - [info] Set secondary check script: masterha_secondary_check -s 172.31.13.126 -s 172.31.9.182 --user=root hostname=172.31.13.126  --master_ip=172.31.13.126 --master_port=3306
Mon Jul 24 15:36:54 2017 - [info] Starting ping health check on 172.31.13.126(172.31.13.126:3306)..
Mon Jul 24 15:36:54 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..         #等待主库FailOver
Mon Jul 24 15:40:01 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Jul 24 15:40:01 2017 - [info] Executing secondary network check script: masterha_secondary_check -s 172.31.13.126 -s 172.31.9.182 --user=root hostname=172.31.13.126  --master_ip=172.31.13.126 --master_port=3306  --user=root  --master_host=172.31.13.126  --master_ip=172.31.13.126  --master_port=3306 --master_user=mha01 --master_password=mha123456 --ping_type=SELECT
Mon Jul 24 15:40:01 2017 - [info] Executing SSH check script: exit 0
Mon Jul 24 15:40:01 2017 - [info] HealthCheck: SSH to 172.31.13.126 is reachable.
Monitoring server 172.31.13.126 is reachable, Master is not reachable from 172.31.13.126. OK.
Monitoring server 172.31.9.182 is reachable, Master is not reachable from 172.31.9.182. OK.
Mon Jul 24 15:40:01 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Jul 24 15:40:02 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.13.126' (111))
Mon Jul 24 15:40:02 2017 - [warning] Connection failed 2 time(s)..
Mon Jul 24 15:40:03 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.13.126' (111))
Mon Jul 24 15:40:03 2017 - [warning] Connection failed 3 time(s)..
Mon Jul 24 15:40:04 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.13.126' (111))
Mon Jul 24 15:40:04 2017 - [warning] Connection failed 4 time(s)..
Mon Jul 24 15:40:04 2017 - [warning] Master is not reachable from health checker!
Mon Jul 24 15:40:04 2017 - [warning] Master 172.31.13.126(172.31.13.126:3306) is not reachable!
Mon Jul 24 15:40:04 2017 - [warning] SSH is reachable.
Mon Jul 24 15:40:04 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Jul 24 15:40:04 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 24 15:40:04 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jul 24 15:40:04 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jul 24 15:40:05 2017 - [info] GTID failover mode = 1
Mon Jul 24 15:40:05 2017 - [info] Dead Servers:
Mon Jul 24 15:40:05 2017 - [info]   172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:05 2017 - [info] Alive Servers:
Mon Jul 24 15:40:05 2017 - [info]   172.31.9.182(172.31.9.182:3306)
Mon Jul 24 15:40:05 2017 - [info] Alive Slaves:
Mon Jul 24 15:40:05 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:40:05 2017 - [info]     GTID ON
Mon Jul 24 15:40:05 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:05 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:40:05 2017 - [info] Checking slave configurations..
Mon Jul 24 15:40:05 2017 - [info] Checking replication filtering settings..
Mon Jul 24 15:40:05 2017 - [info]  Replication filtering check ok.
Mon Jul 24 15:40:05 2017 - [info] Master is down!
Mon Jul 24 15:40:05 2017 - [info] Terminating monitoring script.
Mon Jul 24 15:40:05 2017 - [info] Got exit code 20 (Master dead).
Mon Jul 24 15:40:05 2017 - [info] MHA::MasterFailover version 0.57.
Mon Jul 24 15:40:05 2017 - [info] Starting master failover.
Mon Jul 24 15:40:05 2017 - [info] 
Mon Jul 24 15:40:05 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 24 15:40:05 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] GTID failover mode = 1
Mon Jul 24 15:40:07 2017 - [info] Dead Servers:
Mon Jul 24 15:40:07 2017 - [info]   172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:07 2017 - [info] Checking master reachability via MySQL(double check)...
Mon Jul 24 15:40:07 2017 - [info]  ok.
Mon Jul 24 15:40:07 2017 - [info] Alive Servers:
Mon Jul 24 15:40:07 2017 - [info]   172.31.9.182(172.31.9.182:3306)
Mon Jul 24 15:40:07 2017 - [info] Alive Slaves:
Mon Jul 24 15:40:07 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:40:07 2017 - [info]     GTID ON
Mon Jul 24 15:40:07 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:07 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:40:07 2017 - [info] Starting GTID based failover.
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jul 24 15:40:07 2017 - [info] Executing master IP deactivation script:
Mon Jul 24 15:40:07 2017 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=172.31.13.126 --orig_master_ip=172.31.13.126 --orig_master_port=3306 --command=stopssh --ssh_user=root  
Mon Jul 24 15:40:07 2017 - [info]  done.
Mon Jul 24 15:40:07 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jul 24 15:40:07 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] * Phase 3: Master Recovery Phase..
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000011:234
Mon Jul 24 15:40:07 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jul 24 15:40:07 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:40:07 2017 - [info]     GTID ON
Mon Jul 24 15:40:07 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:07 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:40:07 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000011:234
Mon Jul 24 15:40:07 2017 - [info] Oldest slaves:
Mon Jul 24 15:40:07 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:40:07 2017 - [info]     GTID ON
Mon Jul 24 15:40:07 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:07 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] Searching new master from slaves..
Mon Jul 24 15:40:07 2017 - [info]  Candidate masters from the configuration file:
Mon Jul 24 15:40:07 2017 - [info]   172.31.9.182(172.31.9.182:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 15:40:07 2017 - [info]     GTID ON
Mon Jul 24 15:40:07 2017 - [info]     Replicating from 172.31.13.126(172.31.13.126:3306)
Mon Jul 24 15:40:07 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 15:40:07 2017 - [info]  Non-candidate masters:
Mon Jul 24 15:40:07 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Jul 24 15:40:07 2017 - [info] New master is 172.31.9.182(172.31.9.182:3306)
Mon Jul 24 15:40:07 2017 - [info] Starting master failover..
Mon Jul 24 15:40:07 2017 - [info] 
From:
172.31.13.126(172.31.13.126:3306) (current master)
 +--172.31.9.182(172.31.9.182:3306)

To:
172.31.9.182(172.31.9.182:3306) (new master)
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Jul 24 15:40:07 2017 - [info] 
Mon Jul 24 15:40:07 2017 - [info]  Waiting all logs to be applied.. 
Mon Jul 24 15:40:07 2017 - [info]   done.
Mon Jul 24 15:40:07 2017 - [info] Getting new master's binlog name and position..
Mon Jul 24 15:40:07 2017 - [info]  mysql-bin.000008:234
Mon Jul 24 15:40:07 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.9.182', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='slave01', MASTER_PASSWORD='xxx';
Mon Jul 24 15:40:07 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 234, 28a037e5-6d0f-11e7-b43a-02e60b685c1c:1-15,
39292a50-6d0f-11e7-999e-020efbf1e94e:1-4
Mon Jul 24 15:40:07 2017 - [info] Executing master IP activate script:
Mon Jul 24 15:40:07 2017 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.31.13.126 --orig_master_ip=172.31.13.126 --orig_master_port=3306 --new_master_host=172.31.9.182 --new_master_ip=172.31.9.182 --new_master_port=3306 --new_master_user='mha01'   --new_master_password=xxx
NEW_MASTER_IP=172.31.9.182,OLD_MASTER_IP=172.31.13.126
Connection to 172.31.13.126 closed.
VIP_ATTACHMENT_ID=eni-attach-5b219e35
ATTACHMENT              eni-attach-5b219e35     detaching
INSTANCE_ID=i-03135a59c242e9bad
eni-attach-0f209f61
Mon Jul 24 15:40:22 2017 - [info]  OK.
Mon Jul 24 15:40:22 2017 - [info] Setting read_only=0 on 172.31.9.182(172.31.9.182:3306)..
Mon Jul 24 15:40:22 2017 - [info]  ok.
Mon Jul 24 15:40:22 2017 - [info] ** Finished master recovery successfully.
Mon Jul 24 15:40:22 2017 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] All new slave servers recovered successfully.
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] * Phase 5: New master cleanup phase..
Mon Jul 24 15:40:22 2017 - [info] 
Mon Jul 24 15:40:22 2017 - [info] Resetting slave info on the new master..
Mon Jul 24 15:40:22 2017 - [info]  172.31.9.182: Resetting slave info succeeded.
Mon Jul 24 15:40:22 2017 - [info] Master failover to 172.31.9.182(172.31.9.182:3306) completed successfully.
Mon Jul 24 15:40:22 2017 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.31.13.126(172.31.13.126:3306) to 172.31.9.182(172.31.9.182:3306) succeeded

Master 172.31.13.126(172.31.13.126:3306) is down!

Check MHA Manager logs at slave:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.31.13.126(172.31.13.126:3306)
Selected 172.31.9.182(172.31.9.182:3306) as a new master.
172.31.9.182(172.31.9.182:3306): OK: Applying all logs succeeded.
172.31.9.182(172.31.9.182:3306): OK: Activated master IP address.
172.31.9.182(172.31.9.182:3306): Resetting slave info succeeded.
Master failover to 172.31.9.182(172.31.9.182:3306) completed successfully.
Mon Jul 24 16:27:13 2017 - [info] Sending mail..
defined(@array) is deprecated at /usr/share/perl5/vendor_perl/Mail/Sender.pm line 318.
        (Maybe you should just omit the defined()?)
defined(@array) is deprecated at /usr/share/perl5/vendor_perl/Mail/Sender.pm line 2693.
        (Maybe you should just omit the defined()?)
Option new_slave_hosts requires an argument
Unknown option: conf
tail: manager.log: file truncated
>> 220 126.com Anti-spam GT for Coremail System (126com[20140526])
<< EHLO slave
>> 250-mail
>> 250-PIPELINING
>> 250-AUTH LOGIN PLAIN
>> 250-AUTH=LOGIN PLAIN
>> 250-coremail 1Uxr2xKj7kG0xkI17xGrU7I0s8FY2U3Uj8Cz28x1UUUUU7Ic2I0Y2Ur0Vl_3UCa0xDrUUUUj
>> 250-STARTTLS
>> 250 8BITMIME
<< AUTH LOGIN
>> 334 dXNlcm5hbWU6
<< ZHdqNTY4NTg4QDEyNi5jb20=
>> 334 UGFzc3dvcmQ6
<< RHdqMTIzNDU=
>> 235 Authentication successful
<< MAIL FROM:<dwj568588@126.com>
>> 250 Mail OK
<< RCPT TO:<duanwenjie@huan.tv>
>> 250 Mail OK
<< DATA
>> 354 End data with <CR><LF>.<CR><LF>
<< To: duanwenjie@huan.tv
<< From: dwj568588@126.com
<< Subject: app1: MySQL Master failover 172.31.13.126(172.31.13.126:3306) to 172.31.9.182(172.31.9.182:3306) succeeded
<< Date: Mon, 24 Jul 2017 16:27:15 +0800
<< X-Mailer: Perl script "send_report"
<<      using Mail::Sender 0.8.16 by Jenda Krynicky, Czechlands
<<      running on slave ()
<<      under account "zhouting"
<< Message-ID: <20170724_082715_056339.dwj568588@126.com>
<< MIME-Version: 1.0
<< Content-type: text/plain; charset=utf-8
<< 
<< Master 172.31.13.126(172.31.13.126:3306) is down!
<< 
<< Check MHA Manager logs at slave:/var/log/masterha/app1/manager.log for details.
<< 
<< Started automated(non-interactive) failover.
<< Invalidated master IP address on 172.31.13.126(172.31.13.126:3306)
<< Selected 172.31.9.182(172.31.9.182:3306) as a new master.
<< 172.31.9.182(172.31.9.182:3306): OK: Applying all logs succeeded.
<< 172.31.9.182(172.31.9.182:3306): OK: Activated master IP address.
<< 172.31.9.182(172.31.9.182:3306): Resetting slave info succeeded.
<< Master failover to 172.31.9.182(172.31.9.182:3306) completed successfully.
<< 
<< .
>> 250 Mail OK queued as smtp1,C8mowAAH7yZhr3VZFR+JCA--.54250S2 1500884834
<< QUIT
>> 221 Bye
[root@slave app1]#

日志过程解析

启动前的准备工作
检查数据库服务器状态,获取相关参数设置
检查GTID、candidate_master、过滤DB是否设置
测试ssh连接是否成功
测试MHA node是否可用
创建MHA日志目录
开始检查slave的差异日志应用权限
确定当前的复制架构
调试master_ip_failover_script
调试shutdown_script
设置二次检查的主机masterha_secondary_check
MHA启动完毕,进入监测状态
监测master服务器挂了
通过定义的二次监测,确认master是否挂了
确认master挂了,开始进入failover流程
再试尝试连接master和master的ssh
通过MHA配置文件,监测其他slave的状态
再次监测slave的配置是否有变化,是否符合failover条件
正式开始failover
再次对slave配置做检查
对原Master做master_ip_failover_script和shutdown_script的操作
开始差异日志的恢复，获取slave最后得到的binlog位置
获取原master的binlog日志
确定新的master
在new master上应用差异的binlog日志
获取new master的binlog位置。
执行master_ip_failover_script,调用aws_vip_change.sh，执行VIP漂移
开始恢复其他slave的差异日志
差异日志应用完成以后,切换所有slave到new master。
failover操作完成,生成failover报告
最后发送邮件通知

确认FailOver状态

[root@slave app1]# ifconfig             #vip已经添加到从库上
eth0      Link encap:Ethernet  HWaddr 02:0E:FB:F1:E9:4E  
          inet addr:172.31.9.182  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::e:fbff:fef1:e94e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:17060 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17883 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2142676 (2.0 MiB)  TX bytes:2390595 (2.2 MiB)

eth1      Link encap:Ethernet  HWaddr 02:12:25:83:A0:22  
          inet addr:172.31.0.200  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::12:25ff:fe83:a022/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:798 (798.0 b)  TX bytes:1828 (1.7 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2789 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2789 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:669913 (654.2 KiB)  TX bytes:669913 (654.2 KiB)

[root@slave app1]#

查看邮件发送状态

登录邮件，查看是否收到报警邮件。

手动FailOver

MHA的手动FailOver不在本文范围，详情理论可参考官方文档。

参数列表

参数名	是否必须	参数使用域	默认值	例子
hostname	Yes	Local Only	-	hostname=master,hostname=slave
ip	No	Local Only	gethostbyname($hostname)	ip=192.168.1.100
port	No	Local/App/Global	3306	port=3306
ssh_host	No	Local Only	same as hostname	ssh_host=master,ssh_host=slave
ssh_ip	No	Local Only	gethostbyname($ssh_host)	ssh_ip=192.168.1.101
ssh_port	No	Local/App/Global	22	ssh_port=22
ssh_connection_timeout	No	Local/App/Global	5	ssh_connection_timeout=20
ssh_options	No	Local/App/Global	“”(empty string)	ssh_options=”-i /root/.ssh/id_dsa2”
candidate_master	No	Local Only	0	candidate_master=1
no_master	No	Local Only	0	no_master=1
ignore_fail	No	Local Only	0	ignore_fail=1
skip_init_ssh_check	No	Local Only	0	skip_init_ssh_check=1
skip_reset_slave	No	Local/App/Global	0	skip_reset_slave=1
user	No	Local/App/Global	root	user=root
password	No	Local/App/Global	“”(empty string)	password=rootpass
repl_user	No	Local/App/Global	Master_User value from SHOW SLAVE STATUS	repl_user=repl
repl_password	No	Local/App/Global	(current replication password)	repl_user=replpass
disable_log_bin	No	Local/App/Global	0	disable_log_bin=1
master_pid_file	No	Local/App/Global	“”(empty string)	master_pid_file=/var/lib/mysql/master1.pid
ssh_user	No	Local/App/Global	current OS user	ssh_user=root
remote_workdir	No	Local/App/Global	/var/tmp	remote_workdir=/var/log/masterha/app1
master_binlog_dir	No	Local/App/Global	/var/lib/mysql	master_binlog_dir=/data/mysql1
log_level	No	App/Global	info	log_level=debug
manager_workdir	No	App	/var/tmp	manager_workdir=/var/log/masterha
client_bindir	No	App	-	client_bindir=/usr/mysql/bin
client_libdir	No	App	-	client_libdir=/usr/lib/mysql
manager_log	No	App	STDERR	manager_log=/var/log/masterha/app1.log
check_repl_delay	No	App/Global	1	check_repl_delay=0
check_repl_filter	No	App/Global	1	check_repl_filter=0
latest_priority	No	App/Global	1	latest_priority=0
multi_tier_slave	No	App/Global	0	multi_tier_slave=1
ping_interval	No	App/Global	3	ping_interval=5
ping_type	No	App/Global	SELECT	ping_type=CONNECT
secondary_check_script	No	App/Global	null	secondary_check_script= masterha_secondary_check -s remote_dc1 -s remote_dc2
master_ip_failover_script	No	App/Global	null	master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script	No	App/Global	null	master_ip_online_change_script= /usr/local/bin/master_ip_online_change
shutdown_script	No	App/Global	null	shutdown_script= /usr/local/bin/master_shutdown
report_script	No	App/Global	null	report_script= /usr/local/bin/report

详情解释

Local Scope:        针对每个server级别有效的选项.local scope级别的参数需要在配置文件的 [server_xxx]段落配置  
App Scope:          这个参数可以理解为针对一组master-slave集群. 这些参数需要在 [server_default]段落配置  
Global Scope:       这个参数针对所有的MHA管理的实例. global scope级别的配置只有你在使用一个manager server管理多组master-slave时使用 
hostname：          目标Mysql服务器的主机名或者IP地址,这个参数是强制的,并且必须配置在[server_xxx]段落
ip：                目标Mysql服务器的IP地址,默认是通过$hostname变量获取.MHA manager和MHA Node
                    内部使用这个IP地址链接到Mysql和SSH.通常你不需要配置这个参数,因为它可以通过$hostname解析获得
port：              mysql服务器使用的端口号,默认是3306,MHA链接到mysql通过IP地址和端口
ssh_host：          从0.53版本开始支持此参数,此参数的主要目的在于你mysql复制使用的IP地址由于安全策略,不能直接连接,
                    所以需要通过其他IP和端口连接到mysql server或者ssh,默认这个参数和hostname的值相同
ssh_ip：            从0.53版本开始支持此参数,连接到目标mysql server的ssh ip地址,默认从$ssh_host获取
ssh_port：          从0.53版本开始支持此参数,连接到目标mysql server的ssh端口,默认是22
ssh_connection_timeout：从0.54开始支持此参数,默认是5秒,添加此参数的原因是以前我们在程序里面写死了
ssh_options：       从0.53开始支持此参数,可以添加SSH命令行的参数
candidate_master：  
你的slave的硬件配置或者用途可能不太一样,你想要提升更加可靠的slave作为新的master
如果设置candidate_master的值为1,那么这个server会优先成为master,只要它满足成为master的条件(binlog开启的,没有严重的复制延时等)
所以意味着配置了candiate_master=1的服务器并不是肯定可以成为new master,但是这个参数能够帮助提高优先级
如果你在多个服务器上设置了candidate_master=1,那么在配置文件中[server_xxx]的配置顺序,会成为第二排序规则,排在上面的越优先
no_master：         no_master=1是默认值,意思是这个server从来不会成为新的master,这个参数用来标记某些从来不用成为new master的服务器
ignore_fail：       
默认是0,当某个slave的ssh或者mysql当掉或者复制失败的时候,MHA manager不启动failover。
但是有些环境下,你想要在某个特定的slave失败的时候继续执行failover,把么就设置这个ignore_fail=1,即时这个slave失败的时候,failover依然继续执行
skip_init_ssh_check：启动的时候跳过ssh连接测试
skip_reset_slave：  从0.56版本开始支持此参数,在master执行failover以后跳过执行reset slave all(译者感觉应该是方便MM复制的后期恢复)
user：              mysql管理员命令,建议赋予all privileges权限
password：          上面mysql user的密码
repl_user：         mysql复制使用的用户名,需要执行change master to命令和show slave status命令等.一般赋予replication slave,replication client权限
repl_password：     上面用户对应的密码.默认情况下,使用的是当前复制的密码
disable_log_bin：   当这个参数设置以后,当应用差异日志到slave的时候,slave不会生成binlog. Internally MHA passes --disable-log-bin to mysqlbinlog command
master_pid_file：   设置master的pid文件位置,当你的服务器上运行了多个mysql的时候使用
ssh_user：          
MHA manager和MHA node通过这个用户连接到mysql server的OS上,在对应的OS上执行相关命令和拷贝差异日志等等
这个用户必须有读取mysql binary/relay 相关日志文件的权限,还有日志目录的写入权限.(remote_workdir目录)
这个用户必须可以直接连接到服务器上,不用任何交互的操作.通过使用ssh public key作为认证方式.默认使用的用户是manager当前的用户
remote_workdir：    
每个MHA node上,MHA工作使用的目录的绝对路径.如果目录不存在,MHA会自动创建,如果权限不够,那么MHA node会意外终止,
注意MHA manager和MHA node都不会检查这个目录的磁盘可用空间,你需要自己保证有足够的可用空间.默认的remote_workdir是'/var/tmp'
master_binlog_dir： 
master mysql保存binlog的目录的绝对路径.如果参数的主要目的是在master mysql宕机以后,为了通过ssh拷贝需要的binlog event
这个参数是需要的,因为当mysql宕机以后,没法自动获取binary log的目录
默认情况下,master_binlog_dir的值是"/var/lib/mysql/,/var/log/mysql/",/var/lib/mysql/是大部分mysql发布版本的默认mysql输出目录,
你可以设置多个目录,使用逗号分隔.比如(/data1,/data2,/data3)
log_level：         MHA manager 的日志级别,默认是info级别,在大多数环境下没有问题.可用的级别有.debug/info/warning/error
manager_workdir：   MHA manager使用的工作目录,如果没有设置,默认使用/var/tmp
client_bindir：     如果mysql命令行工具没有安装在默认目录,可以使用这个选项配置
client_libdir：     如果mysql lib没有安装在默认目录,可以使用这个选项配置
manager_log：       MHA manager日志文件的绝对目录和文件名,如果没有设置,那么将直接打印到标准输出和标准错误输出
当执行交互式的failover时,MHA manager将会忽略manager_log设置,直接答应到标准输出和标准错误输出
check_repl_delay：  
默认情况下如果一个slave比master延时了100M的relay logs.MHA不会选择这个slave作为新的master.因为他需要更多的时间来recovery
设置check_repl_delay=0,MHA在选择new master时将会忽略复制的延时.这个参数通常和candidate_master=1同时使用
check_repl_filter： 
默认情况下如果master和slave有不同的binary log/replication 过滤规则的话,MHA打印错误,不进行start monitoring或者failover
这么做的目的是为了避免recover的时候出现意外的错误,例如"Table not exists",如果你100%确定你不同的过滤规则不会导致recover时候报错,
那么你可以设置check_repl_fiter=0,这样的话,MHA在应用差异日志的时候将不会在检查过滤规则.使用这个参数的时候需要非常小心
latest_priority：   默认情况下,接收到了最新的binlog的slave优先被选为new master,如果你想要控制优先级的顺序,比如(host2>host3>host4),那么设置latest_priority=0会有所帮助
multi_tier_slave：  
MHA manager从0.5.2开始支持多主复制.默认情况下,它不支持三层以上的复制架构.例如,host2是host1的slave,host3是host2的slave,
默认不允许把这个三个主机写到同一个配置文件中因为这是一个三层的复制架构,并且MHA会因此报错,并停止工作.如果设置了multi_tier_slave,
MHA manager不会因为多层架构而终止,它会忽略三层及以上的主机,例如master(host1)挂掉以后,host2会被选为新的master,host3将继续从host2复制
ping_interval：     
这个参数声明MHA manager pings(通过执行sql来ping) master的间隔.当连续三次ping失败,MHA manager认为这个Mysql master宕机,从宕机到检测到宕机,
最大的消耗时间是这个参数的四倍,这个参数默认值是3(3秒),如果MHA manager因为权限问题多次连接失败,这不认为master dead
ping_type：
从MHA manager 0.53开始支持这个参数, MHA建立一个长连接到master,然后通过执行"SELECT 1" (ping_type=SELECT)来判断master是否可用
但是在一些环境下,最好使用短连接的方式,因为这样的方式更严格,也能检测TCP连接级别的失败.设置ping_type=CONNECT可以实现.从0.56版本开始,支持ping_type=INSERT
secondary_check_script：
通常情况,强力推荐使用两个或者多个路由检测Mysql master默认MHA通过一个路由检测:从manager到master.这是不推荐的
MHA manager可以通过secondary_check_script参数调用一个内部脚本来实现两个或者多个路由的检测.下面是一个配置实例
secondary_check_script = masterha_secondary_check -s remote_host1 -s remote_host2
MHA manager包含masterha_secondary_check 脚本.内置的masterha_secondary_check脚本可以满足大多是环境,当然你也可以调用你自定义的脚本.
在上面的实例中,MHA manager通过这两个路由检测master主机
Manager-(A)->remote_host1-(B)->master_host
Manager-(A)->remote_host2-(B)->master_host
如果两条路径中,A都是成功的,B都是不成功的,那么masterha_secondary_check将会退出,返回状态0.MHA manager将会认为Mysql master宕机,开始failover
如果A失败,那么masterha_secondary_check将退出,返回状态2,MHA manager网络出现问题,不会开始failover
如果B检测成功,masterha_secondary_check退出并返回状态3,MHA manager明确的知道mysql master是活跃的,不开始failover
通常来讲,remote_host1和remote_host2必须和 MHA manager及Mysql server不在同一个网段,这样才更有意义
MHA调用secondary_check_script参数对应的脚本会自动传递以下参数(所以你不需要在配置文件中设置),
masterha_secondary_check适用于大多数环境,如果你需要其他的功能可以自己写一个网络检查的脚本.
--user=(SSH username of the remote hosts. ssh_user parameter value will be passed)
--master_host=(master's hostname)
--master_ip=(master's ip address)
--master_port=(master's port number)
注意:
masterha_secondary_check脚本依赖于IO::Socket::INET perl包,在perl v5.6.0中默认包含,masterha_secondary_check脚本会通过
ssh连接到所有的remote servers.所以SSH public key 认证需要设置.另外masterha_secondary_check脚本尝试使用TCP连接从remote server到mysql master.这意味着服务器设置的max_connections参数对此连接无效,并且如果TCP连接成功,Aborted_connects状态将会加1
master_ip_failover_script：
常见的HA环境下,大多是情况会给master分配一个虚拟IP,如果master宕机,HA软件像一个Pacemaker将虚拟IP转移到备用的master上
另外一种常见的方法就是创建一个全局目录数据库,包含所有应用和writer/reader ip地址.例如{app_master1,192.168.0.1},{app_master2,192.168.0.2}...,代替使用虚拟IP,
这种情况,你需要在master宕机的时候更新目录数据库
两种方法都有好的或者不好的地方,MHA不强制要求使用哪一种,但是提供了master_ip_failover_script参数来完成此目的
换句话说,你需要写一个脚本来调整应用服务连接到新的master,然后定义master_ip_failover_script的参数,下面是一个实例
master_ip_failover_script= /usr/local/bin/master_ip_failover
你可以从(MHA Manager package)/samples/scripts/master_ip_failover找到一个简单的脚本.这个脚本在manager的tarball和GitHUb branch中才包含
MHA manager会调用master_ip_failover_script三次,
第一次,在开始master monitor之前调用(目的是检查脚本是否可用),
第二次是在调用shutdown_script脚本前调用,
第三次是在new master应用完所有的差异日志以后,MHA manager会传递给脚本如下参数.(你不用在配置文件中指明这些参数)
Checking phase
--command=status
--ssh_user=(current master's ssh username)
--orig_master_host=(current master's hostname)
--orig_master_ip=(current master's ip address)
--orig_master_port=(current master's port number)
Current master shutdown phase
--command=stop or stopssh
--ssh_user=(dead master's ssh username, if reachable via ssh)
--orig_master_host=(current(dead) master's hostname)
--orig_master_ip=(current(dead) master's ip address)
--orig_master_port=(current(dead) master's port number)
New master activation phase
--command=start
--ssh_user=(new master's ssh username)
--orig_master_host=(dead master's hostname)
--orig_master_ip=(dead master's ip address)
--orig_master_port=(dead master's port number)
--new_master_host=(new master's hostname)
--new_master_ip=(new master's ip address)
--new_master_port(new master's port number)
--new_master_user=(new master's user)
--new_master_password(new master's password)
如果你使用共享虚拟IP的方法,你不需要在master shutdown阶段做任何事情,只要通过shutdown_script脚本关掉原来master的电源,在激活new master阶段,
你需要分配虚拟IP给new master.如果你使用目录数据库的方法,你需要在宕机的master shutdown阶段删除或者更新失效master的记录
在new master生效阶段需要插入或者更新new master的记录.另外你可以做任何事情来保证应用程序可以成功的写入new master
比如SET GLOBAL read_only=0，创建有写入权限的数据库用户等等
MHA manager检查脚本的退出状态,如果脚本退出状态是0或者10,MHA manager继续操作,如果脚本退出状态不是0或者10,MHA manager将会意外终止,并不继续continue failover.默认这个参数是空的
master_ip_online_change_script：
这是几个简单版本的master_ip_failover_script参数,但是master failover命令并不调用它.master online change命令会调用它.(masterha_master_switch --master_state=alive),传递以下参数
Current master write freezing phase
--command=stop or stopssh
--orig_master_host=(current master's hostname)
--orig_master_ip=(current master's ip address)
--orig_master_port=(current master's port number)
--orig_master_user=(current master's user)
--orig_master_password=(current master's password)
--orig_master_ssh_user=(from 0.56, current master's ssh user)
--orig_master_is_new_slave=(from 0.56, notifying whether the orig master will be new slave or not)
New master granting write phase
--command=start
--orig_master_host=(orig master's hostname)
--orig_master_ip=(orig master's ip address)
--orig_master_port=(orig master's port number)
--new_master_host=(new master's hostname)
--new_master_ip=(new master's ip address)
--new_master_port(new master's port number)
--new_master_user=(new master's user)
--new_master_password=(new master's password)
--new_master_ssh_user=(from 0.56, new master's ssh user)
MHA在当前的master write freezing阶段后执行FLUASH TABLES WITH READ LOCK, 在new mastergranting write阶段你可以执行一些类似master_ip_failover_script的操作
比如创建一个有写入权限的用户,执行SET GLOBAL read_only=0,更新目录数据库等.如果你的脚本退出返回状态不是1或者10,那么MHA manager将会意外终止,
停止master switch这个参数默认为空,所以MHA manager不做任何调用你可以从(MHA Manager package)/samples/scripts/master_ip_online_change找到一个简单的脚本
这个脚本在manager的tarball和GitHUb branch中才包含
shutdown_script：
你可能需要强制关闭master服务器,避免他再次提供服务,这对于避免脑裂很重要.下面是一个实例
shutdown_script= /usr/local/bin/power_manager
你可以从(MHA Manager package)/samples/scripts/power_manager找到一个简单的脚本.这个脚本在manager的tarball和GitHUb branch中才包含
在调用shutdown_script脚本之前,MHA manager内部会通过ssh尝试连接到mysql master,如果ssh可以连接(意思就是OS是存活的,但是Mysqld没有运行),MHAmanager就会传递下面的参数
--command=stopssh (这个意思就是指停止服务,不会关机)
--ssh_user=(ssh username so that you can connect to the master)
--host=(master's hostname)
--ip=(master's ip address)
--port=(master's port number)
--pid_file=(master's pid file)
如果master主机的ssh不能连接,那么MHA会使用如下参数
--command=stop (这个会通过fence设备关掉电源)
--host=(master's hostname)
--ip=(master's ip address)
这个脚本的大概功能如下,如果--command=stopssh被调用,脚本会使用killall -9 杀掉目标服务器上所有的mysqld_safe服务
如果--pid_file被设置,脚本尝试kill指定的进程.如果脚本执行成功,那么脚本会退出返回状态10.如果退出状态为10,
MHA manager后面会通过ssh连接到master,获取需要的binary log.如果脚本通过ssh连接到服务器失败,那么就会传递--command=stop参数,这个参数尝试关闭机器的电源,
关闭电源依赖于H/W.HP(ILO),DELL(DRAC).如果power off成功,脚本会然会状态0,其他情况会返回状态1.当返回状态是0的时候MHA manager 
开始failover.如果返回状态不是0或者10,那么MHA manager会意外终止.这个参数默认是空,所以MHA manager不会调用任何脚本
另外,MHA manager在启动monitoring之前调用shutdown_script.这时候会传递下面的参数.目的是检测脚本是否可用,如果发现错误,你可以提前知道
--command=status
--host=(master's hostname)
--ip=(master's ip address)
report_script：
你希望当failover发生以后可以发送一个报告(例如email),report_script可以达到这个目的,MHA manager传递下面的参数
--orig_master_host=(dead master's hostname)
--new_master_host=(new master's hostname)
--new_slave_hosts=(new slaves' hostnames, delimited by commas)
--subject=(mail subject)
--body=(body)
默认这个参数是空的,所以MHA manager不调用任何脚本，你可以从(MHA Manager package)/samples/scripts/send_report找到一个简单的脚本
这个脚本在manager的tarball和GitHUb branch中才包含
init_conf_load_script：
这个脚本可以在你不想在配置文件中写明文密码的时候使用.你可以覆盖全局配置参数.实例脚本如下
#!/usr/bin/perl
print "password=$ROOT_PASS\n";
print "repl_password=$REPL_PASS\n";
这个参数默认为空,所以MHA manager默认不调用任何脚本

参考

MySQL Master High Available 理论篇：
https://yq.aliyun.com/articles/58004?spm=5176.100239.blogcont57855.9.jUuCt0

结束语

云服务器上MySQL高可用，也可通过云负载均衡产品+MySQL复制来实现，但在数据安全性上，没有MHA+VIP+MySQL相对安全。