Aliyun ECS搭建MHA+Keepalived+VIP+MySQL5.7

前言

随着云服务的迅速发展，越来越多企业选择将服务托管在云服务中，在数据库领域，AWS RDS、Aliyun RDS等都是不错的选择，默认已经做了高可用，基础运维，可以为企业节省不少的运维成本。由于RDS物理数据、Root权限等其他对象对用户不开放，难免在自动化运维中有些壁垒。本文将围绕在Aliyun ECS结合MHA做MySQL的高可用，ECS不支持VIP，但阿里的产品高可用虚拟IP（HaVip）结合keepliaved等第三方软件可间接实现ip的漂移，可满足我们的需求。

MHA简介

MHA是由日本MySQL专家youshimaton(现就职于Facebook公司)用Perl写的一套MySQL故障切换方案，以保障数据库的高可用性。在MySQL故障切换过程中，MHA能做到在0~30s之内实现主MySQL故障转移。
该软件由两部分组成：MHA Manager（管理节点）和MHA Node（数据节点）。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群，也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上，MHA Manager会定时探测集群中的master节点，当master出现故障时，它可以自动将最新数据的slave提升为新的master，然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

高可用虚拟IP限制

每个VPC中最多只能同时存在5个vip对象
目前VPC中的网络通信不支持多播和广播，只支持单播；所以，如果用户是使用keepalived之类的第三方软件实现高可用，需要通过配置文件把通信方式改成单播；网上可以找到相应的方法
如果是使用keepalived之类的第三方软件，需要把信条消息的源IP改成ECS的私网IP（而不要用HaVip的私网IP进行心跳检查），不然很容易造成脑裂
当HaVip与EIP绑定后，进行公网通信时，持有HaVip的ECS实例应该通过HaVip的私网ip进行公网通信（而不是ECS自己的私网IP）；因为这时EIP是映射在HaVip的私网IP上，而不是映射在ECS的私网IP上
类似的，当使用HaVip做自建SNAT网关的高可用时，SNAT实例上配置的SNAT规则中，source IP应该是havip的私网IP而不是自己的private IP

准备

操作系统：Linux Centos 6.5
master：192.168.16.80
slave: 192.168.16.81
vip: 192.168.16.82 （注：需要和HaVip的私网IP一致）
其中，使用两台机器，分别部署主库和从库，MHA默认部署在从库上，下方中的VIP通EIP，最终是要把EIP和HaVip绑定在一起

系统初始化修改

修改主机名

主库修改主机名

[root@master ~]# cat /etc/sysconfig/network|grep HOSTNAME
HOSTNAME=master
[root@master ~]# cat /etc/hosts|grep master
192.168.16.80 master
[root@master ~]#

从库修改主机名

[root@slave ~]# cat /etc/sysconfig/network|grep HOSTNAME
HOSTNAME=slave
[root@slave ~]# cat /etc/hosts|grep slave
192.168.16.81 slave
[root@slave ~]#

设置防火墙

主库设置，允许从库访问

1
2
3

[root@master ~]# /etc/init.d/iptables status|grep 192.168.16.81
11   ACCEPT     all  --  192.168.16.81        0.0.0.0/0           
[root@master ~]#

其中，192.168.16.81是允许从库的访问规则

从库设置，允许主库访问

1
2
3

[root@slave ~]# /etc/init.d/iptables status|grep 192.168.16.80
11   ACCEPT     all  --  192.168.16.80        0.0.0.0/0           
[root@slave ~]#

其中，192.168.16.80是允许主库的访问规则

关闭SELINUX

主库查看

1
2
3

[root@master ~]# cat /etc/sysconfig/selinux |grep -v '#' |grep 'SELINUX=' 
SELINUX=disabled
[root@master ~]#

从库查看

1
2
3

[root@slave ~]# cat /etc/sysconfig/selinux |grep -v '#' |grep 'SELINUX=' 
SELINUX=disabled
[root@slave ~]#

建立SSH无密码登录

主库设置

修改服务器/etc/ssh/ssh_config文件，把参数GSSAPIAuthentication修改为no
修改服务器/etc/ssh/sshd_config文件，把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,同时AllowUsers把root也添加上,重启ssh服务
修改服务器/etc/hosts.allow文件，允许从库连接

[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/ssh_config |grep GSSAPIAuthentication
        GSSAPIAuthentication no
[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/sshd_config |grep -E 'PasswordAuthentication|PermitRootLogin'
PermitRootLogin yes
PasswordAuthentication yes
[root@master ~]# /etc/init.d/sshd restart
Stopping sshd:                                             [  OK  ]
Starting sshd:                                             [  OK  ]
[root@master ~]# cat /etc/hosts.allow |grep 192.168.16.81
sshd: 192.168.16.81
[root@master ~]#

使用命令ssh-keygen生成公钥，发送到从库，同时尝试在主库无密码形式登录从库，而且也要保证本机无密码登录本机

[root@master ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
07:86:76:d6:4a:0f:ea:6e:88:00:eb:3d:5f:7e:08:7a root@master
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|       . .       |
|      o B .      |
|.    . * =       |
|..    . S o      |
|o    o   .       |
|......o..        |
| ..+.E+. .       |
|    +o...        |
+-----------------+
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.16.81
The authenticity of host '192.168.16.81 (192.168.16.81)' can't be established.
RSA key fingerprint is 9f:1f:41:f6:2d:e9:20:83:30:be:cd:20:01:31:ea:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.81' (RSA) to the list of known hosts.
root@192.168.16.81's password: 
Now try logging into the machine, with "ssh 'root@192.168.16.81'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.16.80
The authenticity of host '192.168.16.80 (192.168.16.80)' can't be established.
RSA key fingerprint is f2:f8:7d:7d:59:3f:b9:3c:a0:e6:66:54:1f:79:e2:40.
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.16.80's password: 
Now try logging into the machine, with "ssh 'root@192.168.16.80'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@master ~]#

测试登录

[root@master ~]# ssh root@192.168.16.81
Last login: Wed Oct 11 14:45:10 2017 from 192.168.16.80

Welcome to aliyun Elastic Compute Service!

[root@slave ~]# exit
logout
Connection to 192.168.16.81 closed.
[root@master ~]# ssh root@192.168.16.80
Last login: Wed Jan 18 16:25:59 2017

Welcome to aliyun Elastic Compute Service!

[root@master ~]# exit
logout
Connection to 192.168.16.80 closed.
[root@master ~]#

从库设置

和主库保持一致
修改服务器/etc/ssh/ssh_config文件，把参数GSSAPIAuthentication修改为no
修改服务器/etc/ssh/sshd_config文件，把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,同时AllowUsers把root也添加上,重启ssh服务
修改服务器/etc/hosts.allow文件，允许从库连接

[root@slave ~]# sed '/^#/d;/^$/d' /etc/ssh/ssh_config |grep GSSAPIAuthentication
        GSSAPIAuthentication no
[root@slave ~]# sed '/^#/d;/^$/d' /etc/ssh/sshd_config |grep -E 'PasswordAuthentication|PermitRootLogin'
PermitRootLogin yes
PasswordAuthentication yes
[root@slave ~]# /etc/init.d/sshd restart
Stopping sshd:                                             [  OK  ]
Starting sshd:                                             [  OK  ]
[root@slave ~]# cat /etc/hosts.allow |grep 192.168.16.80
sshd: 192.168.16.80
[root@slave ~]#

使用命令ssh-keygen生成公钥，发送到从库，同时尝试在主库无密码形式登录从库，而且也要保证本机无密码登录本机

[root@slave ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ae:04:32:4d:3e:d3:b7:5d:4e:d2:50:7a:4c:44:d7:0a root@slave
The key's randomart image is:
+--[ RSA 2048]----+
|           o= .. |
|           =E.  .|
|    .     o o. . |
|   + .     +  .  |
|  o * . S . +    |
|   o + o o =     |
|      . o . .    |
|     . .         |
|      .          |
+-----------------+
[root@slave ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.16.80
The authenticity of host '192.168.16.80 (192.168.16.80)' can't be established.
RSA key fingerprint is f2:f8:7d:7d:59:3f:b9:3c:a0:e6:66:54:1f:79:e2:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.80' (RSA) to the list of known hosts.
root@192.168.16.80's password: 
Now try logging into the machine, with "ssh 'root@192.168.16.80'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@slave ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.16.81
The authenticity of host '192.168.16.81 (192.168.16.81)' can't be established.
RSA key fingerprint is 9f:1f:41:f6:2d:e9:20:83:30:be:cd:20:01:31:ea:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.81' (RSA) to the list of known hosts.
root@192.168.16.81's password: 
Now try logging into the machine, with "ssh 'root@192.168.16.81'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@slave ~]#

测试登录

[root@slave ~]# ssh root@192.168.16.80
Last login: Wed Oct 11 14:48:14 2017 from 192.168.16.80

Welcome to aliyun Elastic Compute Service!

[root@master ~]# exit
logout
Connection to 192.168.16.80 closed.
[root@slave ~]# ssh root@192.168.16.81
Last login: Wed Oct 11 14:48:10 2017 from 192.168.16.80

Welcome to aliyun Elastic Compute Service!

[root@slave ~]# exit
logout
Connection to 192.168.16.81 closed.
[root@slave ~]#

MySQL安装

使用的是自己打包的RPM包，分别登录主库和从库，直接rpm -ivh percona-server-5.7.18-15.x86_64.rpm 即可，也可使用其他方式安装MySQL

[root@master ~]# mv /home/duanwenjie/percona-server-5.7.18-15.x86_64.rpm /usr/src/
[root@master ~]# rpm -ivh /usr/src/percona-server-5.7.18-15.x86_64.rpm 
Preparing...                ########################################### [100%]
Group 'mail' not found. Creating the user mailbox file with 0600 mode.
   1:percona-server         ########################################### [100%]
error reading information on service /etc/rc.d/init.d/mysqld: No such file or directory
MySQL (Percona Server) PID file could not be found![FAILED]
Starting MySQL (Percona Server)...[  OK  ]
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
Warning: Since password will be sent to server in plain text, use ssl connection to ensure password safety.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@master ~]#

查看是否安装成功,分别登录主库和从库

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:03:27 > select version();
+---------------+
| version()     |
+---------------+
| 5.7.18-15-log |
+---------------+
1 row in set (0.00 sec)

MySQL [(none)] 15:03:31 > exit
Bye
[root@master ~]#

MySQL互为主备

重要参数配置

主库必须包含以下参数

[root@master ~]# cat /etc/my.cnf 
[mysqld]
server-id = 1
autocommit = 1
auto_increment_increment = 1
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates

从库必须包含以下参数，注意slave需要动态设置read_only=1

[root@slave ~]# cat /etc/my.cnf 
[mysqld]
server-id = 2
autocommit = 1
auto_increment_increment = 2
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
relay_log_purge=0
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password: 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
[root@slave ~]# mysql -uroot -p -e "set global read_only=1"
Enter password: 
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password: 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | ON    |
+---------------+-------+
[root@slave ~]#

复制帐号建立

由于MySQL版本使用的是5.7，创建用户的方法以之前有些不同

主库创建

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:09:37 > create user 'slave01'@'192.168.16.81' identified by 'slave123456';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 15:10:13 > grant replication slave,replication client on *.* to 'slave01'@'192.168.16.81';
Query OK, 0 rows affected (0.03 sec)

MySQL [(none)] 15:10:39 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 15:10:42 > exit
Bye
[root@master ~]#

从库创建

[root@slave ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:11:18 > create user 'slave01'@'192.168.16.80' identified by 'slave123456'; 
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 15:11:30 > grant replication slave,replication client on *.* to 'slave01'@'192.168.16.80';
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 15:11:40 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 15:11:43 > exit
Bye
[root@slave ~]#

复制帐号验证

主库登录从库

[root@master ~]# mysql -uslave01 -pslave123456 -h192.168.16.81 -e "select version()" 
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| version()     |
+---------------+
| 5.7.18-15-log |
+---------------+
[root@master ~]#

从库登录主库

[root@slave ~]# mysql -uslave01 -pslave123456 -h192.168.16.80 -e "select version()" 
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| version()     |
+---------------+
| 5.7.18-15-log |
+---------------+
[root@slave ~]#

复制关系配置

主库配置

登录主库，设置复制关系，来源于从库的复制

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:14:07 > change master to master_host='192.168.16.81',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.03 sec)

MySQL [(none)] 15:14:46 > start slave;
Query OK, 0 rows affected (0.04 sec)

MySQL [(none)] 15:14:49 > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.16.81
                  Master_User: slave01
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 812
               Relay_Log_File: master-relay-bin.000002
                Relay_Log_Pos: 1025
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 812
              Relay_Log_Space: 1233
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: e1ec82e8-ae51-11e7-8532-00163e128057
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: e1ec82e8-ae51-11e7-8532-00163e128057:1-3
            Executed_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3,
e1ec82e8-ae51-11e7-8532-00163e128057:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

MySQL [(none)] 15:14:56 >

从库配置

登录从库，设置复制关系，来源于主库的复制

[root@slave ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:15:42 > change master to master_host='192.168.16.80',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)

MySQL [(none)] 15:15:53 > start slave;
Query OK, 0 rows affected (0.04 sec)

MySQL [(none)] 15:15:55 > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.16.80
                  Master_User: slave01
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1470
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 1072
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1470
              Relay_Log_Space: 1279
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: df5051fc-ae51-11e7-85ee-00163e0f0e4a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3
            Executed_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3,
e1ec82e8-ae51-11e7-8532-00163e128057:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

MySQL [(none)] 15:16:00 >

复制测试

主库登录测试

登录主库，插入测试数据

MySQL [test11] 15:17:10 > create table if not exists t11(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));
Query OK, 0 rows affected (0.04 sec)

MySQL [test11] 15:17:20 > insert into t11 values(null,'master11');
Query OK, 1 row affected (0.00 sec)

MySQL [test11] 15:17:28 > select * from t11;
+----+----------+
| id | name     |
+----+----------+
|  1 | master11 |
+----+----------+
1 row in set (0.00 sec)

登录从库，查看测试数据，此时数据已经复制过来

[root@slave ~]# mysql -uroot -p test11 -e "select * from t11"
Enter password: 
+----+----------+
| id | name     |
+----+----------+
|  1 | master11 |
+----+----------+
[root@slave ~]#

从库登录测试

登录从库，插入测试数据

MySQL [(none)] 15:18:56 > create database if not exists test11;use test11;
Query OK, 1 row affected, 1 warning (0.00 sec)

Database changed
MySQL [test11] 15:19:20 > create table if not exists t22(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));
Query OK, 0 rows affected (0.05 sec)

MySQL [test11] 15:19:38 > insert into t22 values(null,'slave11'); 
Query OK, 1 row affected (0.00 sec)

MySQL [test11] 15:19:43 > select * from t22;
+----+---------+
| id | name    |
+----+---------+
|  2 | slave11 |
+----+---------+
1 row in set (0.00 sec)

MySQL [test11] 15:19:51 >

登录主库，查看测试数据，此时数据已经复制过来

[root@master ~]# mysql -uroot -p test11 -e "select * from t22"
Enter password: 
+----+---------+
| id | name    |
+----+---------+
|  2 | slave11 |
+----+---------+
[root@master ~]#

MHA帐号创建

主库创建

登录主库，创建允许主库和从库连接的MHA用户信息，再分别尝试使用MHA用户登录(省略)

[root@master ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)] 15:23:16 > create user 'mha01'@'192.168.16.81' identified by 'mha123456';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 15:24:02 > grant all privileges on *.* to 'mha01'@'192.168.16.81';
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)] 15:24:27 > create user 'mha01'@'192.168.16.80' identified by 'mha123456'; 
Query OK, 0 rows affected (0.01 sec)

MySQL [(none)] 15:24:35 > grant all privileges on *.* to 'mha01'@'192.168.16.80';
Query OK, 0 rows affected (0.02 sec)

MySQL [(none)] 15:24:41 > flush privileges;
Query OK, 0 rows affected (0.00 sec)

从库创建

登录从库，由于复制已经把在主库创建的MHA用户信息复制了过来，尝试登录即可(省略)

MHA安装

安装依赖

分别在主库和从库上安装，执行命令：
yum -y install gcc gcc-c++ make openssl-devel perl perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes perl-Module-Install.noarch mailx jwhois cpan

主库安装mha4mysql-node

MHA管理端安装在从库上，主库上只需要mha4mysql-node即可，由于MySQL使用5.7版本，MHA也使用最新的0.57版本，采用编译安装方式。下载地址：https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw

[root@master ~]# cd /usr/src/
[root@master src]# tar -zxf mha4mysql-node-0.57.tar.gz
[root@master mha4mysql-node-0.57]# perl Makefile.PL             #编译过程省略
[root@master mha4mysql-node-0.57]# make
[root@master mha4mysql-node-0.57]# make install
[root@master mha4mysql-node-0.57]# ll /usr/local/bin/ -t        #查看安装，有以下文件则mha4mysql-node安装成功
total 1760
-r-xr-xr-x 1 root root  16381 Jul 21 09:41 apply_diff_relay_logs
-r-xr-xr-x 1 root root   8261 Jul 21 09:41 purge_relay_logs
-r-xr-xr-x 1 root root   7525 Jul 21 09:41 save_binary_logs
-r-xr-xr-x 1 root root   4807 Jul 21 09:41 filter_mysqlbinlog

从库安装mha4mysql-manager和mha4mysql-node

MHA管理端安装在从库，从库需要安装manager端和node端

[root@slave ~]# cd /usr/src/
[root@slave src]# tar -zxf mha4mysql-manager-0.57.tar.gz 
[root@slave src]# cd mha4mysql-manager-0.57
[root@slave mha4mysql-manager-0.57]# perl Makefile.PL           #编译过程省略
[root@slave mha4mysql-manager-0.57]# make
[root@slave mha4mysql-manager-0.57]# make install 
[root@slave mha4mysql-manager-0.57]# ll /usr/local/bin/ -t      #查看安装，有以下文件则mha4mysql-manager安装成功
total 40
-r-xr-xr-x 1 root root 1779 Oct 11 15:45 masterha_check_ssh
-r-xr-xr-x 1 root root 2517 Oct 11 15:45 masterha_manager
-r-xr-xr-x 1 root root 2373 Oct 11 15:45 masterha_master_switch
-r-xr-xr-x 1 root root 5171 Oct 11 15:45 masterha_secondary_check
-r-xr-xr-x 1 root root 1995 Oct 11 15:45 masterha_check_repl
-r-xr-xr-x 1 root root 1865 Oct 11 15:45 masterha_check_status
-r-xr-xr-x 1 root root 3201 Oct 11 15:45 masterha_conf_host
-r-xr-xr-x 1 root root 2165 Oct 11 15:45 masterha_master_monitor
-r-xr-xr-x 1 root root 1739 Oct 11 15:45 masterha_stop
[root@slave mha4mysql-manager-0.57]# cd ..
[root@slave src]# tar -zxf mha4mysql-node-0.57.tar.gz 
[root@slave src]# cd mha4mysql-node-0.57
[root@slave mha4mysql-node-0.57]# perl Makefile.PL              #编译过程省略        
[root@slave mha4mysql-node-0.57]# make
[root@slave mha4mysql-node-0.57]# make install
[root@slave mha4mysql-node-0.57]# ll /usr/local/bin/ -t         #查看安装，有以下文件则mha4mysql-node安装成功
total 84
-r-xr-xr-x 1 root root 16381 Oct 11 15:47 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Oct 11 15:47 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8261 Oct 11 15:47 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Oct 11 15:47 save_binary_logs

MHA目录结构说明

MHAManager

mhamanager工具包主要包括以下工具

[root@slave  ~]# ll /usr/local/bin/
总用量 40
-r-xr-xr-x  1 root root  1995 Oct 11 15:45 masterha_check_repl         #检查MySQL复制情况
-r-xr-xr-x  1 root root  1779 Oct 11 15:45 masterha_check_ssh          #检查MHA的SSH配置情况
-r-xr-xr-x  1 root root  1865 Oct 11 15:45 masterha_check_status       #检测当前MHA运行状态
-r-xr-xr-x  1 root root  3201 Oct 11 15:45 masterha_conf_host          #添加或删除配置的server信息
-r-xr-xr-x  1 root root  2517 Oct 11 15:45 masterha_manager            #启动MHA
-r-xr-xr-x  1 root root  2165 Oct 11 15:45 masterha_master_monitor     #检测Master是否宕机
-r-xr-xr-x  1 root root  2373 Oct 11 15:45 masterha_master_switch      #控制故障转移，自动或者手动
-r-xr-xr-x  1 root root  5172 Oct 11 15:45 masterha_secondary_check    #通过其他路由检测Master是否真的宕机
-r-xr-xr-x  1 root root  1739 Oct 11 15:45 masterha_stop               #停止MHA

MHANode

mhanode工具包主要包括以下工具

[root@slave ~]# ll /usr/local/bin/ -t   
total 84
-r-xr-xr-x  1 root root 16371 Oct 11 15:47 apply_diff_relay_logs       #识别差异日志的中继日志，并将其差异事件应用于其他Slave
-r-xr-xr-x  1 root root  4807 Oct 11 15:47 filter_mysqlbinlog          #去除不必要的Rollback事件
-r-xr-xr-x  1 root root  8263 Oct 11 15:47 purge_relay_logs            #删除无用的Relay log，避免延时
-r-xr-xr-x  1 root root  7525 Oct 11 15:47 save_binary_logs            #保存和复制down掉的主服务器二进制日志

自定义扩展脚本说明

secondary_check_script #通过多条网络路由检测master的可用性
master_ip_failover_script #自动failover时候的切换脚本，可将vip信息写入此脚本中
shutdown_script #强制关闭master节点执行脚本
report_script #发送报告
init_conf_load_script #加载初始配置参数，如不想在配置中写明文密码
master_ip_online_change_script #手动failover时候的切换脚本

keepalived安装

MHA作为MySQL的HA软件,借助脚本或者第三方软件如keepalived,可实现自动的failover
本次环境使用yum安装keepalived,主库和从库分别执行安装: yum -y install keepalived
keepalived的配置文件,默认在主库上绑定EIP,priority要比备库高,同时为了避免脑裂,两个state同时设置为BACKUP,由于HaVip不支持组播和广播通讯,因此需要将keepalived的心跳方式设置为单播,添加unicast_src_ip和unicast_peer

配置文件

主库配置

[root@master ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     duanwenjie@huan.tv
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.16.82 dev eth0 label eth0:havip
    }
        unicast_src_ip 192.168.16.80
        unicast_peer {
                192.168.16.81
                        }
}

从库配置

[root@slave ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     duanwenjie@huan.tv
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.16.82 dev eth0 label eth0:havip
    }
        unicast_src_ip 192.168.16.81
        unicast_peer {
                192.168.16.80
                        }
}

EIP漂移测试

分别启动keepalived服务，查看默认EIP在哪台机器上，同时关闭EIP所在机器的keepalived服务，查看EIP是否漂移，最后恢复原状

主库启动keepalived服务

1
2
3

[root@master keepalived]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@master keepalived]#

从库启动keepalived服务

1
2
3

[root@slave keepalived]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@slave keepalived]#

查看默认EIP在主库上

[root@master ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:172562 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71674 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:200695958 (191.3 MiB)  TX bytes:12526649 (11.9 MiB)

eth0:havip Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.82  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7946 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7946 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:510910 (498.9 KiB)  TX bytes:510910 (498.9 KiB)

[root@master ~]#

关闭主库的keepalived服务

关闭主库的keepalived服务后，EIP消失

[root@master ~]# /etc/init.d/keepalived stop
Stopping keepalived:                                       [  OK  ]
[root@master ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:172670 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71803 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:200704346 (191.4 MiB)  TX bytes:12569281 (11.9 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7976 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7976 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:512770 (500.7 KiB)  TX bytes:512770 (500.7 KiB)

[root@master ~]#

查看EIP是否漂移到从库上

此时EIP已经漂移到从库上

[root@slave ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:12:80:57  
          inet addr:192.168.16.81  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:164539 errors:0 dropped:0 overruns:0 frame:0
          TX packets:77532 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:202145589 (192.7 MiB)  TX bytes:12091265 (11.5 MiB)

eth0:havip Link encap:Ethernet  HWaddr 00:16:3E:12:80:57  
          inet addr:192.168.16.82  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7961 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7961 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:508050 (496.1 KiB)  TX bytes:508050 (496.1 KiB)

[root@slave ~]#

恢复原状

主库启动keepalived服务，查看EIP已经又漂移回来

[root@master ~]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@master ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:172863 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71978 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:200717148 (191.4 MiB)  TX bytes:12627987 (12.0 MiB)

eth0:havip Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.82  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:8039 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8039 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:516676 (504.5 KiB)  TX bytes:516676 (504.5 KiB)

[root@master ~]#

MHA配置

MHAManager

创建配置目录/etc/masterha/，同时创建一个项目上的配置文件，仅配置自动FailOver部分，脚本暂时不定义，下文会有MHA引入keepalived。

[server default]
manager_workdir=/var/log/masterha/app1   
manager_log=/var/log/masterha/app1/manager.log 
master_binlog_dir=/hwdata/data/percona        
password=mha123456                             
user=mha01                                     
ping_interval=1                                    
remote_workdir=/tmp                                                
repl_password=slave123456                                          
repl_user=slave01                                                 
ssh_user=root                                                     
#master_ip_failover_script=/usr/local/bin/master_ip_failover        
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change  
#report_script=/usr/local/bin/send_report                         
#shutdown_script=                                                 
secondary_check_script = masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80  --master_ip=192.168.16.80 --master_port=3306 
[server1]
hostname=192.168.16.80                                          
port=3306
[server2]
hostname=192.168.16.81                                         
port=3306
candidate_master=1

测试MHAManager SSH

[root@slave masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Wed Oct 11 17:21:46 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 11 17:21:46 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Oct 11 17:21:46 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Oct 11 17:21:46 2017 - [info] Starting SSH connection tests..
Wed Oct 11 17:21:47 2017 - [debug] 
Wed Oct 11 17:21:46 2017 - [debug]  Connecting via SSH from root@192.168.16.80(192.168.16.80:22) to root@192.168.16.81(192.168.16.81:22)..
Wed Oct 11 17:21:47 2017 - [debug]   ok.
Wed Oct 11 17:21:47 2017 - [debug] 
Wed Oct 11 17:21:47 2017 - [debug]  Connecting via SSH from root@192.168.16.81(192.168.16.81:22) to root@192.168.16.80(192.168.16.80:22)..
Wed Oct 11 17:21:47 2017 - [debug]   ok.
Wed Oct 11 17:21:47 2017 - [info] All SSH connection tests passed successfully.
[root@slave masterha]#

测试MHAManager 复制

[root@slave masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Oct 11 17:22:47 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 11 17:22:47 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Oct 11 17:22:47 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Oct 11 17:22:47 2017 - [info] MHA::MasterMonitor version 0.57.
Creating directory /var/log/masterha/app1.. done.
Wed Oct 11 17:22:48 2017 - [info] Multi-master configuration is detected. Current primary(writable) master is 192.168.16.80(192.168.16.80:3306)
Wed Oct 11 17:22:48 2017 - [info] Master configurations are as below: 
Master 192.168.16.81(192.168.16.81:3306), replicating from 192.168.16.80(192.168.16.80:3306), read-only
Master 192.168.16.80(192.168.16.80:3306), replicating from 192.168.16.81(192.168.16.81:3306)

Wed Oct 11 17:22:48 2017 - [info] GTID failover mode = 1
Wed Oct 11 17:22:48 2017 - [info] Dead Servers:
Wed Oct 11 17:22:48 2017 - [info] Alive Servers:
Wed Oct 11 17:22:48 2017 - [info]   192.168.16.80(192.168.16.80:3306)
Wed Oct 11 17:22:48 2017 - [info]   192.168.16.81(192.168.16.81:3306)
Wed Oct 11 17:22:48 2017 - [info] Alive Slaves:
Wed Oct 11 17:22:48 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Wed Oct 11 17:22:48 2017 - [info]     GTID ON
Wed Oct 11 17:22:48 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Wed Oct 11 17:22:48 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Oct 11 17:22:48 2017 - [info] Current Alive Master: 192.168.16.80(192.168.16.80:3306)
Wed Oct 11 17:22:48 2017 - [info] Checking slave configurations..
Wed Oct 11 17:22:48 2017 - [info] Checking replication filtering settings..
Wed Oct 11 17:22:48 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Oct 11 17:22:48 2017 - [info]  Replication filtering check ok.
Wed Oct 11 17:22:48 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Oct 11 17:22:48 2017 - [info] Checking SSH publickey authentication settings on the current master..
Wed Oct 11 17:22:48 2017 - [info] HealthCheck: SSH to 192.168.16.80 is reachable.
Wed Oct 11 17:22:48 2017 - [info] 
192.168.16.80(192.168.16.80:3306) (current master)
 +--192.168.16.81(192.168.16.81:3306)

Wed Oct 11 17:22:48 2017 - [info] Checking replication health on 192.168.16.81..
Wed Oct 11 17:22:48 2017 - [info]  ok.
Wed Oct 11 17:22:48 2017 - [warning] master_ip_failover_script is not defined.
Wed Oct 11 17:22:48 2017 - [warning] shutdown_script is not defined.
Wed Oct 11 17:22:48 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
[root@slave masterha]#

MHA引入keepalived

把keepalived服务引入MHA，需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。

master_ip_failover脚本

编辑脚本/usr/local/bin/master_ip_failover，修改后如下，这里完整的脚本

[root@slave app1]# cat /usr/local/bin/master_ip_failover 
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.16.82';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

[root@slave app1]#

添加权限
chmod +x /usr/local/bin/master_ip_failover

send_report脚本

[root@slave masterha]# cat /usr/local/bin/send_report 
#!/usr/bin/perl
 
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.126.com';
my $mail_from='xxx@126.com';
my $mail_user='xxx@126.com';
my $mail_pass='xxx';
#my $mail_to=['xxx','xxx'];
my $mail_to='xxx';
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
 
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /var/log/masterha/app1/manager.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };
 
    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}
 
# Do whatever you want here
 
exit 0;
[root@slave masterha]#

添加权限
chmod +x /usr/local/bin/send_report

MHAManager修改注释

[root@slave app1]# cat /etc/masterha/app1.cnf |grep master_ip_failover_script
master_ip_failover_script=/usr/local/bin/master_ip_failover        
[root@slave app1]# cat /etc/masterha/app1.cnf |grep report
report_script=/usr/local/bin/send_report                         
[root@slave app1]#

HaVip绑定EIP

keepalived服务已经配置完成，VIP也可以自动漂移后，但是其还不能在内网中通讯，需要将EIP映射到HaVip中，创建高可用虚拟IP，把master和slave两个实例的绑定在一起，具体步骤省略，见下图

MHA启动

1
2
3

[root@slave masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
[1] 6706
[root@slave masterha]# nohup: ignoring input and appending output to `nohup.out'

查看启动日志

[root@slave app1]# cat /var/log/masterha/app1/manager.log
Thu Oct 12 09:26:56 2017 - [info] MHA::MasterMonitor version 0.57.
Thu Oct 12 09:26:57 2017 - [info] Multi-master configuration is detected. Current primary(writable) master is 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:26:57 2017 - [info] Master configurations are as below: 
Master 192.168.16.81(192.168.16.81:3306), replicating from 192.168.16.80(192.168.16.80:3306), read-only
Master 192.168.16.80(192.168.16.80:3306), replicating from 192.168.16.81(192.168.16.81:3306)

Thu Oct 12 09:26:57 2017 - [info] GTID failover mode = 1
Thu Oct 12 09:26:57 2017 - [info] Dead Servers:
Thu Oct 12 09:26:57 2017 - [info] Alive Servers:
Thu Oct 12 09:26:57 2017 - [info]   192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:26:57 2017 - [info]   192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:26:57 2017 - [info] Alive Slaves:
Thu Oct 12 09:26:57 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:26:57 2017 - [info]     GTID ON
Thu Oct 12 09:26:57 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:26:57 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:26:57 2017 - [info] Current Alive Master: 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:26:57 2017 - [info] Checking slave configurations..
Thu Oct 12 09:26:57 2017 - [info] Checking replication filtering settings..
Thu Oct 12 09:26:57 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Oct 12 09:26:57 2017 - [info]  Replication filtering check ok.
Thu Oct 12 09:26:57 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Oct 12 09:26:57 2017 - [info] Checking SSH publickey authentication settings on the current master..
Thu Oct 12 09:26:57 2017 - [info] HealthCheck: SSH to 192.168.16.80 is reachable.
Thu Oct 12 09:26:57 2017 - [info] 
192.168.16.80(192.168.16.80:3306) (current master)
 +--192.168.16.81(192.168.16.81:3306)

Thu Oct 12 09:26:57 2017 - [info] Checking master_ip_failover_script status:
Thu Oct 12 09:26:57 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.16.80 --orig_master_ip=192.168.16.80 --orig_master_port=3306 


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK 
Thu Oct 12 09:26:57 2017 - [info]  OK.
Thu Oct 12 09:26:57 2017 - [warning] shutdown_script is not defined.
Thu Oct 12 09:26:57 2017 - [info] Set master ping interval 1 seconds.
Thu Oct 12 09:26:57 2017 - [info] Set secondary check script: masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80  --master_ip=192.168.16.80 --master_port=3306
Thu Oct 12 09:26:57 2017 - [info] Starting ping health check on 192.168.16.80(192.168.16.80:3306)..
Thu Oct 12 09:26:57 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@slave app1]#

模拟主库故障(自动FailOver)

确认MHAManager进程状态

[root@slave app1]# ps aux|grep master
root      1217  0.0  0.0  81520  3440 ?        Ss   Oct11   0:00 /usr/libexec/postfix/master
root     20887  0.2  0.4 194688 18736 pts/1    S    08:57   0:00 perl /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.cnf
root     20998  0.0  0.0 103252   844 pts/1    S+   08:59   0:00 grep master
[root@slave app1]# ifconfig     #VIP不在slave上
eth0      Link encap:Ethernet  HWaddr 00:16:3E:12:80:57  
          inet addr:192.168.16.81  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:288188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:129868 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:211584573 (201.7 MiB)  TX bytes:50227849 (47.9 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:50346 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50346 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3204922 (3.0 MiB)  TX bytes:3204922 (3.0 MiB)

[root@slave app1]#

停止主库MySQL服务

[root@master ~]# ifconfig   #停止前VIP存在于主库上
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:278061 errors:0 dropped:0 overruns:0 frame:0
          TX packets:223278 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:210643899 (200.8 MiB)  TX bytes:57552395 (54.8 MiB)

eth0:havip Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.82  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:50158 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50158 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3130070 (2.9 MiB)  TX bytes:3130070 (2.9 MiB)

[root@master ~]# /etc/init.d/mysqld stop
Shutting down MySQL (Percona Server)............           [  OK  ]
[root@master ~]# ifconfig   #停止后VIP已经漂移走
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:278240 errors:0 dropped:0 overruns:0 frame:0
          TX packets:223445 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:210664827 (200.9 MiB)  TX bytes:57596785 (54.9 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:50175 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3131100 (2.9 MiB)  TX bytes:3131100 (2.9 MiB)

[root@master ~]#

查看FailOver日志

登录从库，查看MHAManager日志

[root@slave app1]# tail -f /var/log/masterha/app1/manager.log

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK 
Thu Oct 12 09:26:57 2017 - [info]  OK.
Thu Oct 12 09:26:57 2017 - [warning] shutdown_script is not defined.
Thu Oct 12 09:26:57 2017 - [info] Set master ping interval 1 seconds.
Thu Oct 12 09:26:57 2017 - [info] Set secondary check script: masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80  --master_ip=192.168.16.80 --master_port=3306
Thu Oct 12 09:26:57 2017 - [info] Starting ping health check on 192.168.16.80(192.168.16.80:3306)..
Thu Oct 12 09:26:57 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..     #开始监听服务状态
#监控到master故障后，MHA开始做自动FailOver
Thu Oct 12 09:27:54 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Oct 12 09:27:54 2017 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80  --master_ip=192.168.16.80 --master_port=3306  --user=root  --master_host=192.168.16.80  --master_ip=192.168.16.80  --master_port=3306 --master_user=mha01 --master_password=mha123456 --ping_type=SELECT
Thu Oct 12 09:27:54 2017 - [info] Executing SSH check script: exit 0
Thu Oct 12 09:27:54 2017 - [info] HealthCheck: SSH to 192.168.16.80 is reachable.
Monitoring server 192.168.16.80 is reachable, Master is not reachable from 192.168.16.80. OK.
Monitoring server 192.168.16.81 is reachable, Master is not reachable from 192.168.16.81. OK.
Thu Oct 12 09:27:54 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Oct 12 09:27:55 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:55 2017 - [warning] Connection failed 2 time(s)..
Thu Oct 12 09:27:56 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:56 2017 - [warning] Connection failed 3 time(s)..
Thu Oct 12 09:27:57 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:57 2017 - [warning] Connection failed 4 time(s)..
Thu Oct 12 09:27:57 2017 - [warning] Master is not reachable from health checker!
Thu Oct 12 09:27:57 2017 - [warning] Master 192.168.16.80(192.168.16.80:3306) is not reachable!
Thu Oct 12 09:27:57 2017 - [warning] SSH is reachable.
Thu Oct 12 09:27:57 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Oct 12 09:27:57 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Oct 12 09:27:57 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Oct 12 09:27:57 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Oct 12 09:27:58 2017 - [info] GTID failover mode = 1
Thu Oct 12 09:27:58 2017 - [info] Dead Servers:
Thu Oct 12 09:27:58 2017 - [info]   192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:58 2017 - [info] Alive Servers:
Thu Oct 12 09:27:58 2017 - [info]   192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:58 2017 - [info] Alive Slaves:
Thu Oct 12 09:27:58 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:58 2017 - [info]     GTID ON
Thu Oct 12 09:27:58 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:58 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:58 2017 - [info] Checking slave configurations..
Thu Oct 12 09:27:58 2017 - [info] Checking replication filtering settings..
Thu Oct 12 09:27:58 2017 - [info]  Replication filtering check ok.
Thu Oct 12 09:27:58 2017 - [info] Master is down!
Thu Oct 12 09:27:58 2017 - [info] Terminating monitoring script.
Thu Oct 12 09:27:58 2017 - [info] Got exit code 20 (Master dead).
Thu Oct 12 09:27:58 2017 - [info] MHA::MasterFailover version 0.57.
Thu Oct 12 09:27:58 2017 - [info] Starting master failover.
Thu Oct 12 09:27:58 2017 - [info] 
Thu Oct 12 09:27:58 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Oct 12 09:27:58 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] GTID failover mode = 1
Thu Oct 12 09:27:59 2017 - [info] Dead Servers:
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Oct 12 09:27:59 2017 - [info]  ok.
Thu Oct 12 09:27:59 2017 - [info] Alive Servers:
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:59 2017 - [info] Alive Slaves:
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info]     GTID ON
Thu Oct 12 09:27:59 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] Starting GTID based failover.
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Oct 12 09:27:59 2017 - [info] Executing master IP deactivation script:
Thu Oct 12 09:27:59 2017 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.16.80 --orig_master_ip=192.168.16.80 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Disabling the VIP on old master: 192.168.16.80 
Thu Oct 12 09:27:59 2017 - [info]  done.
Thu Oct 12 09:27:59 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Oct 12 09:27:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000005:234
Thu Oct 12 09:27:59 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info]     GTID ON
Thu Oct 12 09:27:59 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000005:234
Thu Oct 12 09:27:59 2017 - [info] Oldest slaves:
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info]     GTID ON
Thu Oct 12 09:27:59 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] Searching new master from slaves..
Thu Oct 12 09:27:59 2017 - [info]  Candidate masters from the configuration file:
Thu Oct 12 09:27:59 2017 - [info]   192.168.16.81(192.168.16.81:3306)  Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info]     GTID ON
Thu Oct 12 09:27:59 2017 - [info]     Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info]  Non-candidate masters:
Thu Oct 12 09:27:59 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Oct 12 09:27:59 2017 - [info] New master is 192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:59 2017 - [info] Starting master failover..
Thu Oct 12 09:27:59 2017 - [info] 
From:
192.168.16.80(192.168.16.80:3306) (current master)
 +--192.168.16.81(192.168.16.81:3306)

To:
192.168.16.81(192.168.16.81:3306) (new master)
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info]  Waiting all logs to be applied.. 
Thu Oct 12 09:27:59 2017 - [info]   done.
Thu Oct 12 09:27:59 2017 - [info] Getting new master's binlog name and position..
Thu Oct 12 09:27:59 2017 - [info]  mysql-bin.000002:4876
Thu Oct 12 09:27:59 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.16.81', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='slave01', MASTER_PASSWORD='xxx';
Thu Oct 12 09:27:59 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 4876, df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-15,
e1ec82e8-ae51-11e7-8532-00163e128057:1-6
Thu Oct 12 09:27:59 2017 - [info] Executing master IP activate script:
Thu Oct 12 09:27:59 2017 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.16.80 --orig_master_ip=192.168.16.80 --orig_master_port=3306 --new_master_host=192.168.16.81 --new_master_ip=192.168.16.81 --new_master_port=3306 --new_master_user='mha01'   --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Enabling the VIP - 192.168.16.200 on the new master - 192.168.16.81 
Thu Oct 12 09:27:59 2017 - [info]  OK.
Thu Oct 12 09:27:59 2017 - [info] Setting read_only=0 on 192.168.16.81(192.168.16.81:3306)..
Thu Oct 12 09:27:59 2017 - [info]  ok.
Thu Oct 12 09:27:59 2017 - [info] ** Finished master recovery successfully.
Thu Oct 12 09:27:59 2017 - [info] * Phase 3: Master Recovery Phase completed.
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 4: Slaves Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] All new slave servers recovered successfully.
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] * Phase 5: New master cleanup phase..
Thu Oct 12 09:27:59 2017 - [info] 
Thu Oct 12 09:27:59 2017 - [info] Resetting slave info on the new master..
Thu Oct 12 09:27:59 2017 - [info]  192.168.16.81: Resetting slave info succeeded.
Thu Oct 12 09:27:59 2017 - [info] Master failover to 192.168.16.81(192.168.16.81:3306) completed successfully.
Thu Oct 12 09:27:59 2017 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.16.80(192.168.16.80:3306) to 192.168.16.81(192.168.16.81:3306) succeeded

Master 192.168.16.80(192.168.16.80:3306) is down!

Check MHA Manager logs at slave:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.16.80(192.168.16.80:3306)
Selected 192.168.16.81(192.168.16.81:3306) as a new master.
192.168.16.81(192.168.16.81:3306): OK: Applying all logs succeeded.
192.168.16.81(192.168.16.81:3306): OK: Activated master IP address.
192.168.16.81(192.168.16.81:3306): Resetting slave info succeeded.
Master failover to 192.168.16.81(192.168.16.81:3306) completed successfully.
Thu Oct 12 09:27:59 2017 - [info] Sending mail..
Option new_slave_hosts requires an argument
Unknown option: conf
tail: /var/log/masterha/app1/manager.log: file truncated
^C
[1]+  Done                    nohup masterha_manager --conf=/etc/masterha/app1.cnf  (wd: /etc/masterha)
(wd now: /var/log/masterha/app1)
[root@slave app1]#

注意：
由于阿里云的ECS默认不允许发送邮件，它们把25端口已经封掉，如果需要开通25端口，主联系阿里云技术支持。本次切换日志在最后没有发送邮件，是由于这个原因导致的。

日志解析过程

启动前的准备工作
检查数据库服务器状态,获取相关参数设置
检查GTID、candidate_master、过滤DB是否设置
测试ssh连接是否成功
测试MHA node是否可用
创建MHA日志目录
开始检查slave的差异日志应用权限
确定当前的复制架构
调试master_ip_failover_script
调试shutdown_script
设置二次检查的主机masterha_secondary_check
MHA启动完毕,进入监测状态
监测master服务器挂了
通过定义的二次监测,确认master是否挂了
确认master挂了,开始进入failover流程
再试尝试连接master和master的ssh
通过MHA配置文件,监测其他slave的状态
再次监测slave的配置是否有变化,是否符合failover条件
正式开始failover
再次对slave配置做检查
对原Master做master_ip_failover_script和shutdown_script的操作
开始差异日志的恢复，获取slave最后得到的binlog位置
获取原master的binlog日志
确定新的master
在new master上应用差异的binlog日志
获取new master的binlog位置。
执行master_ip_failover_script,调用aws_vip_change.sh，执行VIP漂移
开始恢复其他slave的差异日志
差异日志应用完成以后,切换所有slave到new master。
failover操作完成,生成failover报告
最后发送邮件通知

确认FailOver状态

登录从库，查看VIP是否漂移过程

[root@slave app1]# ifconfig     #VIP已经漂移过程
eth0      Link encap:Ethernet  HWaddr 00:16:3E:12:80:57  
          inet addr:192.168.16.81  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:295022 errors:0 dropped:0 overruns:0 frame:0
          TX packets:135929 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:212253710 (202.4 MiB)  TX bytes:52246361 (49.8 MiB)

eth0:havip Link encap:Ethernet  HWaddr 00:16:3E:12:80:57  
          inet addr:192.168.16.82  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:52973 errors:0 dropped:0 overruns:0 frame:0
          TX packets:52973 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3597938 (3.4 MiB)  TX bytes:3597938 (3.4 MiB)

[root@slave app1]#

登录主库，查看VIP是否存在，keepalived服务状态

[root@master ~]# ifconfig   #VIP已经不存在
eth0      Link encap:Ethernet  HWaddr 00:16:3E:0F:0E:4A  
          inet addr:192.168.16.80  Bcast:192.168.31.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:284277 errors:0 dropped:0 overruns:0 frame:0
          TX packets:229786 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:211190989 (201.4 MiB)  TX bytes:60010868 (57.2 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:51564 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51564 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3217146 (3.0 MiB)  TX bytes:3217146 (3.0 MiB)

[root@master ~]# /etc/init.d/keepalived status  #keepalived服务已经停止
keepalived is stopped
[root@master ~]#

手动FailOver

MHA的手动FailOver不在本文范围，详情理论可参考官方文档。

参数列表

请参考之前文章。

参考

https://dwj999.github.io/AWS-EC2搭建mha-vip-MySQL5-7.htm

结束语

云服务器上MySQL高可用，也可通过云负载均衡产品+MySQL复制来实现，但在数据安全性上，没有MHA+VIP+MySQL相对安全。