前言
随着云服务的迅速发展,越来越多企业选择将服务托管在云服务中,在数据库领域,AWS RDS、Aliyun RDS等都是不错的选择,默认已经做了高可用,基础运维,可以为企业节省不少的运维成本。由于RDS物理数据、Root权限等其他对象对用户不开放,难免在自动化运维中有些壁垒。本文将围绕在Aliyun ECS结合MHA做MySQL的高可用,ECS不支持VIP,但阿里的产品高可用虚拟IP(HaVip)结合keepliaved等第三方软件可间接实现ip的漂移,可满足我们的需求。
MHA简介
MHA是由日本MySQL专家youshimaton(现就职于Facebook公司)用Perl写的一套MySQL故障切换方案,以保障数据库的高可用性。在MySQL故障切换过程中,MHA能做到在0~30s之内实现主MySQL故障转移。
该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。
高可用虚拟IP限制
- 每个VPC中最多只能同时存在5个vip对象
- 目前VPC中的网络通信不支持多播和广播,只支持单播;所以,如果用户是使用keepalived之类的第三方软件实现高可用,需要通过配置文件把通信方式改成单播;网上可以找到相应的方法
- 如果是使用keepalived之类的第三方软件,需要把信条消息的源IP改成ECS的私网IP(而不要用HaVip的私网IP进行心跳检查),不然很容易造成脑裂
- 当HaVip与EIP绑定后,进行公网通信时,持有HaVip的ECS实例应该通过HaVip的私网ip进行公网通信(而不是ECS自己的私网IP);因为这时EIP是映射在HaVip的私网IP上,而不是映射在ECS的私网IP上
- 类似的,当使用HaVip做自建SNAT网关的高可用时,SNAT实例上配置的SNAT规则中,source IP应该是havip的私网IP而不是自己的private IP
准备
操作系统:Linux Centos 6.5
master:192.168.16.80
slave: 192.168.16.81
vip: 192.168.16.82 (注:需要和HaVip的私网IP一致)
其中,使用两台机器,分别部署主库和从库,MHA默认部署在从库上,下方中的VIP通EIP,最终是要把EIP和HaVip绑定在一起
系统初始化修改
修改主机名
主库修改主机名
1 | [root@master ~]# cat /etc/sysconfig/network|grep HOSTNAME |
从库修改主机名
1 | [root@slave ~]# cat /etc/sysconfig/network|grep HOSTNAME |
设置防火墙
主库设置,允许从库访问
1 | [root@master ~]# /etc/init.d/iptables status|grep 192.168.16.81 |
其中,192.168.16.81是允许从库的访问规则
从库设置,允许主库访问
1 | [root@slave ~]# /etc/init.d/iptables status|grep 192.168.16.80 |
其中,192.168.16.80是允许主库的访问规则
关闭SELINUX
主库查看
1 | [root@master ~] |
从库查看
1 | [root@slave ~] |
建立SSH无密码登录
主库设置
修改服务器/etc/ssh/ssh_config文件,把参数GSSAPIAuthentication修改为no
修改服务器/etc/ssh/sshd_config文件,把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,同时AllowUsers把root也添加上,重启ssh服务
修改服务器/etc/hosts.allow文件,允许从库连接1
2
3
4
5
6
7
8
9
10
11[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/ssh_config |grep GSSAPIAuthentication
GSSAPIAuthentication no
[root@master ~]# sed '/^#/d;/^$/d' /etc/ssh/sshd_config |grep -E 'PasswordAuthentication|PermitRootLogin'
PermitRootLogin yes
PasswordAuthentication yes
[root@master ~]# /etc/init.d/sshd restart
Stopping sshd: [ OK ]
Starting sshd: [ OK ]
[root@master ~]# cat /etc/hosts.allow |grep 192.168.16.81
sshd: 192.168.16.81
[root@master ~]#
使用命令ssh-keygen生成公钥,发送到从库,同时尝试在主库无密码形式登录从库,而且也要保证本机无密码登录本机1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46[root@master ~]
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
07:86:76:d6:4a:0f:ea:6e:88:00:eb:3d:5f:7e:08:7a root@master
The key's randomart image is:
+--[ RSA 2048]----+
| |
| . . |
| o B . |
|. . * = |
|.. . S o |
|o o . |
|......o.. |
| ..+.E+. . |
| +o... |
+-----------------+
[root@master ~]
The authenticity of host '192.168.16.81 (192.168.16.81)' can't be established.
RSA key fingerprint is 9f:1f:41:f6:2d:e9:20:83:30:be:cd:20:01:31:ea:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.81' (RSA) to the list of known hosts.
root@192.168.16.81's password:
Now try logging into the machine, with "ssh 'root@192.168.16.81'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@master ~]
The authenticity of host '192.168.16.80 (192.168.16.80)' can't be established.
RSA key fingerprint is f2:f8:7d:7d:59:3f:b9:3c:a0:e6:66:54:1f:79:e2:40.
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.16.80's password:
Now try logging into the machine, with "ssh 'root@192.168.16.80'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@master ~]
测试登录1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17[root@master ~]# ssh root@192.168.16.81
Last login: Wed Oct 11 14:45:10 2017 from 192.168.16.80
Welcome to aliyun Elastic Compute Service!
[root@slave ~]# exit
logout
Connection to 192.168.16.81 closed.
[root@master ~]# ssh root@192.168.16.80
Last login: Wed Jan 18 16:25:59 2017
Welcome to aliyun Elastic Compute Service!
[root@master ~]# exit
logout
Connection to 192.168.16.80 closed.
[root@master ~]#
从库设置
和主库保持一致
修改服务器/etc/ssh/ssh_config文件,把参数GSSAPIAuthentication修改为no
修改服务器/etc/ssh/sshd_config文件,把参数PasswordAuthentication修改为yes,参数PermitRootLogin修改为yes,同时AllowUsers把root也添加上,重启ssh服务
修改服务器/etc/hosts.allow文件,允许从库连接
1 | [root@slave ~] |
使用命令ssh-keygen生成公钥,发送到从库,同时尝试在主库无密码形式登录从库,而且也要保证本机无密码登录本机1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46[root@slave ~]
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ae:04:32:4d:3e:d3:b7:5d:4e:d2:50:7a:4c:44:d7:0a root@slave
The key's randomart image is:
+--[ RSA 2048]----+
| o= .. |
| =E. .|
| . o o. . |
| + . + . |
| o * . S . + |
| o + o o = |
| . o . . |
| . . |
| . |
+-----------------+
[root@slave ~]
The authenticity of host '192.168.16.80 (192.168.16.80)' can't be established.
RSA key fingerprint is f2:f8:7d:7d:59:3f:b9:3c:a0:e6:66:54:1f:79:e2:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.80' (RSA) to the list of known hosts.
root@192.168.16.80's password:
Now try logging into the machine, with "ssh 'root@192.168.16.80'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@slave ~]
The authenticity of host '192.168.16.81 (192.168.16.81)' can't be established.
RSA key fingerprint is 9f:1f:41:f6:2d:e9:20:83:30:be:cd:20:01:31:ea:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.81' (RSA) to the list of known hosts.
root@192.168.16.81's password:
Now try logging into the machine, with "ssh 'root@192.168.16.81'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@slave ~]
测试登录1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17[root@slave ~]# ssh root@192.168.16.80
Last login: Wed Oct 11 14:48:14 2017 from 192.168.16.80
Welcome to aliyun Elastic Compute Service!
[root@master ~]# exit
logout
Connection to 192.168.16.80 closed.
[root@slave ~]# ssh root@192.168.16.81
Last login: Wed Oct 11 14:48:10 2017 from 192.168.16.80
Welcome to aliyun Elastic Compute Service!
[root@slave ~]# exit
logout
Connection to 192.168.16.81 closed.
[root@slave ~]#
MySQL安装
使用的是自己打包的RPM包,分别登录主库和从库,直接rpm -ivh percona-server-5.7.18-15.x86_64.rpm 即可,也可使用其他方式安装MySQL1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17[root@master ~]# mv /home/duanwenjie/percona-server-5.7.18-15.x86_64.rpm /usr/src/
[root@master ~]# rpm -ivh /usr/src/percona-server-5.7.18-15.x86_64.rpm
Preparing... ########################################### [100%]
Group 'mail' not found. Creating the user mailbox file with 0600 mode.
1:percona-server ########################################### [100%]
error reading information on service /etc/rc.d/init.d/mysqld: No such file or directory
MySQL (Percona Server) PID file could not be found![FAILED]
Starting MySQL (Percona Server)...[ OK ]
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
Warning: Since password will be sent to server in plain text, use ssl connection to ensure password safety.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@master ~]#
查看是否安装成功,分别登录主库和从库1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26[root@master ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)] 15:03:27 > select version();
+---------------+
| version() |
+---------------+
| 5.7.18-15-log |
+---------------+
1 row in set (0.00 sec)
MySQL [(none)] 15:03:31 > exit
Bye
[root@master ~]#
MySQL互为主备
重要参数配置
主库必须包含以下参数1
2
3
4
5
6
7
8
9
10[root@master ~]# cat /etc/my.cnf
[mysqld]
server-id = 1
autocommit = 1
auto_increment_increment = 1
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
从库必须包含以下参数,注意slave需要动态设置read_only=11
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28[root@slave ~]# cat /etc/my.cnf
server-id = 2
autocommit = 1
auto_increment_increment = 2
auto_increment_offset = 2
log_bin = mysql-bin
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
relay_log_purge=0
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password:
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
[root@slave ~]# mysql -uroot -p -e "set global read_only=1"
Enter password:
[root@slave ~]# mysql -uroot -p -e "show variables like 'read_only'"
Enter password:
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
[root@slave ~]#
复制帐号建立
由于MySQL版本使用的是5.7,创建用户的方法以之前有些不同
主库创建
1 | [root@master ~]# mysql -uroot -p |
从库创建
1 | [root@slave ~]# mysql -uroot -p |
复制帐号验证
主库登录从库1
2
3
4
5
6
7
8[root@master ~]# mysql -uslave01 -pslave123456 -h192.168.16.81 -e "select version()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| version() |
+---------------+
| 5.7.18-15-log |
+---------------+
[root@master ~]#
从库登录主库1
2
3
4
5
6
7
8[root@slave ~]# mysql -uslave01 -pslave123456 -h192.168.16.80 -e "select version()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| version() |
+---------------+
| 5.7.18-15-log |
+---------------+
[root@slave ~]#
复制关系配置
主库配置
登录主库,设置复制关系,来源于从库的复制1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84[root@master ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)] 15:14:07 > change master to master_host='192.168.16.81',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
MySQL [(none)] 15:14:46 > start slave;
Query OK, 0 rows affected (0.04 sec)
MySQL [(none)] 15:14:49 > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.16.81
Master_User: slave01
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 812
Relay_Log_File: master-relay-bin.000002
Relay_Log_Pos: 1025
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 812
Relay_Log_Space: 1233
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: e1ec82e8-ae51-11e7-8532-00163e128057
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: e1ec82e8-ae51-11e7-8532-00163e128057:1-3
Executed_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3,
e1ec82e8-ae51-11e7-8532-00163e128057:1-3
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
MySQL [(none)] 15:14:56 >
从库配置
登录从库,设置复制关系,来源于主库的复制1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84[root@slave ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)] 15:15:42 > change master to master_host='192.168.16.80',master_user='slave01',master_password='slave123456',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)
MySQL [(none)] 15:15:53 > start slave;
Query OK, 0 rows affected (0.04 sec)
MySQL [(none)] 15:15:55 > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.16.80
Master_User: slave01
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1470
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 1072
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1470
Relay_Log_Space: 1279
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: df5051fc-ae51-11e7-85ee-00163e0f0e4a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3
Executed_Gtid_Set: df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-3,
e1ec82e8-ae51-11e7-8532-00163e128057:1-3
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
MySQL [(none)] 15:16:00 >
复制测试
主库登录测试
登录主库,插入测试数据1
2
3
4
5
6
7
8
9
10
11
12
13MySQL [test11] 15:17:10 > create table if not exists t11(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));
Query OK, 0 rows affected (0.04 sec)
MySQL [test11] 15:17:20 > insert into t11 values(null,'master11');
Query OK, 1 row affected (0.00 sec)
MySQL [test11] 15:17:28 > select * from t11;
+----+----------+
| id | name |
+----+----------+
| 1 | master11 |
+----+----------+
1 row in set (0.00 sec)
登录从库,查看测试数据,此时数据已经复制过来1
2
3
4
5
6
7
8[root@slave ~]# mysql -uroot -p test11 -e "select * from t11"
Enter password:
+----+----------+
| id | name |
+----+----------+
| 1 | master11 |
+----+----------+
[root@slave ~]#
从库登录测试
登录从库,插入测试数据1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19MySQL [(none)] 15:18:56 > create database if not exists test11;use test11;
Query OK, 1 row affected, 1 warning (0.00 sec)
Database changed
MySQL [test11] 15:19:20 > create table if not exists t22(id int unsigned not null auto_increment,name varchar(20),primary key(`id`));
Query OK, 0 rows affected (0.05 sec)
MySQL [test11] 15:19:38 > insert into t22 values(null,'slave11');
Query OK, 1 row affected (0.00 sec)
MySQL [test11] 15:19:43 > select * from t22;
+----+---------+
| id | name |
+----+---------+
| 2 | slave11 |
+----+---------+
1 row in set (0.00 sec)
MySQL [test11] 15:19:51 >
登录主库,查看测试数据,此时数据已经复制过来1
2
3
4
5
6
7
8[root@master ~]# mysql -uroot -p test11 -e "select * from t22"
Enter password:
+----+---------+
| id | name |
+----+---------+
| 2 | slave11 |
+----+---------+
[root@master ~]#
MHA帐号创建
主库创建
登录主库,创建允许主库和从库连接的MHA用户信息,再分别尝试使用MHA用户登录(省略)1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29[root@master ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.7.18-15-log PLD/Linux Distribution Percona Server RPM
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)] 15:23:16 > create user 'mha01'@'192.168.16.81' identified by 'mha123456';
Query OK, 0 rows affected (0.00 sec)
MySQL [(none)] 15:24:02 > grant all privileges on *.* to 'mha01'@'192.168.16.81';
Query OK, 0 rows affected (0.00 sec)
MySQL [(none)] 15:24:27 > create user 'mha01'@'192.168.16.80' identified by 'mha123456';
Query OK, 0 rows affected (0.01 sec)
MySQL [(none)] 15:24:35 > grant all privileges on *.* to 'mha01'@'192.168.16.80';
Query OK, 0 rows affected (0.02 sec)
MySQL [(none)] 15:24:41 > flush privileges;
Query OK, 0 rows affected (0.00 sec)
从库创建
登录从库,由于复制已经把在主库创建的MHA用户信息复制了过来,尝试登录即可(省略)
MHA安装
安装依赖
分别在主库和从库上安装,执行命令:
yum -y install gcc gcc-c++ make openssl-devel perl perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes perl-Module-Install.noarch mailx jwhois cpan
主库安装mha4mysql-node
MHA管理端安装在从库上,主库上只需要mha4mysql-node即可,由于MySQL使用5.7版本,MHA也使用最新的0.57版本,采用编译安装方式。下载地址:https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw1
2
3
4
5
6
7
8
9
10
11[root@master ~]# cd /usr/src/
[root@master src]# tar -zxf mha4mysql-node-0.57.tar.gz
[root@master mha4mysql-node-0.57]# perl Makefile.PL #编译过程省略
[root@master mha4mysql-node-0.57]# make
[root@master mha4mysql-node-0.57]# make install
[root@master mha4mysql-node-0.57]# ll /usr/local/bin/ -t #查看安装,有以下文件则mha4mysql-node安装成功
total 1760
-r-xr-xr-x 1 root root 16381 Jul 21 09:41 apply_diff_relay_logs
-r-xr-xr-x 1 root root 8261 Jul 21 09:41 purge_relay_logs
-r-xr-xr-x 1 root root 7525 Jul 21 09:41 save_binary_logs
-r-xr-xr-x 1 root root 4807 Jul 21 09:41 filter_mysqlbinlog
从库安装mha4mysql-manager和mha4mysql-node
MHA管理端安装在从库,从库需要安装manager端和node端1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29[root@slave ~]# cd /usr/src/
[root@slave src]# tar -zxf mha4mysql-manager-0.57.tar.gz
[root@slave src]# cd mha4mysql-manager-0.57
[root@slave mha4mysql-manager-0.57]# perl Makefile.PL #编译过程省略
[root@slave mha4mysql-manager-0.57]# make
[root@slave mha4mysql-manager-0.57]# make install
[root@slave mha4mysql-manager-0.57]# ll /usr/local/bin/ -t #查看安装,有以下文件则mha4mysql-manager安装成功
total 40
-r-xr-xr-x 1 root root 1779 Oct 11 15:45 masterha_check_ssh
-r-xr-xr-x 1 root root 2517 Oct 11 15:45 masterha_manager
-r-xr-xr-x 1 root root 2373 Oct 11 15:45 masterha_master_switch
-r-xr-xr-x 1 root root 5171 Oct 11 15:45 masterha_secondary_check
-r-xr-xr-x 1 root root 1995 Oct 11 15:45 masterha_check_repl
-r-xr-xr-x 1 root root 1865 Oct 11 15:45 masterha_check_status
-r-xr-xr-x 1 root root 3201 Oct 11 15:45 masterha_conf_host
-r-xr-xr-x 1 root root 2165 Oct 11 15:45 masterha_master_monitor
-r-xr-xr-x 1 root root 1739 Oct 11 15:45 masterha_stop
[root@slave mha4mysql-manager-0.57]# cd ..
[root@slave src]# tar -zxf mha4mysql-node-0.57.tar.gz
[root@slave src]# cd mha4mysql-node-0.57
[root@slave mha4mysql-node-0.57]# perl Makefile.PL #编译过程省略
[root@slave mha4mysql-node-0.57]# make
[root@slave mha4mysql-node-0.57]# make install
[root@slave mha4mysql-node-0.57]# ll /usr/local/bin/ -t #查看安装,有以下文件则mha4mysql-node安装成功
total 84
-r-xr-xr-x 1 root root 16381 Oct 11 15:47 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Oct 11 15:47 filter_mysqlbinlog
-r-xr-xr-x 1 root root 8261 Oct 11 15:47 purge_relay_logs
-r-xr-xr-x 1 root root 7525 Oct 11 15:47 save_binary_logs
MHA目录结构说明
MHAManager
mhamanager工具包主要包括以下工具1
2
3
4
5
6
7
8
9
10
11[root@slave ~]# ll /usr/local/bin/
总用量 40
-r-xr-xr-x 1 root root 1995 Oct 11 15:45 masterha_check_repl #检查MySQL复制情况
-r-xr-xr-x 1 root root 1779 Oct 11 15:45 masterha_check_ssh #检查MHA的SSH配置情况
-r-xr-xr-x 1 root root 1865 Oct 11 15:45 masterha_check_status #检测当前MHA运行状态
-r-xr-xr-x 1 root root 3201 Oct 11 15:45 masterha_conf_host #添加或删除配置的server信息
-r-xr-xr-x 1 root root 2517 Oct 11 15:45 masterha_manager #启动MHA
-r-xr-xr-x 1 root root 2165 Oct 11 15:45 masterha_master_monitor #检测Master是否宕机
-r-xr-xr-x 1 root root 2373 Oct 11 15:45 masterha_master_switch #控制故障转移,自动或者手动
-r-xr-xr-x 1 root root 5172 Oct 11 15:45 masterha_secondary_check #通过其他路由检测Master是否真的宕机
-r-xr-xr-x 1 root root 1739 Oct 11 15:45 masterha_stop #停止MHA
MHANode
mhanode工具包主要包括以下工具1
2
3
4
5
6[root@slave ~]# ll /usr/local/bin/ -t
total 84
-r-xr-xr-x 1 root root 16371 Oct 11 15:47 apply_diff_relay_logs #识别差异日志的中继日志,并将其差异事件应用于其他Slave
-r-xr-xr-x 1 root root 4807 Oct 11 15:47 filter_mysqlbinlog #去除不必要的Rollback事件
-r-xr-xr-x 1 root root 8263 Oct 11 15:47 purge_relay_logs #删除无用的Relay log,避免延时
-r-xr-xr-x 1 root root 7525 Oct 11 15:47 save_binary_logs #保存和复制down掉的主服务器二进制日志
自定义扩展脚本说明
secondary_check_script #通过多条网络路由检测master的可用性
master_ip_failover_script #自动failover时候的切换脚本,可将vip信息写入此脚本中
shutdown_script #强制关闭master节点执行脚本
report_script #发送报告
init_conf_load_script #加载初始配置参数,如不想在配置中写明文密码
master_ip_online_change_script #手动failover时候的切换脚本
keepalived安装
MHA作为MySQL的HA软件,借助脚本或者第三方软件如keepalived,可实现自动的failover
本次环境使用yum安装keepalived,主库和从库分别执行安装: yum -y install keepalived
keepalived的配置文件,默认在主库上绑定EIP,priority要比备库高,同时为了避免脑裂,两个state同时设置为BACKUP,由于HaVip不支持组播和广播通讯,因此需要将keepalived的心跳方式设置为单播,添加unicast_src_ip和unicast_peer
配置文件
主库配置
1 | [root@master ~]# cat /etc/keepalived/keepalived.conf |
从库配置
1 | [root@slave ~]# cat /etc/keepalived/keepalived.conf |
EIP漂移测试
分别启动keepalived服务,查看默认EIP在哪台机器上,同时关闭EIP所在机器的keepalived服务,查看EIP是否漂移,最后恢复原状
主库启动keepalived服务
1 | [root@master keepalived]# /etc/init.d/keepalived start |
从库启动keepalived服务
1 | [root@slave keepalived] |
查看默认EIP在主库上
1 | [root@master ~]# ifconfig |
关闭主库的keepalived服务
关闭主库的keepalived服务后,EIP消失1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20[root@master ~]# /etc/init.d/keepalived stop
Stopping keepalived: [ OK ]
[root@master ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:16:3E:0F:0E:4A
inet addr:192.168.16.80 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:172670 errors:0 dropped:0 overruns:0 frame:0
TX packets:71803 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:200704346 (191.4 MiB) TX bytes:12569281 (11.9 MiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:7976 errors:0 dropped:0 overruns:0 frame:0
TX packets:7976 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:512770 (500.7 KiB) TX bytes:512770 (500.7 KiB)
[root@master ~]#
查看EIP是否漂移到从库上
此时EIP已经漂移到从库上1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22[root@slave ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:16:3E:12:80:57
inet addr:192.168.16.81 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:164539 errors:0 dropped:0 overruns:0 frame:0
TX packets:77532 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:202145589 (192.7 MiB) TX bytes:12091265 (11.5 MiB)
eth0:havip Link encap:Ethernet HWaddr 00:16:3E:12:80:57
inet addr:192.168.16.82 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:7961 errors:0 dropped:0 overruns:0 frame:0
TX packets:7961 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:508050 (496.1 KiB) TX bytes:508050 (496.1 KiB)
[root@slave ~]#
恢复原状
主库启动keepalived服务,查看EIP已经又漂移回来1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24[root@master ~]# /etc/init.d/keepalived start
Starting keepalived: [ OK ]
[root@master ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:16:3E:0F:0E:4A
inet addr:192.168.16.80 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:172863 errors:0 dropped:0 overruns:0 frame:0
TX packets:71978 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:200717148 (191.4 MiB) TX bytes:12627987 (12.0 MiB)
eth0:havip Link encap:Ethernet HWaddr 00:16:3E:0F:0E:4A
inet addr:192.168.16.82 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:8039 errors:0 dropped:0 overruns:0 frame:0
TX packets:8039 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:516676 (504.5 KiB) TX bytes:516676 (504.5 KiB)
[root@master ~]#
MHA配置
MHAManager
创建配置目录/etc/masterha/,同时创建一个项目上的配置文件,仅配置自动FailOver部分,脚本暂时不定义,下文会有MHA引入keepalived。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/hwdata/data/percona
password=mha123456
user=mha01
ping_interval=1
remote_workdir=/tmp
repl_password=slave123456
repl_user=slave01
ssh_user=root
#master_ip_failover_script=/usr/local/bin/master_ip_failover
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
#report_script=/usr/local/bin/send_report
#shutdown_script=
secondary_check_script = masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80 --master_ip=192.168.16.80 --master_port=3306
[server1]
hostname=192.168.16.80
port=3306
[server2]
hostname=192.168.16.81
port=3306
candidate_master=1
测试MHAManager SSH
1 | [root@slave masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf |
测试MHAManager 复制
1 | [root@slave masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf |
MHA引入keepalived
把keepalived服务引入MHA,需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。
master_ip_failover脚本
编辑脚本/usr/local/bin/master_ip_failover,修改后如下,这里完整的脚本1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89[root@slave app1]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.16.82';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@slave app1]#
添加权限
chmod +x /usr/local/bin/master_ip_failover
send_report脚本
1 | [root@slave masterha]# cat /usr/local/bin/send_report |
添加权限
chmod +x /usr/local/bin/send_report
MHAManager修改注释
1 | [root@slave app1]# cat /etc/masterha/app1.cnf |grep master_ip_failover_script |
HaVip绑定EIP
keepalived服务已经配置完成,VIP也可以自动漂移后,但是其还不能在内网中通讯,需要将EIP映射到HaVip中,创建高可用虚拟IP,把master和slave两个实例的绑定在一起,具体步骤省略,见下图
MHA启动
1 | [ ] |
查看启动日志
1 | [root@slave app1]# cat /var/log/masterha/app1/manager.log |
模拟主库故障(自动FailOver)
确认MHAManager进程状态
1 | [root@slave app1]# ps aux|grep master |
停止主库MySQL服务
1 | [root@master ~]# ifconfig #停止前VIP存在于主库上 |
查看FailOver日志
登录从库,查看MHAManager日志1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181[root@slave app1]# tail -f /var/log/masterha/app1/manager.log
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Checking the Status of the script.. OK
Thu Oct 12 09:26:57 2017 - [info] OK.
Thu Oct 12 09:26:57 2017 - [warning] shutdown_script is not defined.
Thu Oct 12 09:26:57 2017 - [info] Set master ping interval 1 seconds.
Thu Oct 12 09:26:57 2017 - [info] Set secondary check script: masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80 --master_ip=192.168.16.80 --master_port=3306
Thu Oct 12 09:26:57 2017 - [info] Starting ping health check on 192.168.16.80(192.168.16.80:3306)..
Thu Oct 12 09:26:57 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. #开始监听服务状态
#监控到master故障后,MHA开始做自动FailOver
Thu Oct 12 09:27:54 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Oct 12 09:27:54 2017 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.16.80 -s 192.168.16.81 --user=root hostname=192.168.16.80 --master_ip=192.168.16.80 --master_port=3306 --user=root --master_host=192.168.16.80 --master_ip=192.168.16.80 --master_port=3306 --master_user=mha01 --master_password=mha123456 --ping_type=SELECT
Thu Oct 12 09:27:54 2017 - [info] Executing SSH check script: exit 0
Thu Oct 12 09:27:54 2017 - [info] HealthCheck: SSH to 192.168.16.80 is reachable.
Monitoring server 192.168.16.80 is reachable, Master is not reachable from 192.168.16.80. OK.
Monitoring server 192.168.16.81 is reachable, Master is not reachable from 192.168.16.81. OK.
Thu Oct 12 09:27:54 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Oct 12 09:27:55 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:55 2017 - [warning] Connection failed 2 time(s)..
Thu Oct 12 09:27:56 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:56 2017 - [warning] Connection failed 3 time(s)..
Thu Oct 12 09:27:57 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Oct 12 09:27:57 2017 - [warning] Connection failed 4 time(s)..
Thu Oct 12 09:27:57 2017 - [warning] Master is not reachable from health checker!
Thu Oct 12 09:27:57 2017 - [warning] Master 192.168.16.80(192.168.16.80:3306) is not reachable!
Thu Oct 12 09:27:57 2017 - [warning] SSH is reachable.
Thu Oct 12 09:27:57 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Oct 12 09:27:57 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Oct 12 09:27:57 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Oct 12 09:27:57 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Oct 12 09:27:58 2017 - [info] GTID failover mode = 1
Thu Oct 12 09:27:58 2017 - [info] Dead Servers:
Thu Oct 12 09:27:58 2017 - [info] 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:58 2017 - [info] Alive Servers:
Thu Oct 12 09:27:58 2017 - [info] 192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:58 2017 - [info] Alive Slaves:
Thu Oct 12 09:27:58 2017 - [info] 192.168.16.81(192.168.16.81:3306) Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:58 2017 - [info] GTID ON
Thu Oct 12 09:27:58 2017 - [info] Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:58 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:58 2017 - [info] Checking slave configurations..
Thu Oct 12 09:27:58 2017 - [info] Checking replication filtering settings..
Thu Oct 12 09:27:58 2017 - [info] Replication filtering check ok.
Thu Oct 12 09:27:58 2017 - [info] Master is down!
Thu Oct 12 09:27:58 2017 - [info] Terminating monitoring script.
Thu Oct 12 09:27:58 2017 - [info] Got exit code 20 (Master dead).
Thu Oct 12 09:27:58 2017 - [info] MHA::MasterFailover version 0.57.
Thu Oct 12 09:27:58 2017 - [info] Starting master failover.
Thu Oct 12 09:27:58 2017 - [info]
Thu Oct 12 09:27:58 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Oct 12 09:27:58 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] GTID failover mode = 1
Thu Oct 12 09:27:59 2017 - [info] Dead Servers:
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Oct 12 09:27:59 2017 - [info] ok.
Thu Oct 12 09:27:59 2017 - [info] Alive Servers:
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:59 2017 - [info] Alive Slaves:
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81(192.168.16.81:3306) Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info] GTID ON
Thu Oct 12 09:27:59 2017 - [info] Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] Starting GTID based failover.
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Oct 12 09:27:59 2017 - [info] Executing master IP deactivation script:
Thu Oct 12 09:27:59 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.16.80 --orig_master_ip=192.168.16.80 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Disabling the VIP on old master: 192.168.16.80
Thu Oct 12 09:27:59 2017 - [info] done.
Thu Oct 12 09:27:59 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Oct 12 09:27:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000005:234
Thu Oct 12 09:27:59 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81(192.168.16.81:3306) Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info] GTID ON
Thu Oct 12 09:27:59 2017 - [info] Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000005:234
Thu Oct 12 09:27:59 2017 - [info] Oldest slaves:
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81(192.168.16.81:3306) Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info] GTID ON
Thu Oct 12 09:27:59 2017 - [info] Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] Searching new master from slaves..
Thu Oct 12 09:27:59 2017 - [info] Candidate masters from the configuration file:
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81(192.168.16.81:3306) Version=5.7.18-15-log (oldest major version between slaves) log-bin:enabled
Thu Oct 12 09:27:59 2017 - [info] GTID ON
Thu Oct 12 09:27:59 2017 - [info] Replicating from 192.168.16.80(192.168.16.80:3306)
Thu Oct 12 09:27:59 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 12 09:27:59 2017 - [info] Non-candidate masters:
Thu Oct 12 09:27:59 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Oct 12 09:27:59 2017 - [info] New master is 192.168.16.81(192.168.16.81:3306)
Thu Oct 12 09:27:59 2017 - [info] Starting master failover..
Thu Oct 12 09:27:59 2017 - [info]
From:
192.168.16.80(192.168.16.80:3306) (current master)
+--192.168.16.81(192.168.16.81:3306)
To:
192.168.16.81(192.168.16.81:3306) (new master)
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] Waiting all logs to be applied..
Thu Oct 12 09:27:59 2017 - [info] done.
Thu Oct 12 09:27:59 2017 - [info] Getting new master's binlog name and position..
Thu Oct 12 09:27:59 2017 - [info] mysql-bin.000002:4876
Thu Oct 12 09:27:59 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.16.81', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='slave01', MASTER_PASSWORD='xxx';
Thu Oct 12 09:27:59 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 4876, df5051fc-ae51-11e7-85ee-00163e0f0e4a:1-15,
e1ec82e8-ae51-11e7-8532-00163e128057:1-6
Thu Oct 12 09:27:59 2017 - [info] Executing master IP activate script:
Thu Oct 12 09:27:59 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.16.80 --orig_master_ip=192.168.16.80 --orig_master_port=3306 --new_master_host=192.168.16.81 --new_master_ip=192.168.16.81 --new_master_port=3306 --new_master_user='mha01' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Enabling the VIP - 192.168.16.200 on the new master - 192.168.16.81
Thu Oct 12 09:27:59 2017 - [info] OK.
Thu Oct 12 09:27:59 2017 - [info] Setting read_only=0 on 192.168.16.81(192.168.16.81:3306)..
Thu Oct 12 09:27:59 2017 - [info] ok.
Thu Oct 12 09:27:59 2017 - [info] ** Finished master recovery successfully.
Thu Oct 12 09:27:59 2017 - [info] * Phase 3: Master Recovery Phase completed.
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 4: Slaves Recovery Phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] All new slave servers recovered successfully.
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] * Phase 5: New master cleanup phase..
Thu Oct 12 09:27:59 2017 - [info]
Thu Oct 12 09:27:59 2017 - [info] Resetting slave info on the new master..
Thu Oct 12 09:27:59 2017 - [info] 192.168.16.81: Resetting slave info succeeded.
Thu Oct 12 09:27:59 2017 - [info] Master failover to 192.168.16.81(192.168.16.81:3306) completed successfully.
Thu Oct 12 09:27:59 2017 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.16.80(192.168.16.80:3306) to 192.168.16.81(192.168.16.81:3306) succeeded
Master 192.168.16.80(192.168.16.80:3306) is down!
Check MHA Manager logs at slave:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.16.80(192.168.16.80:3306)
Selected 192.168.16.81(192.168.16.81:3306) as a new master.
192.168.16.81(192.168.16.81:3306): OK: Applying all logs succeeded.
192.168.16.81(192.168.16.81:3306): OK: Activated master IP address.
192.168.16.81(192.168.16.81:3306): Resetting slave info succeeded.
Master failover to 192.168.16.81(192.168.16.81:3306) completed successfully.
Thu Oct 12 09:27:59 2017 - [info] Sending mail..
Option new_slave_hosts requires an argument
Unknown option: conf
tail: /var/log/masterha/app1/manager.log: file truncated
^C
[1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf (wd: /etc/masterha)
(wd now: /var/log/masterha/app1)
[root@slave app1]#
注意:
由于阿里云的ECS默认不允许发送邮件,它们把25端口已经封掉,如果需要开通25端口,主联系阿里云技术支持。本次切换日志在最后没有发送邮件,是由于这个原因导致的。
日志解析过程
1 | 启动前的准备工作 |
确认FailOver状态
登录从库,查看VIP是否漂移过程1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22[root@slave app1]# ifconfig #VIP已经漂移过程
eth0 Link encap:Ethernet HWaddr 00:16:3E:12:80:57
inet addr:192.168.16.81 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:295022 errors:0 dropped:0 overruns:0 frame:0
TX packets:135929 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:212253710 (202.4 MiB) TX bytes:52246361 (49.8 MiB)
eth0:havip Link encap:Ethernet HWaddr 00:16:3E:12:80:57
inet addr:192.168.16.82 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:52973 errors:0 dropped:0 overruns:0 frame:0
TX packets:52973 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3597938 (3.4 MiB) TX bytes:3597938 (3.4 MiB)
[root@slave app1]#
登录主库,查看VIP是否存在,keepalived服务状态1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20[root@master ~]# ifconfig #VIP已经不存在
eth0 Link encap:Ethernet HWaddr 00:16:3E:0F:0E:4A
inet addr:192.168.16.80 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:284277 errors:0 dropped:0 overruns:0 frame:0
TX packets:229786 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:211190989 (201.4 MiB) TX bytes:60010868 (57.2 MiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:51564 errors:0 dropped:0 overruns:0 frame:0
TX packets:51564 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3217146 (3.0 MiB) TX bytes:3217146 (3.0 MiB)
[root@master ~]# /etc/init.d/keepalived status #keepalived服务已经停止
keepalived is stopped
[root@master ~]#
手动FailOver
MHA的手动FailOver不在本文范围,详情理论可参考官方文档。
参数列表
请参考之前文章。
参考
https://dwj999.github.io/AWS-EC2搭建mha-vip-MySQL5-7.htm
结束语
云服务器上MySQL高可用,也可通过云负载均衡产品+MySQL复制来实现,但在数据安全性上,没有MHA+VIP+MySQL相对安全。