OpenStack HA Cluster 3 - Pacemaker

Published: 2019-09-19 08:06:40

    Hostnames must be resolvable between all nodes

    [root@controller1 ~]# cat /etc/hosts

    192.168.17.149  controller1

    192.168.17.141  controller2

    192.168.17.166  controller3

    192.168.17.111  demo.open-stack.cn
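
    If the nodes are managed with Ansible (as later steps in this article assume), the hosts file can be pushed to every node in one step. A minimal sketch, assuming the same "controller" inventory group used below:

    [root@controller1 ~]# ansible controller -m copy -a "src=/etc/hosts dest=/etc/hosts"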

    All nodes must trust each other and allow passwordless SSH login

    [root@controller1 ~]# ssh-keygen -t rsa

    Generating public/private rsa key pair.

    Enter file in which to save the key (/root/.ssh/id_rsa):

    Enter passphrase (empty for no passphrase):

    Enter same passphrase again:

    Your identification has been saved in /root/.ssh/id_rsa.

    Your public key has been saved in /root/.ssh/id_rsa.pub.

    The key fingerprint is:

    20:79:d4:a4:9f:8b:75:cf:12:58:f4:47:a4:c1:29:f3 root@controller1

    The key's randomart image is:

    +--[ RSA 2048]----+

    |      .o. ...oo  |

    |     o ...o.o+   |

    |    o +   .+o .  |

    |     o o +  E.   |

    |        S o      |

    |       o o +     |

    |      . . . o    |

    |           .     |

    |                 |

    +-----------------+

    [root@controller1 ~]# ssh-copy-id controller2

    [root@controller1 ~]# ssh-copy-id controller3
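
    Before continuing, confirm that key-based login really works from controller1 to the other nodes. A quick check (a sketch; BatchMode makes ssh fail instead of prompting if the key is not accepted):

    [root@controller1 ~]# for h in controller2 controller3; do ssh -o BatchMode=yes $h hostname; done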

    Configure the YUM repository

    # vim /etc/yum.repos.d/ha-clustering.repo

    [network_ha-clustering_Stable]

    name=Stable High Availability/Clustering packages (CentOS-7)

    type=rpm-md

    baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/

    gpgcheck=0

    gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/repodata/repomd.xml.key

    enabled=1

    This repository may conflict with other repos. Set enabled=0 first; if only the crmsh package is left to install, set enabled=1 again and install it.

    Corosync download location (the latest version at the time of writing is 2.4.2):

    http://build.clusterlabs.org/corosync/releases/

    http://build.clusterlabs.org/corosync/releases/corosync-2.4.2.tar.gz

    [root@controller1 ~]# ansible controller -m copy -a "src=/etc/yum.repos.d/ha-clustering.repo dest=/etc/yum.repos.d/"

    Install the packages

    # yum install -y pacemaker pcs resource-agents cifs-utils quota psmisc corosync fence-agents-all lvm2

    #  yum install crmsh  -y
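
    If the repository is kept at enabled=0 to avoid the conflicts mentioned above, crmsh can still be pulled from it in a single transaction. A sketch using yum's --enablerepo switch with the repo id defined earlier:

    # yum --enablerepo=network_ha-clustering_Stable install -y crmsh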

    Enable and start pcsd, and confirm it is running properly

    # systemctl enable pcsd

    # systemctl enable corosync

    # systemctl start pcsd

    # systemctl status pcsd
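
    The same enable/start can be pushed to all nodes at once. A sketch using Ansible's systemd module against the "controller" group used elsewhere in this article:

    [root@controller1 ~]# ansible controller -m systemd -a "name=pcsd state=started enabled=yes"

    [root@controller1 ~]# ansible controller -m systemd -a "name=corosync enabled=yes"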

    [root@controller2 ~]# pacemakerd --version

    Pacemaker 1.1.15-11.el7_3.2

    Written by Andrew Beekhof

    [root@controller1 ~]# ansible controller -m command -a "pacemakerd --version"

    Set the hacluster password

    [all nodes] # echo zoomtech | passwd --stdin hacluster

    [root@controller1 ~]# ansible controller -m shell -a "echo zoomtech | passwd --stdin hacluster"

    # passwd hacluster

    Edit corosync.conf

    [root@controller3 ~]# vim /etc/corosync/corosync.conf

    totem {

            version: 2

            secauth: off

            cluster_name: openstack-cluster

            transport: udpu

    }

    nodelist {

            node {

                    ring0_addr: controller1

                    nodeid: 1

            }

            node {

                    ring0_addr: controller2

                    nodeid: 2

            }

            node {

                    ring0_addr: controller3

                    nodeid: 3

            }

    }

    logging {

            to_logfile: yes

            logfile: /var/log/cluster/corosync.log

            to_syslog: yes

    }

    quorum {

            provider: corosync_votequorum

    }

    [root@controller1 ~]# scp /etc/corosync/corosync.conf controller2:/etc/corosync/

    [root@controller1 ~]# scp /etc/corosync/corosync.conf controller3:/etc/corosync/

    [root@controller1 corosync]# ansible controller -m copy -a "src=corosync.conf dest=/etc/corosync"
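
    Before starting the cluster, confirm that every node ended up with an identical corosync.conf. A sketch comparing checksums across the "controller" group:

    [root@controller1 ~]# ansible controller -m command -a "md5sum /etc/corosync/corosync.conf"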

    Create the cluster

    Use pcs to authenticate the nodes to the cluster

    [root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p zoomtech --force

    controller3: Authorized

    controller2: Authorized

    controller1: Authorized

    Now create the cluster and add the nodes. Note that the cluster name must not exceed 15 characters.

    [root@controller1 ~]# pcs cluster setup --force --name openstack-cluster controller1 controller2 controller3

    Destroying cluster on nodes: controller1, controller2, controller3...

    controller3: Stopping Cluster (pacemaker)...

    controller2: Stopping Cluster (pacemaker)...

    controller1: Stopping Cluster (pacemaker)...

    controller2: Successfully destroyed cluster

    controller1: Successfully destroyed cluster

    controller3: Successfully destroyed cluster

    Sending cluster config files to the nodes...

    controller1: Succeeded

    controller2: Succeeded

    controller3: Succeeded

    Synchronizing pcsd certificates on nodes controller1, controller2, controller3...

    controller3: Success

    controller2: Success

    controller1: Success

    Restarting pcsd on the nodes in order to reload the certificates...

    controller3: Success

    controller2: Success

    controller1: Success

    Enable and start the cluster

    [root@controller1 ~]# pcs cluster enable --all

    controller1: Cluster Enabled

    controller2: Cluster Enabled

    controller3: Cluster Enabled

    [root@controller1 ~]# pcs cluster start --all

    controller2: Starting Cluster...

    controller1: Starting Cluster...

    controller3: Starting Cluster...

    Check the cluster status

    [root@controller1 corosync]# ansible controller -m command -a "pcs cluster status"

    [root@controller1 ~]# pcs cluster status

    Cluster Status:

     Stack: corosync

     Current DC: controller3 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

     Last updated: Fri Feb 17 10:39:38 2017        Last change: Fri Feb 17 10:39:29 2017 by hacluster via crmd on controller3

     3 nodes and 0 resources configured

    PCSD Status:

      controller2: Online

      controller3: Online

      controller1: Online

    [root@controller1 corosync]# ansible controller -m command -a "pcs status"

    [root@controller1 ~]# pcs status

    Cluster name: openstack-cluster

    Stack: corosync

    Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

    Last updated: Thu Mar  2 17:07:34 2017        Last change: Thu Mar  2 01:44:44 2017 by root via cibadmin on controller1

    3 nodes and 1 resource configured

    Online: [ controller1 controller2 controller3 ]

    Full list of resources:

     vip    (ocf::heartbeat:IPaddr2):    Started controller2

    Daemon Status:

      corosync: active/enabled

      pacemaker: active/enabled

      pcsd: active/enabled

    Check the cluster status with crm_mon

    [root@controller1 corosync]# ansible controller -m command -a "crm_mon -1"

    [root@controller1 ~]# crm_mon -1

    Stack: corosync

    Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

    Last updated: Wed Mar  1 17:54:04 2017          Last change: Wed Mar  1 17:44:38 2017 by root via cibadmin on controller1

    3 nodes and 1 resource configured

    Online: [ controller1 controller2 controller3 ]

    Active resources:

    vip     (ocf::heartbeat:IPaddr2):    Started controller1

    Check the Pacemaker process status

    [root@controller1 ~]# ps aux | grep pacemaker

    root      75900  0.2  0.5 132632  9216 ?        Ss   10:39   0:00 /usr/sbin/pacemakerd -f

    haclust+  75901  0.3  0.8 135268 15376 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/cib

    root      75902  0.1  0.4 135608  7920 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/stonithd

    root      75903  0.0  0.2 105092  5020 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/lrmd

    haclust+  75904  0.0  0.4 126924  7636 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/attrd

    haclust+  75905  0.0  0.2 117040  4560 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/pengine

    haclust+  75906  0.1  0.5 145328  8988 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/crmd

    root      75997  0.0  0.0 112648   948 pts/0    R+   10:40   0:00 grep --color=auto pacemaker

    Check the Corosync ring status

    [root@controller1 ~]# corosync-cfgtool -s

    Printing ring status.

    Local node ID 1

    RING ID 0

        id    = 192.168.17.132

        status    = ring 0 active with no faults

    [root@controller2 corosync]# corosync-cfgtool -s

    Printing ring status.

    Local node ID 2

    RING ID 0

        id    = 192.168.17.146

        status    = ring 0 active with no faults

    [root@controller3 ~]# corosync-cfgtool -s

    Printing ring status.

    Local node ID 3

    RING ID 0

        id    = 192.168.17.138

        status    = ring 0 active with no faults

    [root@controller1 ~]# corosync-cmapctl | grep members

    runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0

    runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.17.132)

    runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1

    runtime.totem.pg.mrp.srp.members.1.status (str) = joined

    runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0

    runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.17.146)

    runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1

    runtime.totem.pg.mrp.srp.members.2.status (str) = joined

    runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0

    runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.17.138)

    runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1

    runtime.totem.pg.mrp.srp.members.3.status (str) = joined

    Check the Corosync membership

    [root@controller1 ~]# pcs status corosync

    Membership information

    ----------------------

        Nodeid      Votes Name

             1          1 controller1 (local)

             3          1 controller3

             2          1 controller2

    [root@controller2 corosync]# pcs status corosync

    Membership information

    ----------------------

        Nodeid      Votes Name

             1          1 controller1

             3          1 controller3

             2          1 controller2 (local)

    [root@controller3 ~]# pcs status corosync

    Membership information

    ----------------------

        Nodeid      Votes Name

             1          1 controller1

             3          1 controller3 (local)

             2          1 controller2
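
    Quorum details (expected votes, total votes, and whether the partition is quorate) can also be queried from corosync directly. A quick sketch:

    [root@controller1 ~]# corosync-quorumtool -s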

    [root@controller1 ~]# crm_verify -L -V

       error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined

       error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option

       error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity

    Errors found during check: config not valid

    [root@controller1 ~]#

    [root@controller1 ~]# pcs property set stonith-enabled=false

    [root@controller1 ~]# pcs property set no-quorum-policy=ignore

    [root@controller1 ~]# crm_verify -L -V

    [root@controller1 corosync]# ansible controller -m command -a "pcs property set stonith-enabled=false"

    [root@controller1 corosync]# ansible controller -m command -a "pcs property set no-quorum-policy=ignore"

    [root@controller1 corosync]# ansible controller -m command -a "crm_verify -L -V"
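
    Cluster properties live in the replicated CIB, so setting them on one node is enough; running the commands through Ansible on every node is harmless but redundant. To confirm the properties took effect, a sketch:

    [root@controller1 ~]# pcs property list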

    Configure the VIP

    [root@controller1 ~]# crm

    crm(live)# configure

    crm(live)configure# show

    node 1: controller1

    node 2: controller2

    node 3: controller3

    property cib-bootstrap-options: \

        have-watchdog=false \

        dc-version=1.1.15-11.el7_3.2-e174ec8 \

        cluster-infrastructure=corosync \

        cluster-name=openstack-cluster \

        stonith-enabled=false \

        no-quorum-policy=ignore

    crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.17.111 cidr_netmask=24 nic=ens37 op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s meta priority=100

    crm(live)configure# show

    node 1: controller1

    node 2: controller2

    node 3: controller3

    primitive vip IPaddr2 \

        params ip=192.168.17.111 cidr_netmask=24 nic=ens37 \

        op start interval=0s timeout=20s \

        op stop interval=0s timeout=20s \

        op monitor interval=30s \

        meta priority=100

    property cib-bootstrap-options: \

        have-watchdog=false \

        dc-version=1.1.15-11.el7_3.2-e174ec8 \

        cluster-infrastructure=corosync \

        cluster-name=openstack-cluster \

        stonith-enabled=false \

        no-quorum-policy=ignore

    crm(live)configure# commit

    crm(live)configure# exit
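
    If you prefer pcs to crmsh, roughly the same VIP resource could be created with the command below instead (a sketch; use one tool or the other, not both, for the same resource):

    [root@controller1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.17.111 cidr_netmask=24 nic=ens37 op monitor interval=30s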

    Verify that the VIP is bound to the ens37 interface

    [root@controller1 ~]# ip a

    4: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

        link/ether 00:0c:29:ff:8b:4b brd ff:ff:ff:ff:ff:ff

        inet 192.168.17.141/24 brd 192.168.17.255 scope global dynamic ens37

           valid_lft 2388741sec preferred_lft 2388741sec

        inet 192.168.17.111/24 brd 192.168.17.255 scope global secondary ens37

           valid_lft forever preferred_lft forever

    The NIC name specified above must be identical on all three nodes; otherwise the VIP cannot fail over to another node.

    [root@controller1 ~]# crm status

    Stack: corosync

    Current DC: controller1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

    Last updated: Wed Feb 22 11:42:07 2017        Last change: Wed Feb 22 11:22:56 2017 by root via cibadmin on controller1


    3 nodes and 1 resource configured


    Online: [ controller1 controller2 controller3 ]


    Full list of resources:


     vip    (ocf::heartbeat:IPaddr2):    Started controller1
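
    To verify that the VIP really fails over (and that the NIC name matches on every node, as noted above), one simple test is to put the node currently holding the VIP into standby, watch the resource move, and then bring the node back. A sketch, assuming controller1 currently holds the VIP:

    [root@controller1 ~]# pcs cluster standby controller1

    [root@controller1 ~]# crm_mon -1 | grep vip

    [root@controller1 ~]# pcs cluster unstandby controller1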

    Verify that the Corosync engine started properly

    [root@controller1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

    [51405] controller1 corosyncnotice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.

    Mar 01 17:35:20 [51425] controller1        cib:     info: retrieveCib:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)

    Mar 01 17:35:20 [51425] controller1        cib:  warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)

    Mar 01 17:35:20 [51425] controller1        cib:  warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)

    Mar 01 17:35:20 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Apziws (digest: /var/lib/pacemaker/cib/cib.0ZxsVW)

    Mar 01 17:35:21 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.ObYehI (digest: /var/lib/pacemaker/cib/cib.O8Rntg)

    Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.eqrhsF (digest: /var/lib/pacemaker/cib/cib.6BCfNj)

    Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.riot2E (digest: /var/lib/pacemaker/cib/cib.SAqtzj)

    Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Q8H9BL (digest: /var/lib/pacemaker/cib/cib.MBljlq)

    Mar 01 17:38:29 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.OTIiU4 (digest: /var/lib/pacemaker/cib/cib.JnHr1v)

    Mar 01 17:38:36 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.2cK9Yk (digest: /var/lib/pacemaker/cib/cib.JSqEH8)

    Mar 01 17:44:38 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.aPFtr3 (digest: /var/lib/pacemaker/cib/cib.E3Ve7X)

    [root@controller1 ~]#

    Verify that the initial membership notifications were sent properly

    [root@controller1 ~]# grep  TOTEM /var/log/cluster/corosync.log 

    [51405] controller1 corosyncnotice  [TOTEM ] Initializing transport (UDP/IP Unicast).

    [51405] controller1 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none

    [51405] controller1 corosyncnotice  [TOTEM ] The network interface [192.168.17.149] is now up.

    [51405] controller1 corosyncnotice  [TOTEM ] adding new UDPU member {192.168.17.149}

    [51405] controller1 corosyncnotice  [TOTEM ] adding new UDPU member {192.168.17.141}

    [51405] controller1 corosyncnotice  [TOTEM ] adding new UDPU member {192.168.17.166}

    [51405] controller1 corosyncnotice  [TOTEM ] A new membership (192.168.17.149:4) was formed. Members joined: 1

    [51405] controller1 corosyncnotice  [TOTEM ] A new membership (192.168.17.141:12) was formed. Members joined: 2 3

    Check whether any errors occurred during startup

    [root@controller1 ~]# grep ERROR: /var/log/cluster/corosync.log
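
    Pacemaker and Corosync log messages use mixed case ("ERROR", "error"), so a case-insensitive search casts a wider net. A sketch:

    [root@controller1 ~]# grep -i error /var/log/cluster/corosync.log | tail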

