MongoDB Replica Set Elections and Management Explained

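The members of a MongoDB replica set choose their primary through an election. This article walks through how that election works.

  • How MongoDB replication works

Replication is based on the operation log (oplog), which plays a role similar to the binary log in MySQL: it records only the operations that change data. Replication means syncing the primary's oplog to the secondaries and applying it there.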

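  • How MongoDB elections work

Nodes fall into three types: standard (host) nodes, passive nodes, and arbiter nodes.

(1) Only a standard node can be elected primary. A passive node holds a full copy of the data but serves only as a replica and can never become primary. An arbiter stores no data and only votes in elections; it can never become primary either.

(2) Standard versus passive: a member with a priority above 0 is a standard node, and a member with priority 0 is a passive node.

(3) As a simplified model, the candidate with the most votes wins. priority is a value from 0 to 1000 and can be thought of as adding 0 to 1000 extra votes. If the vote counts are equal, the member with the most recent data wins. In practice a candidate also needs votes from a majority of the voting members.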
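
The three node types above can be told apart from a member's configuration document alone. The following is a minimal sketch, not part of MongoDB: the helper name classifyMember and its return strings are made up for illustration, and the member documents mirror the ones used later in this article.

```javascript
// Illustrative helper, not a MongoDB API: map a member document from the
// replica set configuration to one of the three node types described above.
function classifyMember(member) {
  if (member.arbiterOnly === true) {
    return "arbiter"; // votes only, stores no data
  }
  // MongoDB uses priority 1 when the field is omitted.
  const priority = member.priority === undefined ? 1 : member.priority;
  return priority === 0 ? "passive" : "standard";
}

// Member documents matching the replica set built later in this article.
const exampleMembers = [
  { _id: 0, host: "192.168.195.137:27017", priority: 100 },
  { _id: 1, host: "192.168.195.137:27018", priority: 100 },
  { _id: 2, host: "192.168.195.137:27019", priority: 0 },
  { _id: 3, host: "192.168.195.137:27020", arbiterOnly: true },
];

const exampleTypes = exampleMembers.map(classifyMember);
console.log(exampleTypes);
```

Run with node, this lists one type per member in the same order as the input.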

  • The figure below shows an election among the members of a MongoDB replica set

[Figure: election among MongoDB replica set members]

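The following example demonstrates how elections work among the members of a MongoDB replica set.

  • On a single CentOS 7 host, install MongoDB online with yum, create multiple instances, and deploy them as a MongoDB replica set

First configure a network yum repository, with baseurl (the download path) pointing to the yum repository provided on the official MongoDB site.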

vim /etc/yum.repos.d/mongodb.repo

[mongodb-org]
name=MongoDB Repository
# download path for the packages
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
# verify the RPM packages downloaded from this repository
gpgcheck=1
# enable this repository
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc

Reload the yum repositories, then download and install MongoDB with yum.

yum list

yum -y install mongodb-org

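Prepare four instances: two standard nodes, one passive node, and one arbiter.

  • Create the storage paths for the data files and log files, and set permissions

    [root@localhost ~]# mkdir -p /data/mongodb/mongodb{2,3,4}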
    [root@localhost ~]# mkdir /data/logs
    [root@localhost ~]# touch /data/logs/mongodb{2,3,4}.log
    [root@localhost ~]# chmod 777 /data/logs/mongodb*
    [root@localhost ~]# ll /data/logs/
    总用量 0
    -rwxrwxrwx. 1 root root 0 9月  15 22:31 mongodb2.log
    -rwxrwxrwx. 1 root root 0 9月  15 22:31 mongodb3.log
    -rwxrwxrwx. 1 root root 0 9月  15 22:31 mongodb4.log

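Edit the configuration files of the four MongoDB instances

  • First edit /etc/mongod.conf, the configuration file of the default instance installed by yum: set the listening IP, keep the default port 27017, and enable the replication section with replSetName: true (the set name is user-defined; here it is literally "true", not a boolean switch)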

[root@localhost ~]# vim /etc/mongod.conf

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo

# network interfaces
net:
  port: 27017                    # default port
  bindIp: 0.0.0.0             # listen on all addresses

#security:

#operationProfiling:

replication:                   # remove the leading "#" to enable this section
  replSetName: true          # name of the replica set

  • Copy the configuration file for the other instances, then set port to 27018 in mongod2.conf, 27019 in mongod3.conf, and 27020 in mongod4.conf. Also change dbPath and the systemLog path to the matching locations

cp  /etc/mongod.conf /etc/mongod2.conf

cp /etc/mongod2.conf /etc/mongod3.conf

cp /etc/mongod2.conf /etc/mongod4.conf
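
The per-instance values follow a simple pattern, which the sketch below spells out. The function name instanceSettings and the shape of the returned object are hypothetical; the paths and ports are the ones used in this article.

```javascript
// Hypothetical helper: return the settings this article uses for instance n.
// Instance 1 is the default yum-installed instance; 2..4 are the copies.
function instanceSettings(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("instance number must be a positive integer");
  }
  if (n === 1) {
    return {
      conf: "/etc/mongod.conf",
      port: 27017,
      dbPath: "/var/lib/mongo",
      logPath: "/var/log/mongodb/mongod.log",
    };
  }
  return {
    conf: `/etc/mongod${n}.conf`,
    port: 27016 + n, // 27018, 27019, 27020 for instances 2, 3, 4
    dbPath: `/data/mongodb/mongodb${n}`,
    logPath: `/data/logs/mongodb${n}.log`,
  };
}

for (const n of [1, 2, 3, 4]) {
  console.log(instanceSettings(n));
}
```

Each copied file only needs these three values changed; everything else can stay as in /etc/mongod.conf.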

  • Changes to mongod2.conf, the configuration file of instance 2

vim /etc/mongod2.conf

systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb2.log

storage:
  dbPath: /data/mongodb/mongodb2
  journal:
    enabled: true

net:
  port: 27018
  bindIp: 0.0.0.0  # listen on all interfaces

#security:

#operationProfiling:

replication:
  replSetName: true

  • Changes to mongod3.conf, the configuration file of instance 3

vim /etc/mongod3.conf

systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb3.log

storage:
  dbPath: /data/mongodb/mongodb3
  journal:
    enabled: true

net:
  port: 27019
  bindIp: 0.0.0.0  # listen on all interfaces

#security:

#operationProfiling:

replication:
  replSetName: true

  • Changes to mongod4.conf, the configuration file of instance 4

vim /etc/mongod4.conf

systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb4.log

storage:
  dbPath: /data/mongodb/mongodb4
  journal:
    enabled: true

net:
  port: 27020
  bindIp: 0.0.0.0  # listen on all interfaces

#security:

#operationProfiling:

replication:
  replSetName: true

Start each MongoDB instance

[root@localhost ~]# mongod -f /etc/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93576
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93608
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93636
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod4.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93664
child process started successfully, parent exiting
[root@localhost ~]# netstat -antp | grep mongod                        # check that the mongod processes are listening
tcp        0      0 0.0.0.0:27019           0.0.0.0:*               LISTEN      93636/mongod      
tcp        0      0 0.0.0.0:27020           0.0.0.0:*               LISTEN      93664/mongod      
tcp        0      0 0.0.0.0:27017           0.0.0.0:*               LISTEN      93576/mongod      
tcp        0      0 0.0.0.0:27018           0.0.0.0:*               LISTEN      93608/mongod

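Configure the replica set priorities

  • Log in to the default instance with mongo and configure a four-node MongoDB replica set with two standard nodes, one passive node, and one arbiter.

  • Roles follow from priority: the members on ports 27017 and 27018 get priority 100 and are the standard nodes; the member on port 27019 gets priority 0 and is the passive node; the member on port 27020 is the arbiter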

[root@localhost ~]# mongo
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.7

> cfg={"_id":"true","members":[{"_id":0,"host":"192.168.195.137:27017","priority":100},                       

{"_id":1,"host":"192.168.195.137:27018","priority":100},{"_id":2,"host":"192.168.195.137:27019","priority":0},{"_id":3,"host":"192.168.195.137:27020","arbiterOnly":true}]}                       
{
    "_id" : "true",
    "members" : [
        {
            "_id" : 0,
            "host" : "192.168.195.137:27017",                // standard node 1, priority 100
            "priority" : 100
        },
        {
            "_id" : 1,
            "host" : "192.168.195.137:27018",               // standard node 2, priority 100
            "priority" : 100
        },
        {
            "_id" : 2,
            "host" : "192.168.195.137:27019",              // passive node, priority 0
            "priority" : 0
        },
        {
            "_id" : 3,
            "host" : "192.168.195.137:27020",                // arbiter
            "arbiterOnly" : true
        }
    ]
}

> rs.initiate(cfg)                               // initialize the replica set with this configuration
{
    "ok" : 1,
    "operationTime" : Timestamp(1537077618, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1537077618, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
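
Typing the configuration document by hand is error-prone, so it can help to generate it from a short role list. The sketch below is one way to do that under stated assumptions: buildReplSetConfig and the role names "standard", "passive", and "arbiter" are illustrative and not part of MongoDB; only the resulting document shape (_id, members, host, priority, arbiterOnly) is what rs.initiate() expects.

```javascript
// Illustrative builder for the configuration document passed to rs.initiate().
// nodes is a list of { port, role, priority? } entries; role is one of
// "standard", "passive", or "arbiter" (names chosen for this sketch).
function buildReplSetConfig(setName, ip, nodes) {
  const members = nodes.map((node, index) => {
    const member = { _id: index, host: `${ip}:${node.port}` };
    if (node.role === "arbiter") {
      member.arbiterOnly = true;
    } else if (node.role === "passive") {
      member.priority = 0; // priority 0 members can never become primary
    } else {
      member.priority = node.priority === undefined ? 1 : node.priority;
    }
    return member;
  });
  return { _id: setName, members };
}

const generatedCfg = buildReplSetConfig("true", "192.168.195.137", [
  { port: 27017, role: "standard", priority: 100 },
  { port: 27018, role: "standard", priority: 100 },
  { port: 27019, role: "passive" },
  { port: 27020, role: "arbiter" },
]);
console.log(JSON.stringify(generatedCfg, null, 2));
```

The function uses only plain JavaScript, so it can also be pasted into the mongo shell and its result passed to rs.initiate().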

  • Use rs.isMaster() to check each node's role

true:PRIMARY> rs.isMaster()
{
    "hosts" : [
        "192.168.195.137:27017",               // standard nodes
        "192.168.195.137:27018"
    ],
    "passives" : [
        "192.168.195.137:27019"                // passive node
    ],
    "arbiters" : [
        "192.168.195.137:27020"              // arbiter
    ],
    "setName" : "true",
    "setVersion" : 1,
    "ismaster" : true,
    "secondary" : false,
    "primary" : "192.168.195.137:27017",
    "me" : "192.168.195.137:27017",

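The hosts, passives, arbiters, and primary fields in the output above are enough to label every member. A small sketch, assuming the document has the shape shown; the helper name rolesFromIsMaster and the label strings are made up for illustration.

```javascript
// Illustrative helper: turn an isMaster-style document into a host -> role map.
function rolesFromIsMaster(doc) {
  const roles = {};
  for (const host of doc.hosts || []) {
    roles[host] = host === doc.primary ? "primary" : "secondary";
  }
  for (const host of doc.passives || []) {
    roles[host] = "passive";
  }
  for (const host of doc.arbiters || []) {
    roles[host] = "arbiter";
  }
  return roles;
}

// Fields copied from the rs.isMaster() output shown above.
const isMasterDoc = {
  hosts: ["192.168.195.137:27017", "192.168.195.137:27018"],
  passives: ["192.168.195.137:27019"],
  arbiters: ["192.168.195.137:27020"],
  primary: "192.168.195.137:27017",
};

const memberRoles = rolesFromIsMaster(isMasterDoc);
console.log(memberRoles);
```
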
  • Run insert, delete, update, and query operations on the primary

true:PRIMARY> use kfc
switched to db kfc
true:PRIMARY> db.info.insert({"id":1,"name":"tom"})
WriteResult({ "nInserted" : 1 })
true:PRIMARY> db.info.insert({"id":2,"name":"jack"})
WriteResult({ "nInserted" : 1 })
true:PRIMARY> db.info.find()
{ "_id" : ObjectId("5b9df3ff690f4b20fa330b18"), "id" : 1, "name" : "tom" }
{ "_id" : ObjectId("5b9df40f690f4b20fa330b19"), "id" : 2, "name" : "jack" }

true:PRIMARY> db.info.update({"id":2},{$set:{"name":"lucy"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
true:PRIMARY> db.info.remove({"id":1})
WriteResult({ "nRemoved" : 1 })

 

  • The primary's oplog records every operation; view it in the oplog.rs collection of the local database


true:PRIMARY> use local
switched to db local
true:PRIMARY> show tables
me
oplog.rs
replset.election
replset.minvalid
startup_log
system.replset
system.rollback.id
true:PRIMARY> db.oplog.rs.find()                   // list every operation recorded in the oplog

............                                                       // the entries for the operations above appear in the log

{ "ts" : Timestamp(1537078271, 2), "t" : NumberLong(1), "h" : NumberLong("-5529983416084904509"), "v" : 2, "op" : "c", "ns" : "kfc.$cmd", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:11.072Z"), "o" : { "create" : "info", "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "kfc.info" } } }
{ "ts" : Timestamp(1537078271, 3), "t" : NumberLong(1), "h" : NumberLong("-1436300260967761649"), "v" : 2, "op" : "i", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:11.072Z"), "o" : { "_id" : ObjectId("5b9df3ff690f4b20fa330b18"), "id" : 1, "name" : "tom" } }
{ "ts" : Timestamp(1537078287, 1), "t" : NumberLong(1), "h" : NumberLong("9052955074674132871"), "v" : 2, "op" : "i", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:27.562Z"), "o" : { "_id" : ObjectId("5b9df40f690f4b20fa330b19"), "id" : 2, "name" : "jack" } }

...............

{ "ts" : Timestamp(1537078543, 1), "t" : NumberLong(1), "h" : NumberLong("-5120962218610090442"), "v" : 2, "op" : "u", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "o2" : { "_id" : ObjectId("5b9df40f690f4b20fa330b19") }, "wall" : ISODate("2018-09-16T06:15:43.494Z"), "o" : { "$v" : 1, "$set" : { "name" : "lucy" } } }
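
Each oplog entry carries an op code and a namespace: in the entries above, "c" is the command that created the collection, "i" is an insert, and "u" is an update (a delete is recorded as "d", and "n" is a no-op). The sketch below, with the made-up helper name describeOplogEntry, turns those two fields into a readable summary.

```javascript
// Meaning of the op codes found in oplog entries.
const OPLOG_OP_NAMES = {
  i: "insert",
  u: "update",
  d: "delete",
  c: "command",
  n: "no-op",
};

// Illustrative helper: summarize an oplog entry from its op and ns fields.
function describeOplogEntry(entry) {
  const name = OPLOG_OP_NAMES[entry.op] || `unknown (${entry.op})`;
  return `${name} on ${entry.ns}`;
}

// op and ns values taken from the oplog entries shown above.
const sampleEntries = [
  { op: "c", ns: "kfc.$cmd" },
  { op: "i", ns: "kfc.info" },
  { op: "u", ns: "kfc.info" },
];
for (const entry of sampleEntries) {
  console.log(describeOplogEntry(entry));
}
```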

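Simulating a failure of standard node 1

  • If the primary fails, the other standard node is elected as the new primary.

[root@localhost ~]# mongod -f /etc/mongod.conf --shutdown            # shut down the primary
killing process with pid: 52986
[root@localhost ~]# mongo --port 27018                               # log in to the other standard node on port 27018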

MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27018/
MongoDB server version: 3.6.7

true:PRIMARY> rs.status()                              // check the status: this standard node has been elected primary

"members" : [
        {
            "_id" : 0,
            "name" : "192.168.195.137:27017",
            "health" : 0,                                              // health 0 means the instance on port 27017 is down
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },

{
            "_id" : 1,
            "name" : "192.168.195.137:27018",
            "health" : 1,              
            "state" : 1,
            "stateStr" : "PRIMARY",                           // the other standard node, on port 27018, is now the primary
            "uptime" : 3192,
            "optime" : {
                "ts" : Timestamp(1537080552, 1),
                "t" : NumberLong(2)
            },

Simulating a failure of standard node 2

  • Shut down all standard nodes and check whether the passive node is elected primary

[root@localhost ~]# mongod -f /etc/mongod2.conf --shutdown         # shut down the second standard node
killing process with pid: 53018
[root@localhost ~]# mongo --port 27019                           # connect to the third instance, the passive node
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27019/
MongoDB server version: 3.6.7

true:SECONDARY> rs.status()                            // check the replica set status

..............

"members" : [
        {
            "_id" : 0,
            "name" : "192.168.195.137:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },

.................

{
            "_id" : 1,
            "name" : "192.168.195.137:27018",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },

..................

{
            "_id" : 2,
            "name" : "192.168.195.137:27019",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",                          // the passive node was not elected: a passive node can never become the active (primary) node
            "uptime" : 3972,
            "optime" : {
                "ts" : Timestamp(1537081303, 1),
                "t" : NumberLong(2)
            },

..................

{
        "_id" : 3,
        "name" : "192.168.195.137:27020",
        "health" : 1,
        "state" : 7,
        "stateStr" : "ARBITER",
        "uptime" : 3722,

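Both failure tests can also be read through the majority rule: a new primary can be elected only while a majority of the voting members is reachable and at least one reachable member is electable. The sketch below is a simplified model, not MongoDB's election code. It assumes every member has one vote (the default), and the input shape (host, priority, arbiterOnly, healthy) and the names electionOutlook and withDown are hypothetical.

```javascript
// Simplified model: can the set elect a primary given which members are up?
// Assumes each member carries exactly one vote (the MongoDB default).
function electionOutlook(members) {
  const majority = Math.floor(members.length / 2) + 1;
  const reachable = members.filter((m) => m.healthy);
  const candidates = reachable
    .filter((m) => !m.arbiterOnly && m.priority > 0)
    .map((m) => m.host);
  return {
    majority,
    reachableVoters: reachable.length,
    candidates,
    canElect: reachable.length >= majority && candidates.length > 0,
  };
}

// The four members of this article's replica set.
const setMembers = [
  { host: "192.168.195.137:27017", priority: 100 },
  { host: "192.168.195.137:27018", priority: 100 },
  { host: "192.168.195.137:27019", priority: 0 },
  { host: "192.168.195.137:27020", arbiterOnly: true },
];

// Mark the given hosts as down and everything else as healthy.
function withDown(downHosts) {
  return setMembers.map((m) =>
    Object.assign({}, m, { healthy: !downHosts.includes(m.host) })
  );
}

const afterFirstFailure = electionOutlook(withDown(["192.168.195.137:27017"]));
const afterSecondFailure = electionOutlook(
  withDown(["192.168.195.137:27017", "192.168.195.137:27018"])
);
console.log(afterFirstFailure);
console.log(afterSecondFailure);
```

With one standard node down, three of four voters remain and 27018 is electable. With both down, only the passive node and the arbiter remain: two voters are not a majority of four, and neither member is electable.
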
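In addition, when the standard nodes have equal priority, the order in which they are started can be used to steer which one becomes primary: in this setup, the standard node that starts first typically becomes the primary.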

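Allowing reads from secondaries

  • By default, secondaries in a MongoDB replica set reject reads; run rs.slaveOk() to allow reads on a secondary for the current shell connection

  • Restart the two standard nodes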

[root@localhost ~]# mongod -f /etc/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 54685
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 54773
child process started successfully, parent exiting

  • Connect to one of the secondaries in the replica set and allow reads on it


[root@localhost ~]# mongo --port 27018
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27018/
MongoDB server version: 3.6.7

true:SECONDARY> rs.slaveOk()                           // allow reads on this secondary
true:SECONDARY> show dbs              // the read succeeds
admin   0.000GB
config  0.000GB
kfc     0.000GB
local   0.000GB

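Viewing replication status

  • Use rs.printReplicationInfo() and rs.printSlaveReplicationInfo() to check the state of replication

true:SECONDARY> rs.printReplicationInfo()         // show the oplog size and time window; by default the oplog takes 5% of the free disk space on a 64-bit instance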
configured oplog size:  990MB
log length start to end: 5033secs (1.4hrs)
oplog first event time:  Sun Sep 16 2018 14:00:18 GMT+0800 (CST)
oplog last event time:   Sun Sep 16 2018 15:24:11 GMT+0800 (CST)
now:                     Sun Sep 16 2018 15:24:13 GMT+0800 (CST)
true:SECONDARY> rs.printSlaveReplicationInfo()                // show how far each secondary has synced
source: 192.168.195.137:27018
    syncedTo: Sun Sep 16 2018 15:24:21 GMT+0800 (CST)
    0 secs (0 hrs) behind the primary
source: 192.168.195.137:27019
    syncedTo: Sun Sep 16 2018 15:24:21 GMT+0800 (CST)
    0 secs (0 hrs) behind the primary

Note that the arbiter does not appear in this output, because it does not replicate data.
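
The "log length start to end" value printed by rs.printReplicationInfo() is the time between the first and last oplog events, in other words how much history the oplog currently holds. A minimal sketch that recomputes it from the two timestamps in the output above; the function name oplogWindow is made up for illustration.

```javascript
// Illustrative helper: time span covered by the oplog, from its first and
// last event times, in seconds and in hours rounded to one decimal place.
function oplogWindow(firstEvent, lastEvent) {
  const seconds = Math.round((lastEvent.getTime() - firstEvent.getTime()) / 1000);
  return { seconds, hours: Math.round(seconds / 360) / 10 };
}

// First and last event times from the output above (GMT+0800).
const replWindow = oplogWindow(
  new Date("2018-09-16T14:00:18+08:00"),
  new Date("2018-09-16T15:24:11+08:00")
);
console.log(replWindow); // { seconds: 5033, hours: 1.4 }
```

A secondary that stays offline for longer than this window can no longer catch up from the oplog alone.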

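Changing the oplog size

  • oplog is short for operations log and is stored in the local database. New operations in the oplog automatically replace the oldest ones, so the oplog never exceeds its preset size. By default the oplog takes 5% of the free disk space on a 64-bit instance

  • During replication the primary applies the application's writes to the database and then records them in its oplog; the secondaries copy those oplog entries and apply the changes. This happens asynchronously. If a secondary falls far behind the primary, the oplog can wrap around before the secondary has applied its entries. The secondary can then no longer catch up, replication stops, and the secondary needs a full resync. To avoid this, keep the primary's oplog large enough to hold a fairly long history of operations

  • (1) Shut down MongoDB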

true:PRIMARY> use admin
switched to db admin
true:PRIMARY> db.shutdownServer()

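  • (2) Edit the configuration file: comment out the replication settings and change the port, so that the instance temporarily leaves the replica set and runs standalone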

vim /etc/mongod.conf

port: 27027

#replication:

# replSetName: true

  • (3) Start in standalone mode and back up the existing oplog

mongod -f /etc/mongod.conf

mongodump --port=27027 -d local -c oplog.rs -o /opt/

  • (4) Connect to the instance, drop the old oplog.rs, recreate it with db.runCommand, and set the new oplog size

[root@localhost logs]# mongo --port 27027

> use local
> db.oplog.rs.drop()
> db.runCommand( { create: "oplog.rs", capped: true, size: (2 * 1024 * 1024 * 1024) } )

  • (5) Shut down the MongoDB service, restore the original settings in the configuration file, and add the setting oplogSizeMB: 2048 under replication


    > use admin
    > db.shutdownServer()
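
The size passed to create in step (4) is in bytes, while oplogSizeMB in step (5) is in megabytes, so the two values should describe the same size. A small sketch to check that they agree; the helper name bytesToOplogSizeMB is made up for illustration.

```javascript
// Illustrative helper: convert a capped-collection size in bytes to the
// megabyte value used by the oplogSizeMB setting.
function bytesToOplogSizeMB(bytes) {
  if (!Number.isFinite(bytes) || bytes <= 0) {
    throw new RangeError("size in bytes must be a positive number");
  }
  return bytes / (1024 * 1024);
}

// Size used in step (4): 2 * 1024 * 1024 * 1024 bytes.
const newOplogSizeMB = bytesToOplogSizeMB(2 * 1024 * 1024 * 1024);
console.log(newOplogSizeMB); // 2048
```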