Message backlog caused by a RabbitMQ cluster split-brain

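Posted on 2017-7-7 17:19:33
RabbitMQ cluster setup (the subnets below can reach each other) and node roles:
    Three nodes form a standard cluster. RabbitMQ-1 is the master and also the disk node; the other two are slaves that share RabbitMQ-1's cookie and run as RAM nodes.
    RabbitMQ-1: 10.0.6.106    DISK     Master
    RabbitMQ-2: 10.0.6.107    RAM      Slave
    RabbitMQ-3: 10.0.7.4      RAM      Slave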

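Incident review:
    1. Monitoring alerts showed a large number of messages backing up in MQ.
    2. The MQ management console still listed the abnormal queues, but deleting them failed with a timeout; rabbitmqctl did not list those queues at all.
    3. The logs contained many errors like the ones below, which roughly mean that the queue could not be found [sensitive details redacted].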
   
RabbitMQ-1 log:
=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.31612.1422> (xxxxx:50956 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.11297.1438> (xxxxx:47236 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.7344.1421> (10.0.7.9:33440 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.25956.785> (10.0.7.9:60706 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.1573.1423> (xxxxx:50978 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.1793.1423> (xxxxx:50998 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::21:06:41 ===
Channel error on connection <0.6321.0> (10.0.4.158:42431 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx'), channel 59:
operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'soChange_to_erp_queue' in vhost '/' due to timeout"

=ERROR REPORT==== 22-Jun-2017::21:06:41 ===
Channel error on connection <0.6321.0> (10.0.4.158:42431 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx'), channel 60:
operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'soChange_to_erp_queue' in vhost '/' due to timeout"

=ERROR REPORT==== 22-Jun-2017::21:06:41 ===
Channel error on connection <0.6321.0> (10.0.4.158:42431 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx'), channel 61:
operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'soChange_to_erp_queue' in vhost '/' due to timeout"

=INFO REPORT==== 22-Jun-2017::21:06:50 ===
Mirrored queue 'B99_invLocationIntegration_queue' in vhost '/': Synchronising: 1 messages to synchronise

=INFO REPORT==== 22-Jun-2017::21:06:50 ===
Mirrored queue 'B99_invLocationIntegration_queue' in vhost '/': Synchronising: batch size: 4096

=INFO REPORT==== 22-Jun-2017::21:06:50 ===
Mirrored queue 'B99_invLocationIntegration_queue' in vhost '/': Synchronising: mirrors ['rabbit@prod-rabbitmq-2'] to sync

=INFO REPORT==== 22-Jun-2017::21:06:50 ===
Mirrored queue 'B99_invLocationIntegration_queue' in vhost '/': Synchronising: complete

=INFO REPORT==== 22-Jun-2017::21:07:02 ===
Mirrored queue 'oss_message_queue' in vhost '/': Synchronising: 17 messages to synchronise

=INFO REPORT==== 22-Jun-2017::21:07:02 ===
Mirrored queue 'oss_message_queue' in vhost '/': Synchronising: batch size: 4096

=INFO REPORT==== 22-Jun-2017::21:07:02 ===
Mirrored queue 'oss_message_queue' in vhost '/': Synchronising: mirrors ['rabbit@prod-rabbitmq-2'] to sync

=INFO REPORT==== 22-Jun-2017::21:07:02 ===
Mirrored queue 'oss_message_queue' in vhost '/': Synchronising: complete

=INFO REPORT==== 22-Jun-2017::21:07:15 ===
Mirrored queue 'submit_salesReturn_queue' in vhost '/': Synchronising: 1 messages to synchronise

=INFO REPORT==== 22-Jun-2017::21:07:15 ===
Mirrored queue 'submit_salesReturn_queue' in vhost '/': Synchronising: batch size: 4096

=INFO REPORT==== 22-Jun-2017::21:07:15 ===
Mirrored queue 'submit_salesReturn_queue' in vhost '/': Synchronising: mirrors ['rabbit@prod-rabbitmq-2'] to sync

=INFO REPORT==== 22-Jun-2017::21:07:15 ===
Mirrored queue 'submit_salesReturn_queue' in vhost '/': Synchronising: complete
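
With a log this noisy it helps to see which queues are actually hitting the timeout. Below is a minimal Python sketch, not part of the original troubleshooting, that counts the "due to timeout" channel exceptions per queue. The regular expression is written against the line format shown in the RabbitMQ-1 log above; the names `count_queue_timeouts` and `SAMPLE_LOG` are made up for illustration, and the sample is a shortened copy of that log.

```python
import re
from collections import Counter

# Pattern for the channel-exception lines seen in the RabbitMQ-1 log.
TIMEOUT_RE = re.compile(
    r"operation (?P<op>[\w.]+) caused a channel exception not_found: "
    r"\"failed to perform operation on queue '(?P<queue>[^']+)' "
    r"in vhost '(?P<vhost>[^']+)' due to timeout\""
)


def count_queue_timeouts(log_text):
    """Count 'due to timeout' channel exceptions per (vhost, queue)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[(match.group("vhost"), match.group("queue"))] += 1
    return counts


# Shortened excerpt of the log above, used only as sample input.
SAMPLE_LOG = """
=ERROR REPORT==== 22-Jun-2017::20:52:12 ===
Error on AMQP connection <0.31612.1422> (xxxxx:50956 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 22-Jun-2017::21:06:41 ===
Channel error on connection <0.6321.0> (10.0.4.158:42431 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx'), channel 59:
operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'soChange_to_erp_queue' in vhost '/' due to timeout"

=ERROR REPORT==== 22-Jun-2017::21:06:41 ===
Channel error on connection <0.6321.0> (10.0.4.158:42431 -> 10.0.6.106:5672, vhost: '/', user: 'xxxxx'), channel 60:
operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'soChange_to_erp_queue' in vhost '/' due to timeout"
"""

if __name__ == "__main__":
    for (vhost, queue), n in count_queue_timeouts(SAMPLE_LOG).most_common():
        print(f"{n:5d}  vhost={vhost}  queue={queue}")
```

The connection_forced entries are deliberately ignored here, since they only show clients being disconnected when the broker shut down, not which queue is stuck.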

RabbitMQ-2 log:
=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'submit_orderAnswer_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.537.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.538.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'submit_delivery_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.599.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.596.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'B99_taxIntegration_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.813.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.811.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'submit_poChange_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.477.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.478.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'salesReturn_to_erp_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.833.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.831.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'B99_deliverTakeIntegration_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.513.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.514.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'queue_restful' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.586.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.584.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'oss_message_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.809.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.807.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'rabbit_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.657.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.657.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'submit_deliverAbnormal_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.779.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.775.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue '${rabbit.quote_inquiry_remark}' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.821.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.815.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'oss.notify.send.request' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.677.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.678.0>
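
The RabbitMQ-2 entries all follow the pattern "Master <node.pid> saw deaths of mirrors <node.pid>", so they can be grouped to see which node dropped out and which queues lost a mirror. A rough Python sketch, assuming plain log lines in that form (without the forum's link markup); `queues_by_dead_node` and `SAMPLE_LOG` are illustrative names, not anything from RabbitMQ itself.

```python
import re
from collections import defaultdict

# One "saw deaths of mirrors" event per line, as in the RabbitMQ-2 log.
EVENT_RE = re.compile(
    r"Mirrored queue '(?P<queue>[^']+)' in vhost '(?P<vhost>[^']+)': "
    r"Master <(?P<master>[^<>]+)> saw deaths of mirrors (?P<mirrors>.+)"
)
# A pid is printed as <node-name.N.N.N>; capture only the node name.
PID_RE = re.compile(r"<([^<>]+?)\.\d+\.\d+\.\d+>")


def queues_by_dead_node(log_text):
    """Map each node that lost mirrors to the sorted list of affected queues."""
    lost = defaultdict(set)
    for line in log_text.splitlines():
        event = EVENT_RE.search(line)
        if event is None:
            continue
        for node in PID_RE.findall(event.group("mirrors")):
            lost[node].add(event.group("queue"))
    return {node: sorted(queues) for node, queues in lost.items()}


# Two entries copied from the log above, used only as sample input.
SAMPLE_LOG = """
=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'submit_orderAnswer_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.537.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.538.0>

=INFO REPORT==== 22-Jun-2017::20:07:04 ===
Mirrored queue 'oss_message_queue' in vhost '/': Master <rabbit@prod-rabbitmq-2.2.809.0> saw deaths of mirrors <rabbit@prod-rabbitmq-3.1.807.0>
"""

if __name__ == "__main__":
    for node, queues in queues_by_dead_node(SAMPLE_LOG).items():
        print(node, "lost mirrors for:", ", ".join(queues))
```

In the excerpt above every dead mirror is on rabbit@prod-rabbitmq-3 and every event carries the same timestamp, which is what you would expect if that node became unreachable all at once.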


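Fault analysis:
    1. Several hours of searching online turned up no suitable fix. Some reports blame a bug in this version (3.6.1 at the time); that is still unverified.
    2. The queues the applications use need persistent storage, so producers and consumers connected directly to RabbitMQ-1 (the disk node).
    3. A network partition took RabbitMQ-1 out, and RabbitMQ-2 (a RAM node) took over as Master.
    4. RabbitMQ-1 was persistent while RabbitMQ-2 was not, and when RabbitMQ-1 recovered automatically it rejoined as a Slave. From that point on, messages sent by producers could not be persisted, which caused the backlog.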
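
One way to confirm a partition like the one in step 3 is the management plugin's HTTP API: GET /api/nodes returns one object per node, including its `type` ("disc" or "ram") and a `partitions` list of the nodes it considers partitioned from itself. The sketch below only parses such a response; it does no HTTP call. The JSON sample is hand-written and trimmed to three fields, and the node names (in particular rabbit@prod-rabbitmq-1) are assumed for illustration.

```python
import json


def summarize_nodes(nodes_json):
    """Return (name, type, partitions) for each node in an /api/nodes response."""
    nodes = json.loads(nodes_json)
    return [(n["name"], n.get("type"), n.get("partitions", [])) for n in nodes]


def partitioned_nodes(nodes_json):
    """Names of nodes that report at least one partitioned peer."""
    return [name for name, _type, parts in summarize_nodes(nodes_json) if parts]


def ram_nodes(nodes_json):
    """Names of nodes running as RAM nodes."""
    return [name for name, node_type, _parts in summarize_nodes(nodes_json)
            if node_type == "ram"]


# Hypothetical, trimmed response; a real one carries many more fields.
SAMPLE_NODES_JSON = """
[
  {"name": "rabbit@prod-rabbitmq-1", "type": "disc",
   "partitions": ["rabbit@prod-rabbitmq-2", "rabbit@prod-rabbitmq-3"]},
  {"name": "rabbit@prod-rabbitmq-2", "type": "ram",
   "partitions": ["rabbit@prod-rabbitmq-1"]},
  {"name": "rabbit@prod-rabbitmq-3", "type": "ram",
   "partitions": ["rabbit@prod-rabbitmq-1"]}
]
"""

if __name__ == "__main__":
    print("nodes reporting a partition:", partitioned_nodes(SAMPLE_NODES_JSON))
    print("RAM nodes:", ram_nodes(SAMPLE_NODES_JSON))
```

A healthy cluster should show an empty `partitions` list on every node, so a check like this could feed the same monitoring that raised the backlog alert.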

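Solution:
    Once the cause was understood, I came up with a workaround. In the previous setup the Master was the only disk node, so when that node failed and persistent (durable) messages arrived, things broke.
    I rebuilt the cluster with RabbitMQ-1 still as the Master and a disk node, then joined RabbitMQ-2 and RabbitMQ-3 to it, this time with RabbitMQ-2 as a disk node. After the applications reconnected, everything worked normally.
    (I won't go into installation and configuration here; there is plenty about that online.)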

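Summary:
    RabbitMQ offers three settings for handling network partitions (cluster_partition_handling):

    1. ignore: the default. Nothing is done when a partition occurs; choose this when the network is considered reliable.
    2. autoheal: the partitions negotiate, then the nodes in the partition with the fewest client connections are restarted to restore the cluster (AP in CAP terms; some state is lost).
    3. pause_minority: after a partition, each node checks whether its side holds more than half of the cluster's nodes; if not, those nodes are paused (CP in CAP terms; use an odd number of nodes).
The default is the first; the partition handling mode can be changed to one of the others.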
cat /etc/rabbitmq/rabbitmq.config
[
{rabbit,
  [{tcp_listeners,[5672]},
   {cluster_partition_handling, autoheal}]
}
].
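
To make the pause_minority rule from the summary concrete, here is a small Python simulation. It is only a model of the rule (a node pauses unless its side of the partition holds more than half of all cluster nodes), not RabbitMQ's actual code, and the function name `nodes_to_pause` and the node names are made up for the example.

```python
def nodes_to_pause(cluster_nodes, partitions):
    """Model of pause_minority.

    cluster_nodes: all nodes configured in the cluster.
    partitions: list of sets, each set being the nodes that can still see
                each other after the network split.
    Returns the set of nodes that would pause themselves.
    """
    total = len(cluster_nodes)
    paused = set()
    for side in partitions:
        # A side survives only with a strict majority of the whole cluster.
        if len(side) * 2 <= total:
            paused.update(side)
    return paused


if __name__ == "__main__":
    three = ["rabbit@prod-rabbitmq-1", "rabbit@prod-rabbitmq-2", "rabbit@prod-rabbitmq-3"]
    # RabbitMQ-1 cut off from the other two: only RabbitMQ-1 pauses.
    print(nodes_to_pause(three, [{three[0]}, {three[1], three[2]}]))

    # Four nodes split 2/2: neither side has a majority, so every node pauses.
    four = ["a", "b", "c", "d"]
    print(nodes_to_pause(four, [{"a", "b"}, {"c", "d"}]))
```

The second case is why an odd number of nodes is recommended for this mode: with an even split no side has a majority and the whole cluster stops serving.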


Reply, posted on 2017-7-8 09:39:58:
Great post, this one is a keeper!