[Gluster-users] Gluster 3.12.11 geo-replication connection to peer is broken

Discussion:

Pablo J Rebollo Sosa

2018-07-23 20:17:44 UTC

Hi,

I'm having a problem with Gluster 3.12.11 geo-replication on CentOS 7.5. The geo-replication process starts, but after a few minutes the log shows "connection to peer is broken".

The "status detail" output looks OK, but no files are replicated.

[***@gluster1 vol_replicated]# gluster volume geo-replication vol_replicated ***@10.20.220.12::georep_1 status detail | sort

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MASTER NODE  MASTER VOL      MASTER BRICK                   SLAVE USER   SLAVE                       SLAVE NODE    STATUS   CRAWL STATUS  LAST_SYNCED  ENTRY  DATA  META  FAILURES  CHECKPOINT TIME  CHECKPOINT COMPLETED  CHECKPOINT COMPLETION TIME
gluster1     vol_replicated  /export/brick1/vol_replicated  geoaccount1  ***@10.20.220.12::georep_1  10.20.220.12  Active   Hybrid Crawl  N/A          8191   6550  0     0         N/A              N/A                   N/A
gluster2     vol_replicated  /export/brick1/vol_replicated  geoaccount1  ***@10.20.220.12::georep_1  10.20.220.13  Passive  N/A           N/A          N/A    N/A   N/A   N/A       N/A              N/A                   N/A
gluster3     vol_replicated  /export/brick1/vol_replicated  geoaccount1  ***@10.20.220.12::georep_1  10.20.220.12  Passive  N/A           N/A          N/A    N/A   N/A   N/A       N/A              N/A                   N/A
gluster4     vol_replicated  /export/brick1/vol_replicated  geoaccount1  ***@10.20.220.12::georep_1  10.20.220.13  Active   Hybrid Crawl  N/A          8191   6532  0     0         N/A              N/A                   N/A

These are the messages in the log file.

[2018-07-23 19:35:50.18026] I [gsyncdstatus(/export/brick1/vol_replicated):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-07-23 19:35:50.19126] I [gsyncdstatus(/export/brick1/vol_replicated):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-07-23 19:35:50.19480] I [master(/export/brick1/vol_replicated):1432:crawl] _GMaster: starting history crawl turns=1 stime=(0, 0) entry_stime=None etime=1532374550
[2018-07-23 19:35:50.20056] E [repce(/export/brick1/vol_replicated):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 103, in cl_history_changelog
    raise ChangelogHistoryNotAvailable()
ChangelogHistoryNotAvailable
[2018-07-23 19:35:50.20999] E [repce(/export/brick1/vol_replicated):209:__call__] RepceClient: call failed on peer call=39755:140602890745664:1532374550.02 method=history error=ChangelogHistoryNotAvailable
[2018-07-23 19:35:50.21156] I [resource(/export/brick1/vol_replicated):1675:service_loop] GLUSTER: Changelog history not available, using xsync
[2018-07-23 19:35:50.28688] I [master(/export/brick1/vol_replicated):1543:crawl] _GMaster: starting hybrid crawl stime=(0, 0)
[2018-07-23 19:35:50.30505] I [gsyncdstatus(/export/brick1/vol_replicated):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
[2018-07-23 19:35:54.35396] I [master(/export/brick1/vol_replicated):1554:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/glusterfsd/vol_replicated/ssh%3A%2F%2Fgeoaccount1%4010.20.220.12%3Agluster%3A%2F%2F127.0.0.1%3Ageorep_1/a68ebfef8cdf86c3c6e9a0d85969cd3f/xsync/XSYNC-CHANGELOG.1532374550
[2018-07-23 19:36:11.590595] E [syncdutils(/export/brick1/vol_replicated):304:log_raise_exception] <top>: connection to peer is broken
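
For anyone reading a log like this one, a small filter that keeps only the error-level entries can make the failure point easier to spot. The following is a minimal sketch: the regular expression is inferred only from the lines quoted above, not from the gsyncd source, and the names `LINE_RE`, `LogEntry`, `parse_entries` and `errors` are hypothetical helpers introduced for illustration.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Layout assumed from the quoted lines:
# [timestamp] LEVEL [module(brick):lineno:function] source: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+)\((?P<brick>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\]\s+"
    r"(?P<source>[^:]+):\s*(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    level: str
    module: str
    brick: str
    func: str
    msg: str


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per line that matches the assumed layout.

    Lines that do not match (for example traceback lines) are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield LogEntry(m["ts"], m["level"], m["module"],
                           m["brick"], m["func"], m["msg"])


def errors(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries logged at level E."""
    return [entry for entry in parse_entries(lines) if entry.level == "E"]
```

Passing the open log file to `errors()` would return the two `repce` entries and the final `syncdutils` entry from the excerpt above, assuming the rest of the file follows the same layout.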

Does anyone have any clues as to what might be wrong?

Best regards,

Pablo J. Rebollo-Sosa

Kotresh Hiremath Ravishankar

2018-07-24 04:44:48 UTC

Hi Pablo,

The geo-rep status should go to Faulty if the connection to peer is broken.
Do the node log files show the same error? Are these logs repeating?
Does stopping and starting geo-rep give the same error?
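
The stop/start/status cycle asked about above can be scripted. This is a rough sketch, not an official tool: `georep_cmd` and `restart_and_check` are hypothetical helper names, and the slave user `geoaccount1` is an assumption taken from the SLAVE USER column, since the user in the original command is masked as `***`. The `status detail` form is the one used in the original post; `stop` and `start` are the corresponding geo-replication subcommands.

```python
import subprocess
from typing import List


def georep_cmd(master_vol: str, slave_user: str, slave_host: str,
               slave_vol: str, action: str) -> List[str]:
    """Build the argv for one geo-replication action, e.g. 'status detail'."""
    slave = f"{slave_user}@{slave_host}::{slave_vol}"
    return ["gluster", "volume", "geo-replication", master_vol, slave,
            *action.split()]


def restart_and_check(master_vol: str, slave_user: str, slave_host: str,
                      slave_vol: str) -> None:
    """Stop and start the session, then print the detailed status."""
    for action in ("stop", "start", "status detail"):
        cmd = georep_cmd(master_vol, slave_user, slave_host, slave_vol, action)
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd))
        print(result.stdout or result.stderr)
```

On a master node this would be called as `restart_and_check("vol_replicated", "geoaccount1", "10.20.220.12", "georep_1")`. The status is printed immediately after the start, so a worker that only fails after a few minutes, as described in this thread, would need a later status check as well.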

Thanks,
Kotresh HR

_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users

Pablo J Rebollo Sosa

2018-07-24 05:50:45 UTC

Dear Kotresh,

Post by Kotresh Hiremath Ravishankar
Hi Pablo,
The geo-rep status should go to Faulty if the connection to peer is broken.

The geo-rep status doesn't go to "Faulty" after "connection to peer is broken" appears in the event log.

Post by Kotresh Hiremath Ravishankar
Do the node log files show the same error? Are these logs repeating?

The "connection to peer is broken" error is in the following log file. No new events are added after "connection to peer is broken" on the master.

/var/log/glusterfs/geo-replication/vol_replicated/ssh%3A%2F%2Fgeoaccount1%4010.20.220.12%3Agluster%3A%2F%2F127.0.0.1%3Ageorep_1.log
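
The log file name above is percent-encoded, which makes it hard to read. A quick way to see which session it refers to is to decode it with Python's standard `urllib.parse.unquote`; `decode_session_name` is just a hypothetical wrapper name for this sketch.

```python
from urllib.parse import unquote


def decode_session_name(file_name: str) -> str:
    """Decode a percent-encoded geo-replication log file name."""
    return unquote(file_name)


encoded = ("ssh%3A%2F%2Fgeoaccount1%4010.20.220.12"
           "%3Agluster%3A%2F%2F127.0.0.1%3Ageorep_1.log")
print(decode_session_name(encoded))
# ssh://geoaccount1@10.20.220.12:gluster://127.0.0.1:georep_1.log
```

The decoded name shows the slave user (`geoaccount1`), the slave host (`10.20.220.12`) and the slave volume (`georep_1`) for the session.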

Post by Kotresh Hiremath Ravishankar
Does stopping and starting geo-rep give the same error?

I restarted the geo-rep process and it keeps giving the same error.

Another user reported the same problem last month.

https://bugzilla.redhat.com/show_bug.cgi?id=1595916
