[Gluster-users] geo-replication failure.

Alvin Starr

2018-07-30 12:30:08 UTC

The brick contains about 8T of live data using about 61M inodes, so
we have LOTS of directories with small files.

The initial geo-replication sync took something like 45 days. I believe
it completed, but at some point over the last month or so it started
failing.

Both sides are running glusterfs-3.8.9-1.el7.x86_64.

What can I do to clear this error?

[2018-07-30 11:09:59.112396] I [monitor(monitor):267:monitor] Monitor:
------------------------------------------------------------
[2018-07-30 11:09:59.112936] I [monitor(monitor):268:monitor] Monitor:
starting gsyncd worker
[2018-07-30 11:09:59.291553] I [changelogagent(agent):73:__init__]
ChangelogAgent: Agent listining...
[2018-07-30 11:09:59.308311] I [gsyncd(/bricks/cc_us/data):736:main_i]
<top>: syncing: gluster://localhost:CC-US-EDOCS ->
ssh://***@archive2.vpn.domain.net:gluster://localhost:arch-CC-US-EDOCS
[2018-07-30 11:10:10.118316] I
[master(/bricks/cc_us/data):83:gmaster_builder] <top>: setting up xsync
change detection mode
[2018-07-30 11:10:10.119020] I [master(/bricks/cc_us/data):367:__init__]
_GMaster: using 'rsync' as the sync engine
[2018-07-30 11:10:10.120713] I
[master(/bricks/cc_us/data):83:gmaster_builder] <top>: setting up
changelog change detection mode
[2018-07-30 11:10:10.121114] I [master(/bricks/cc_us/data):367:__init__]
_GMaster: using 'rsync' as the sync engine
[2018-07-30 11:10:10.123591] I
[master(/bricks/cc_us/data):83:gmaster_builder] <top>: setting up
changeloghistory change detection mode
[2018-07-30 11:10:10.123972] I [master(/bricks/cc_us/data):367:__init__]
_GMaster: using 'rsync' as the sync engine
[2018-07-30 11:10:12.165112] I
[master(/bricks/cc_us/data):1251:register] _GMaster: xsync temp
directory:
/var/lib/misc/glusterfsd/CC-US-EDOCS/ssh%3A%2F%2Froot%4065.39.151.110%3Agluster%3A%2F%2F127.0.0.1%3Aarch-CC-US-EDOCS/0a70d065ebfb511403fa881adc1073e6/xsync
[2018-07-30 11:10:12.165685] I
[resource(/bricks/cc_us/data):1533:service_loop] GLUSTER: Register time:
1532949012
[2018-07-30 11:10:12.171063] I
[master(/bricks/cc_us/data):510:crawlwrap] _GMaster: primary master with
volume id 900656fd-3f13-4ba2-bf04-90832508566e ...
[2018-07-30 11:10:12.176691] I
[master(/bricks/cc_us/data):519:crawlwrap] _GMaster: crawl interval: 1
seconds
[2018-07-30 11:10:12.183186] I [master(/bricks/cc_us/data):1165:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1531938320, 0),
etime: 1532949012
[2018-07-30 11:10:12.184734] E [repce(agent):117:worker] <top>: call
failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113,
in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 54, in history
    num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 100, in cl_history_changelog
    cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2018-07-30 11:10:12.186373] E [repce(/bricks/cc_us/data):207:__call__]
RepceClient: call 12294:140093078464320:1532949012.18 (history) failed
on peer with ChangelogException
[2018-07-30 11:10:12.186680] E
[resource(/bricks/cc_us/data):1551:service_loop] GLUSTER: Changelog
History Crawl failed, [Errno 2] No such file or directory
[2018-07-30 11:10:12.187215] I
[syncdutils(/bricks/cc_us/data):220:finalize] <top>: exiting.
[2018-07-30 11:10:12.192742] I [repce(agent):92:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-30 11:10:12.193228] I [syncdutils(agent):220:finalize] <top>:
exiting.
[2018-07-30 11:10:13.142468] I [monitor(monitor):344:monitor] Monitor:
worker(/bricks/cc_us/data) died in startup phase
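
To make the window of the failing history crawl easier to read, here is a
minimal Python sketch that decodes the stime and etime values from the
"starting history crawl" line above. It assumes both values are Unix epoch
seconds in UTC (the Register time of 1532949012 lines up with the log
timestamps, so that seems right). The helper name epoch_to_utc is only for
illustration and is not part of gsyncd.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the "starting history crawl" log line.
stime = 1531938320  # first element of the stime tuple
etime = 1532949012  # end of the requested history window

start = epoch_to_utc(stime)
end = epoch_to_utc(etime)
gap = end - start

print("history crawl from:", start.isoformat())
print("history crawl to:  ", end.isoformat())
print("window length:     ", gap)
```

So the worker is asking the changelog history API for roughly the last 11
days and 17 hours of changelogs, starting on 2018-07-18, and that request is
what comes back with "No such file or directory".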

--
Alvin Starr || land: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
***@netvel.net ||