Discussion:
[Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue
Kotte, Christian (Ext)
2018-09-11 10:49:10 UTC
Permalink
Hi all,

I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The gsyncd.log on the master is fine, but I have some strange changelog warnings and errors on the interimmaster:

gsyncd.log


[2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536662250, 0)
[2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4698 num_files=1 job=1 return_code=23
[2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4057 num_files=1 job=2 return_code=23
[2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4781 num_files=1 job=3 return_code=23
[2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4385 num_files=1 job=1 return_code=23
[2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4765 num_files=1 job=2 return_code=23
[2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4602 num_files=1 job=3 return_code=23
[2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4717 num_files=1 job=1 return_code=23
[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4711 num_files=1 job=2 return_code=23
[2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4715 num_files=1 job=3 return_code=23
[2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4759 num_files=1 job=1 return_code=23
[2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536662336.5450 DATA=0 XATT=0
[2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536662311 entry_stime=None changelog_start=1536662311 stime=(1536662310, 0) duration=20.9674 num_changelogs=1 mode=live_changelog

I had those issues in the past with 4.1.2 as well. I could fix it only by deleting the geo-replication and the gluster volume and re-create everything.

If I delete the geo-replication and delete the changelogs directory or the CHANGELOG files, I get this error:

gsyncd.log


[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 61] No data available



Or

Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory

I read somewhere that if I delete the geo-replication with “reset-sync-time”, the changelogs are cleared, but this doesn’t happen.

How can I reset the changelog without deleting all data?

Regards,

Christian Kotte
Kotresh Hiremath Ravishankar
2018-09-12 05:31:07 UTC
Permalink
Answer inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <
Post by Kotte, Christian (Ext)
Hi all,
I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup.
The gsyncd.log on the master is fine, but I have some strange changelog
gsyncd.log


[2018-09-11 10:38:35.575464] I [master(worker
/bricks/brick1/brick):1460:crawl] _GMaster: slave's time
stime=(1536662250, 0)
[2018-09-11 10:38:37.126749] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4698 num_files=1 job=1 return_code=23
[2018-09-11 10:38:37.128668] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:39.353209] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4057 num_files=1 job=2 return_code=23
[2018-09-11 10:38:39.354737] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:41.501187] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4781 num_files=1 job=3 return_code=23
[2018-09-11 10:38:41.503048] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:43.575047] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4385 num_files=1 job=1 return_code=23
[2018-09-11 10:38:43.576597] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:45.838089] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4765 num_files=1 job=2 return_code=23
[2018-09-11 10:38:45.840205] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:47.969033] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4602 num_files=1 job=3 return_code=23
[2018-09-11 10:38:47.970118] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob]
Syncer: Sync Time Taken duration=1.4717 num_files=1 job=1
return_code=23
[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process]
_GMaster: incomplete sync, retrying changelogs
files=['CHANGELOG.1536662311']
[2018-09-11 10:38:52.317955] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4711 num_files=1 job=2 return_code=23
[2018-09-11 10:38:52.319642] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:54.448926] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4715 num_files=1 job=3 return_code=23
[2018-09-11 10:38:54.451127] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4759 num_files=1 job=1 return_code=23
[2018-09-11 10:38:56.538914] E [master(worker
/bricks/brick1/brick):1325:process] _GMaster: changelogs could not be
processed completely - moving on... files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.544816] I [master(worker
/bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0
MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-11 10:38:56.545031] I [master(worker
/bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536662336.5450
DATA=0 XATT=0
[2018-09-11 10:38:56.545356] I [master(worker
/bricks/brick1/brick):1394:process] _GMaster: Batch Completed
changelog_end=1536662311 entry_stime=None
changelog_start=1536662311 stime=(1536662310, 0)
duration=20.9674 num_changelogs=1 mode=live_changelog
There seems to be a bug, please raise a bug. For now as a work around add
the following line at the end on all the master node's configuration with
any editor. After adding it on all master nodes, stop and start geo-rep.

rsync-options = --ignore-missing-args

configuration file:
/var/lib/glusterd/geo-replication/<mastervol>_<slave_node>_<slave/vol>gsyncd.conf
Post by Kotte, Christian (Ext)
I had those issues in the past with 4.1.2 as well. I could fix it only by
deleting the geo-replication and the gluster volume and re-create
everything.
If I delete the geo-replication and delete the changelogs directory or the
gsyncd.log


[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker]
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 61] No data available


Or
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
Please share the changelog log file to debug this. On the same node where
you got this traceback, on same location, share the following log file
"changes-bricks-brick1-brick.log"
Post by Kotte, Christian (Ext)
I read somewhere that if I delete the geo-replication with
“reset-sync-time”, the changelogs are cleared, but this doesn’t happen.
changelogs are not cleared, but in the new geo-rep session, the old
changelogs are not used for syncing.
Post by Kotte, Christian (Ext)
How can I reset the changelog without deleting all data?
I didn't understand clearly what is the requirement here. Could you
elaborate?
Post by Kotte, Christian (Ext)
Regards,
Christian Kotte
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Thanks and Regards,
Kotresh H R
Kotte, Christian (Ext)
2018-09-12 11:04:27 UTC
Permalink
There seems to be a bug, please raise a bug. For now as a work around add the following line at the end on all the master node's configuration with
any editor. After adding it on all master nodes, stop and start geo-rep.
rsync-options = --ignore-missing-args
configuration file: /var/lib/glusterd/geo-replication/<mastervol>_<slave_node>_<slave/vol>gsyncd.conf
I did that on the master and interim master. Nothing has changed.

Just to make sure. Here is what I did:
/var/lib/glusterd/geo-replication//<mastervol>_<slave_node>_<slave_vol>/gsyncd.conf
[vars]
stime-xattr-prefix = trusted.glusterfs.d0b96093-d48d-4b92-bd91-7c88a3c33dcb.10dc1a40-70b2-4f3c-b874-2653fa778134
rsync-options = --ignore-missing-args
Post by Kotte, Christian (Ext)
I read somewhere that if I delete the geo-replication with “reset-sync-time”, the changelogs are cleared, but this doesn’t happen.
changelogs are not cleared, but in the new geo-rep session, the old changelogs are not used for syncing.
When I delete the geo-replication with “delete reset-sync-time” and then create it again, I have the same warnings and errors. However, the changelog names are different. It looks like every time a new changelog file is created and I already have hundreds of CHANGELOG.* files. Is this a normal behaviour?

# gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> delete reset-sync-time
Deleting geo-replication session between <vol> & geoaccount@<slave_node>::<vol> has been successful
# gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> create
Total available size of master is greater than available size of slave
<slave_node>::<vol> is not empty. Please delete existing files in <slave_node>::<vol> and retry, or use force to continue without deleting the existing files.
geo-replication command failed
# gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> create force
Creating geo-replication session between <vol> & geoaccount@<slave_node>::<vol> has been successful
# gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> start
Starting geo-replication session between <vol> & geoaccount@<slave_node>::<vol> has been successful

[2018-09-12 10:48:42.957526] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536749259, 0)
[2018-09-12 10:48:45.2098] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4931 num_files=1 job=2 return_code=23
[2018-09-12 10:48:45.2836] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:47.48713] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5327 num_files=1 job=1 return_code=23
[2018-09-12 10:48:47.50287] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:49.110692] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4517 num_files=1 job=3 return_code=23
[2018-09-12 10:48:49.112823] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:51.505287] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4960 num_files=1 job=2 return_code=23
[2018-09-12 10:48:51.507086] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:53.557701] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5014 num_files=1 job=1 return_code=23
[2018-09-12 10:48:53.559295] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:55.642981] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5239 num_files=1 job=3 return_code=23
[2018-09-12 10:48:55.644567] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:48:58.44553] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5315 num_files=1 job=2 return_code=23
[2018-09-12 10:48:58.46200] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:49:00.74062] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5080 num_files=1 job=1 return_code=23
[2018-09-12 10:49:00.75681] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:49:02.155219] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5041 num_files=1 job=3 return_code=23
[2018-09-12 10:49:02.156243] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749320']
[2018-09-12 10:49:04.564528] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5122 num_files=1 job=2 return_code=23
[2018-09-12 10:49:04.566395] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536749320']
[2018-09-12 10:49:04.775491] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-12 10:49:04.775701] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536749344.7757 DATA=0 XATT=0
[2018-09-12 10:49:04.776072] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536749320 entry_stime=(1536747669, 0) changelog_start=1536749320 stime=(1536749319, 0) duration=21.5680 num_changelogs=1 mode=live_changelog
[2018-09-12 10:49:44.858145] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536749319, 0)
[2018-09-12 10:49:46.717991] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5076 num_files=1 job=3 return_code=23
[2018-09-12 10:49:46.719527] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:49:49.182353] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5217 num_files=1 job=1 return_code=23
[2018-09-12 10:49:49.183965] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:49:51.199156] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4758 num_files=1 job=3 return_code=23
[2018-09-12 10:49:51.200678] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:49:53.629234] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4621 num_files=1 job=2 return_code=23
[2018-09-12 10:49:53.630747] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:49:55.640120] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4507 num_files=1 job=1 return_code=23
[2018-09-12 10:49:55.641657] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:49:57.669339] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4623 num_files=1 job=3 return_code=23
[2018-09-12 10:49:57.670969] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:50:00.91667] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4546 num_files=1 job=2 return_code=23
[2018-09-12 10:50:00.93130] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:50:02.129616] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4819 num_files=1 job=1 return_code=23
[2018-09-12 10:50:02.130310] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:50:04.134572] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4574 num_files=1 job=3 return_code=23
[2018-09-12 10:50:04.135277] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749380']
[2018-09-12 10:50:06.488636] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.3901 num_files=1 job=2 return_code=23
[2018-09-12 10:50:06.489341] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536749380']
[2018-09-12 10:50:06.599170] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-12 10:50:06.599361] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536749406.5993 DATA=0 XATT=0
[2018-09-12 10:50:06.599673] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536749380 entry_stime=(1536747669, 0) changelog_start=1536749380 stime=(1536749379, 0) duration=21.4289 num_changelogs=1 mode=live_changelog
[2018-09-12 10:50:41.683792] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536749379, 0)
[2018-09-12 10:50:43.507179] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4750 num_files=1 job=2 return_code=23
[2018-09-12 10:50:43.508754] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:45.637758] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4549 num_files=1 job=3 return_code=23
[2018-09-12 10:50:45.639220] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:47.641165] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4551 num_files=1 job=1 return_code=23
[2018-09-12 10:50:47.642779] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:49.972248] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4573 num_files=1 job=2 return_code=23
[2018-09-12 10:50:49.973787] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:52.134746] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4887 num_files=1 job=3 return_code=23
[2018-09-12 10:50:52.136266] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:54.152932] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5009 num_files=1 job=1 return_code=23
[2018-09-12 10:50:54.154515] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:56.454425] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4743 num_files=1 job=2 return_code=23
[2018-09-12 10:50:56.456015] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:50:58.606976] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4643 num_files=1 job=3 return_code=23
[2018-09-12 10:50:58.608525] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:51:00.625092] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4642 num_files=1 job=1 return_code=23
[2018-09-12 10:51:00.626614] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536749440']
[2018-09-12 10:51:02.913702] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4518 num_files=1 job=2 return_code=23
[2018-09-12 10:51:02.915240] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536749440']
[2018-09-12 10:51:03.147001] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-12 10:51:03.147419] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536749463.1473 DATA=0 XATT=0
[2018-09-12 10:51:03.148145] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536749440 entry_stime=(1536747669, 0) changelog_start=1536749440 stime=(1536749439, 0) duration=21.2471 num_changelogs=1 mode=live_changelog
Post by Kotte, Christian (Ext)
ChangelogException: [Errno 2] No such file or directory
Please share the changelog log file to debug this. On the same node where you got this traceback, on same location, share the following log file
"changes-bricks-brick1-brick.log"
I deleted the changelogs directory in that case. Looks like you shouldn’t do it and the changelog data is also stored somewhere else!?
Post by Kotte, Christian (Ext)
How can I reset the changelog without deleting all data?
I didn't understand clearly what is the requirement here. Could you elaborate?
I can only get rid of those warnings and errors if I delete the gluster volume on the nodes including the .glusterfs directory. If I delete the changelogs inside .glusterfs, then I get the error mentioned above (ChangelogException: [Errno 2] No such file or directory).

Is there another way to start with a fresh geo-replication instead of deleting everything?

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com>
Date: Wednesday, 12. September 2018 at 07:31
To: "Kotte, Christian (Ext)" <***@novartis.com>
Cc: "gluster-***@gluster.org" <gluster-***@gluster.org>
Subject: Re: [Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue

Answer inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <***@novartis.com<mailto:***@novartis.com>> wrote:
Hi all,

I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The gsyncd.log on the master is fine, but I have some strange changelog warnings and errors on the interimmaster:

gsyncd.log


[2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536662250, 0)
[2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4698 num_files=1 job=1 return_code=23
[2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4057 num_files=1 job=2 return_code=23
[2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4781 num_files=1 job=3 return_code=23
[2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4385 num_files=1 job=1 return_code=23
[2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4765 num_files=1 job=2 return_code=23
[2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4602 num_files=1 job=3 return_code=23
[2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4717 num_files=1 job=1 return_code=23
[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4711 num_files=1 job=2 return_code=23
[2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4715 num_files=1 job=3 return_code=23
[2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4759 num_files=1 job=1 return_code=23
[2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536662336.5450 DATA=0 XATT=0
[2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536662311 entry_stime=None changelog_start=1536662311 stime=(1536662310, 0) duration=20.9674 num_changelogs=1 mode=live_changelog

There seems to be a bug, please raise a bug. For now as a work around add the following line at the end on all the master node's configuration with
any editor. After adding it on all master nodes, stop and start geo-rep.

rsync-options = --ignore-missing-args

configuration file: /var/lib/glusterd/geo-replication/<mastervol>_<slave_node>_<slave/vol>gsyncd.conf


I had those issues in the past with 4.1.2 as well. I could fix it only by deleting the geo-replication and the gluster volume and re-create everything.

If I delete the geo-replication and delete the changelogs directory or the CHANGELOG files, I get this error:

gsyncd.log


[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 61] No data available



Or

Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory

Please share the changelog log file to debug this. On the same node where you got this traceback, on same location, share the following log file
"changes-bricks-brick1-brick.log"


I read somewhere that if I delete the geo-replication with “reset-sync-time”, the changelogs are cleared, but this doesn’t happen.

changelogs are not cleared, but in the new geo-rep session, the old changelogs are not used for syncing.

How can I reset the changelog without deleting all data?

I didn't understand clearly what is the requirement here. Could you elaborate?
Regards,

Christian Kotte

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=zVKE4owXl2MJqjkOEtAywQY7sDA-0w4iVYz4wok3Krg&s=oq-U86I4eNBDsHswBSEgeMAxXKvo0ndqWoxEmefz8NU&e=>
--
Thanks and Regards,
Kotresh H R
Loading...