Discussion:
[Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work
Marcus Pedersén
2018-08-31 06:46:22 UTC
Hi all,

I had problems with the sync stopping after the upgrade to 4.1.2.

I upgraded to 4.1.3 and it ran fine for one day, but now one of the master nodes shows as Faulty.

Most of the sync jobs have return code 23; how do I resolve this?

I see messages like:

_GMaster: Sucessfully fixed all entry ops with gfid mismatch

Will this resolve error code 23?

There is also a Python error.

The Python error turned out to be an SELinux problem; turning off SELinux made the node go Active again.
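For reference, a minimal sketch of how the SELinux angle can be double-checked on a node; the audit log path and the strings matched are assumptions based on CentOS 7 defaults, not something taken from this thread:

def selinux_enforcing(path="/sys/fs/selinux/enforce"):
    # True if SELinux is in enforcing mode (the file holds "1").
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except IOError:
        return False  # SELinux disabled or selinuxfs not mounted

def avc_denials(log="/var/log/audit/audit.log",
                needles=("gsyncd", "rsync", "glusterfs")):
    # Collect audit lines that look like AVC denials mentioning the
    # geo-replication helpers. Reading the audit log needs root.
    hits = []
    try:
        with open(log) as f:
            for line in f:
                if "avc" in line and "denied" in line and \
                        any(n in line for n in needles):
                    hits.append(line.rstrip())
    except IOError:
        pass
    return hits

if __name__ == "__main__":
    print("SELinux enforcing: %s" % selinux_enforcing())
    for hit in avc_denials()[-10:]:
        print(hit)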

See log below.


CentOS 7, installed through SIG Gluster (OS updated to latest at the same time)

Master cluster: 2 x (2 + 1) distributed, replicated

Client cluster: 1 x (2 + 1) replicated


Many thanks in advance!


Best regards

Marcus Pedersén



gsyncd.log from Faulty node:

[2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.8099 num_files=57 job=3 return_code=23
[2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0904 num_files=3 job=3 return_code=23
[2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=30069:139655665837888:1535696752.35 method=entry_ops error=OSError
[2018-08-31 06:25:52.562346] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied
[2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.637486] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-31 06:26:03.650330] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-31 06:26:05.296473] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.6457
[2018-08-31 06:26:05.297904] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-31 06:26:06.396939] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0985
[2018-08-31 06:26:06.397691] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-31 06:26:16.816423] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1535696776
[2018-08-31 06:26:16.888772] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-31 06:26:16.892049] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-31 06:26:16.892703] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1525739167, 0) entry_stime=(1525740143, 0) etime=1535696776
[2018-08-31 06:26:17.914803] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1525739167, 0)
[2018-08-31 06:26:18.521718] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1063 num_files=17 job=3 return_code=23
[2018-08-31 06:26:19.260137] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3441 num_files=34 job=1 return_code=23
[2018-08-31 06:26:19.615191] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0923 num_files=7 job=3 return_code=23
[2018-08-31 06:26:19.891227] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1302 num_files=12 job=1 return_code=23
[2018-08-31 06:26:19.922700] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.5024 num_files=50 job=2 return_code=23

[2018-08-31 06:26:21.639342] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=1.5233 num_files=5 job=3 return_code=23
[2018-08-31 06:26:22.12726] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1191 num_files=7 job=1 return_code=23
[2018-08-31 06:26:22.86136] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0731 num_files=4 job=1 return_code=23
[2018-08-31 06:26:22.503290] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0779 num_files=15 job=2 return_code=23
[2018-08-31 06:26:23.214704] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0738 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.251876] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2478 num_files=33 job=2 return_code=23
[2018-08-31 06:26:23.802699] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0873 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.828176] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0758 num_files=3 job=2 return_code=23
[2018-08-31 06:26:23.854063] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2662 num_files=34 job=1 return_code=23
[2018-08-31 06:26:24.403228] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0997 num_files=30 job=3 return_code=23
[2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0965 num_files=8 job=3 return_code=23
[2018-08-31 06:26:25.438527] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0832 num_files=9 job=1 return_code=23
[2018-08-31 06:26:25.447256] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.6180 num_files=86 job=2 return_code=23
[2018-08-31 06:26:25.571913] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0706 num_files=2 job=3 return_code=0
[2018-08-31 06:26:27.21325] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0814 num_files=1 job=1 return_code=23
[2018-08-31 06:26:27.615520] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0933 num_files=13 job=1 return_code=23
[2018-08-31 06:26:27.668323] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2190 num_files=95 job=2 return_code=23
[2018-08-31 06:26:27.740139] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0716 num_files=11 job=2 return_code=23
[2018-08-31 06:26:28.191068] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1167 num_files=38 job=3 return_code=23
[2018-08-31 06:26:28.268213] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0768 num_files=7 job=3 return_code=23
[2018-08-31 06:26:28.317909] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0770 num_files=4 job=2 return_code=23
[2018-08-31 06:26:28.710064] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0932 num_files=23 job=1 return_code=23
[2018-08-31 06:26:28.907250] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0886 num_files=26 job=2 return_code=23
[2018-08-31 06:26:28.976679] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0692 num_files=4 job=2 return_code=23
[2018-08-31 06:26:29.55774] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0788 num_files=9 job=2 return_code=23
[2018-08-31 06:26:29.295576] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=16 job=1 return_code=23
[2018-08-31 06:26:29.665076] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1087 num_files=25 job=2 return_code=23
[2018-08-31 06:26:30.277998] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1122 num_files=40 job=2 return_code=23
[2018-08-31 06:26:31.153105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3822 num_files=74 job=3 return_code=23
[2018-08-31 06:26:31.227639] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0743 num_files=18 job=3 return_code=23
[2018-08-31 06:26:31.302660] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0748 num_files=18 job=3 return_code=23


Kotresh Hiremath Ravishankar
2018-08-31 09:09:13 UTC
Hi Marcus,

Could you attach the full logs? Is the same traceback happening repeatedly? It
would be helpful if you attach the corresponding mount log as well.
What rsync version are you using?

Thanks,
Kotresh HR

On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <***@slu.se>
wrote:




--
Thanks and Regards,
Kotresh H R
Marcus Pedersén
2018-08-31 14:09:40 UTC
I really apologize, third try to make the mail smaller.


/Marcus


________________________________
From: Marcus Pedersén
Sent: 31 August 2018 16:03
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Sorry, resending because the mail was too large.


/Marcus

________________________________
From: Marcus Pedersén
Sent: 31 August 2018 15:19
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Hi Kotresh,

Please find the attached logs; these are only the logs from today.

The Python error was repeated over and over again until I disabled SELinux.

After that the node became Active again.

The return code 23 seems to be repeated over and over again.


rsync version 3.1.2
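
For reference, a small sketch that tallies the return_code values in the "Sync Time Taken" lines of a gsyncd.log; the regex just follows the log format shown above and the default log path is only a placeholder. (rsync exit code 23 means a partial transfer due to errors.)

import re
import sys
from collections import Counter

# Matches lines like:
#   ... Syncer: Sync Time Taken duration=0.8099 num_files=57 job=3 return_code=23
PATTERN = re.compile(r"Sync Time Taken.*return_code=(\d+)")

def tally(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    log = sys.argv[1] if len(sys.argv) > 1 else "gsyncd.log"
    for code, n in tally(log).most_common():
        print("return_code=%s: %d sync jobs" % (code, n))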


Thanks a lot!


Best regards

Marcus


Marcus Pedersén
2018-09-01 11:38:45 UTC
Hi again,

I found another problem on the other master node.

The node toggles between Active and Faulty, and it is the same error over and over again.


[2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=1226:139955262510912:1535800981.24 method=entry_ops error=GsyncdError
[2018-09-01 11:23:02.94214] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
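
A minimal sketch of a sanity check for the ENOENT above, assuming it means the "gluster" binary cannot be found in the PATH that the worker uses; the check itself is not from this thread:

import os

def which(cmd):
    # Minimal PATH lookup; the Python 2 shipped with CentOS 7 has no shutil.which.
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    print("gluster resolved to: %s" % which("gluster"))
    print("PATH: %s" % os.environ.get("PATH", ""))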


I have also found a Python error, though I have only seen it once.


[2018-09-01 11:16:45.907660] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524101534, 0)
[2018-09-01 11:16:47.364109] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 362, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939, in syncjob
po = self.sync_engine(pb, self.log_err)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442, in rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000


I attach the logs as well.


Many thanks!


Best regards

Marcus Pedersén




Kotresh Hiremath Ravishankar
2018-09-03 05:58:16 UTC
Hi Marcus,

Geo-rep had a few important fixes in 4.1.3. Is it possible to upgrade and check whether the issue is still seen?

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén <***@slu.se>
wrote:

> Hi again,
>
> I found another problem on the other master node.
>
> The node toggles Active/Faulty and it is the same error over and over
> again.
>
>
> [2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failed call=1226:139955262510912:1535800981.24
> method=entry_ops error=GsyncdError
> [2018-09-01 11:23:02.94214] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception]
> <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
>
>
> I have also found a python error as well, I have only seen this once
> though.
>
>
> [2018-09-01 11:16:45.907660] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1524101534, 0)
> [2018-09-01 11:16:47.364109] E [syncdutils(worker
> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 362, in twrap
> tf(*aargs)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939,
> in syncjob
> po = self.sync_engine(pb, self.log_err)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442,
> in rsync
> rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>
>
> I attach the logs as well.
>
>
> Many thanks!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
> ------------------------------
> *Från:* gluster-users-***@gluster.org <gluster-users-bounces@
> gluster.org> för Marcus Pedersén <***@slu.se>
> *Skickat:* den 31 augusti 2018 16:09
> *Till:* ***@redhat.com
>
> *Kopia:* gluster-***@gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> I realy appologize, third try to make mail smaller.
>
>
> /Marcus
>
>
> ------------------------------
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 16:03
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-***@gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Sorry, resend due to too large mail.
>
>
> /Marcus
> ------------------------------
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 15:19
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-***@gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Hi Kotresh,
>
> Please find attached logs, only logs from today.
>
> The python error was repeated over and over again until I disabled selinux.
>
> After that the node bacame active again.
>
> The return code 23 seems to be repeated over and over again.
>
>
> rsync version 3.1.2
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus
>
>
> ------------------------------
> *Från:* Kotresh Hiremath Ravishankar <***@redhat.com>
> *Skickat:* den 31 augusti 2018 11:09
> *Till:* Marcus Pedersén
> *Kopia:* gluster-***@gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
> Hi Marcus,
>
> Could you attach full logs? Is the same trace back happening repeatedly?
> It will be helpful you attach the corresponding mount log as well.
> What's the rsync version, you are using?
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <***@slu.se>
> wrote:
>
>> Hi all,
>>
>> I had problems with stopping sync after upgrade to 4.1.2.
>>
>> I upgraded to 4.1.3 and it ran fine for one day, but now one of the
>> master nodes shows faulty.
>>
>> Most of the sync jobs have return code 23, how do I resolve this?
>>
>> I see messages like:
>>
>> _GMaster: Sucessfully fixed all entry ops with gfid mismatch
>>
>> Will this resolve error code 23?
>>
>> There is also a python error.
>>
>> The python error was a SELinux problem; turning off SELinux made the node
>> go active again.
>>
>> See log below.
>>
>>
>> CentOS 7, installed through SIG Gluster (OS updated to latest at the same
>> time)
>>
>> Master cluster: 2 x (2 + 1) distributed, replicated
>>
>> Client cluster: 1 x (2 + 1) replicated
>>
>>
>> Many thanks in advance!
>>
>>
>> Best regards
>>
>> Marcus Pedersén
>>
>>
>>
>> gsyncd.log from Faulty node:
>>
>> [2018-08-31 06:25:51.375267] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.8099 num_files=57 job=3 return_code=23
>> [2018-08-31 06:25:51.465895] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0904 num_files=3 job=3 return_code=23
>> [2018-08-31 06:25:52.562107] E [repce(worker
>> /urd-gds/gluster):197:__call__] RepceClient: call failed
>> call=30069:139655665837888:1535696752.35 method=entry_ops
>> error=OSError
>> [2018-08-31 06:25:52.562346] E [syncdutils(worker
>> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
>> in main
>> func(args)
>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
>> in subcmd_worker
>> local.service_loop(remote)
>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>> 1288, in service_loop
>> g3.crawlwrap(oneshot=True)
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615,
>> in crawlwrap
>> self.crawl()
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545,
>> in crawl
>> self.changelogs_batch_process(changes)
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445,
>> in changelogs_batch_process
>> self.process(batch)
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280,
>> in process
>> self.process_change(change, done, retry)
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179,
>> in process_change
>> failures = self.slave.server.entry_ops(entries)
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
>> __call__
>> return self.ins(self.meth, *a)
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
>> __call__
>> raise res
>> OSError: [Errno 13] Permission denied
>> [2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor:
>> worker died in startup phase brick=/urd-gds/gluster
>> [2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status]
>> GeorepStatus: Worker Status Change status=Faulty
>> [2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>> [2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-08-31 06:26:03.637486] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-08-31 06:26:03.650330] I [resource(worker
>> /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-08-31 06:26:05.296473] I [resource(worker
>> /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between
>> master and slave established. duration=1.6457
>> [2018-08-31 06:26:05.297904] I [resource(worker
>> /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>> [2018-08-31 06:26:06.396939] I [resource(worker
>> /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume
>> duration=1.0985
>> [2018-08-31 06:26:06.397691] I [subcmds(worker
>> /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
>> Acknowledging back to monitor
>> [2018-08-31 06:26:16.815566] I [master(worker
>> /urd-gds/gluster):1593:register] _GMaster: Working dir
>> path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo
>> -001_urd-gds-volume/urd-gds-gluster
>> [2018-08-31 06:26:16.816423] I [resource(worker
>> /urd-gds/gluster):1282:service_loop] GLUSTER: Register time
>> time=1535696776
>> [2018-08-31 06:26:16.888772] I [gsyncdstatus(worker
>> /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status
>> Change status=Active
>> [2018-08-31 06:26:16.892049] I [gsyncdstatus(worker
>> /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change status=History Crawl
>> [2018-08-31 06:26:16.892703] I [master(worker
>> /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1
>> stime=(1525739167, 0) entry_stime=(1525740143, 0) etime=1535696776
>> [2018-08-31 06:26:17.914803] I [master(worker
>> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
>> stime=(1525739167, 0)
>> [2018-08-31 06:26:18.521718] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1063 num_files=17 job=3 return_code=23
>> [2018-08-31 06:26:19.260137] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.3441 num_files=34 job=1 return_code=23
>> [2018-08-31 06:26:19.615191] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0923 num_files=7 job=3 return_code=23
>> [2018-08-31 06:26:19.891227] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1302 num_files=12 job=1 return_code=23
>> [2018-08-31 06:26:19.922700] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.5024 num_files=50 job=2 return_code=23
>>
>> [2018-08-31 06:26:21.639342] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=1.5233 num_files=5 job=3 return_code=23
>> [2018-08-31 06:26:22.12726] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1191 num_files=7 job=1 return_code=23
>> [2018-08-31 06:26:22.86136] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0731 num_files=4 job=1 return_code=23
>> [2018-08-31 06:26:22.503290] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0779 num_files=15 job=2 return_code=23
>> [2018-08-31 06:26:23.214704] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0738 num_files=9 job=3 return_code=23
>> [2018-08-31 06:26:23.251876] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2478 num_files=33 job=2 return_code=23
>> [2018-08-31 06:26:23.802699] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0873 num_files=9 job=3 return_code=23
>> [2018-08-31 06:26:23.828176] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0758 num_files=3 job=2 return_code=23
>> [2018-08-31 06:26:23.854063] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2662 num_files=34 job=1 return_code=23
>> [2018-08-31 06:26:24.403228] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0997 num_files=30 job=3 return_code=23
>> [2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob]
>> Syncer: Sync Time Taken duration=0.0965 num_files=8 job=3
>> return_code=23
>> [2018-08-31 06:26:25.438527] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0832 num_files=9 job=1 return_code=23
>> [2018-08-31 06:26:25.447256] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.6180 num_files=86 job=2 return_code=23
>> [2018-08-31 06:26:25.571913] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0706 num_files=2 job=3 return_code=0
>> [2018-08-31 06:26:27.21325] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0814 num_files=1 job=1 return_code=23
>> [2018-08-31 06:26:27.615520] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0933 num_files=13 job=1 return_code=23
>> [2018-08-31 06:26:27.668323] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2190 num_files=95 job=2 return_code=23
>> [2018-08-31 06:26:27.740139] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0716 num_files=11 job=2 return_code=23
>> [2018-08-31 06:26:28.191068] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1167 num_files=38 job=3 return_code=23
>> [2018-08-31 06:26:28.268213] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0768 num_files=7 job=3 return_code=23
>> [2018-08-31 06:26:28.317909] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0770 num_files=4 job=2 return_code=23
>> [2018-08-31 06:26:28.710064] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0932 num_files=23 job=1 return_code=23
>> [2018-08-31 06:26:28.907250] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0886 num_files=26 job=2 return_code=23
>> [2018-08-31 06:26:28.976679] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0692 num_files=4 job=2 return_code=23
>> [2018-08-31 06:26:29.55774] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0788 num_files=9 job=2 return_code=23
>> [2018-08-31 06:26:29.295576] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0847 num_files=16 job=1 return_code=23
>> [2018-08-31 06:26:29.665076] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1087 num_files=25 job=2 return_code=23
>> [2018-08-31 06:26:30.277998] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1122 num_files=40 job=2 return_code=23
>> [2018-08-31 06:26:31.153105] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.3822 num_files=74 job=3 return_code=23
>> [2018-08-31 06:26:31.227639] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0743 num_files=18 job=3 return_code=23
>> [2018-08-31 06:26:31.302660] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0748 num_files=18 job=3 return_code=23
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-***@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
>



--
Thanks and Regards,
Kotresh H R
Marcus Pedersén
2018-09-10 11:13:48 UTC
Permalink
Hi Kotresh,

I have been running 4.1.3 since the end of August.

Since then data has been synced to the geo side at a couple of hundred GB per 24 hours, even with the errors I have reported in this thread.


Four days ago all data transfer to the geo side stopped, and the logs repeat the same error over and over again (see below).

Both nodes toggle status Active/Faulty.


Thanks a lot!


Best regards

Marcus


One master node, gsyncd.log:

[2018-09-10 10:53:38.409709] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 10:53:47.783914] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:47.852792] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.421061] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 10:53:48.462655] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.463366] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-09-10 10:53:48.465905] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.474558] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-10 10:53:50.70219] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.5954
[2018-09-10 10:53:50.70777] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-10 10:53:51.170597] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0994
[2018-09-10 10:53:51.171158] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-10 10:53:51.696057] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:51.764605] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:53.210553] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-09-10 10:53:53.211148] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1536576833
[2018-09-10 10:53:53.230945] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-10 10:53:53.233444] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-10 10:53:53.233632] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1524272046, 0) entry_stime=(1524271940, 0) etime=1536576833
[2018-09-10 10:53:53.234951] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524272046, 0)
[2018-09-10 10:53:53.762105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0856 num_files=1 job=1 return_code=0
[2018-09-10 10:53:54.437858] I [master(worker /urd-gds/gluster):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-10 10:53:54.437973] I [master(worker /urd-gds/gluster):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1.1979 DATA=1 XATT=0
[2018-09-10 10:53:54.438153] I [master(worker /urd-gds/gluster):1394:process] _GMaster: Batch Completed changelog_end=1524272047 entry_stime=(1524271940, 0) changelog_start=1524272047 stime=(1524272046, 0) duration=1.2029 num_changelogs=1 mode=history_changelog
[2018-09-10 10:53:54.482408] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524272046, 0)
[2018-09-10 10:53:54.583467] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=1844:139973681694528:1536576834.54 method=entry_ops error=GsyncdError
[2018-09-10 10:53:54.583585] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-10 10:53:54.600353] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-10 10:53:55.175978] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-10 10:53:55.182988] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 10:53:56.24414] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
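
If I read these lines right, the two E lines near the end are the interesting part: entry_ops on the slave fails because executing "gluster" hits ENOENT, i.e. the slave side cannot find the gluster binary at the path it expects, so every history crawl dies at the same point and the monitor flips the worker back to Faulty. As a rough illustration only (plain Python, not gsyncd code), this is the kind of failure that log line reports when a subprocess call cannot find its binary; the fix is presumably to point the session config at the correct gluster command path on the slave (there is a gluster-command-dir style option, but please double-check the exact name for 4.1):

import errno
import subprocess

def run_gluster(gluster_path="gluster"):
    # Try to run the gluster CLI; if the binary is not at the given path,
    # the exec itself fails with ENOENT, which is what the
    # 'execution of "gluster" failed with ENOENT' message reports.
    try:
        out = subprocess.check_output([gluster_path, "--version"])
        return out.decode().splitlines()[0]
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "ENOENT: %r not found - check the gluster binary path on the slave" % gluster_path
        raise

if __name__ == "__main__":
    print(run_gluster())                        # works when gluster is on PATH
    print(run_gluster("/no/such/dir/gluster"))  # reproduces the ENOENT case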



Other master node, gsyncd.log:

[2018-09-10 11:10:43.10458] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 11:10:53.28702] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 11:10:53.69638] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 11:10:53.70264] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-09-10 11:10:53.71902] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 11:10:53.80737] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-10 11:10:54.621948] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.5410
[2018-09-10 11:10:54.622504] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-10 11:10:55.721349] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0984
[2018-09-10 11:10:55.721913] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-10 11:10:58.543606] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-09-10 11:10:58.545701] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1536577858
[2018-09-10 11:10:58.564208] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-10 11:10:58.565689] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-10 11:10:58.565876] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1527128725, 0) entry_stime=(1527128815, 0) etime=1536577858
[2018-09-10 11:10:59.593652] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1527128725, 0)
[2018-09-10 11:11:01.755116] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.5233 num_files=103 job=1 return_code=0
[2018-09-10 11:11:02.897665] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.6648 num_files=116 job=2 return_code=0
[2018-09-10 11:11:03.98150] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2003 num_files=59 job=2 return_code=23
[2018-09-10 11:11:03.219059] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1207 num_files=16 job=2 return_code=0
[2018-09-10 11:11:03.841105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1212 num_files=32 job=2 return_code=23
[2018-09-10 11:11:04.951658] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2160 num_files=24 job=3 return_code=0
[2018-09-10 11:11:05.2938] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=2935:140696531339072:1536577864.67 method=entry_ops error=GsyncdError
[2018-09-10 11:11:05.3125] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-10 11:11:05.17061] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-10 11:11:05.733716] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-10 11:11:05.768186] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 11:11:15.788830] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 11:11:15.829871] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
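
For reference, the return_code values in the Syncer lines above are plain rsync exit codes: 0 is success and 23 is rsync's documented "partial transfer due to error", which usually means some individual files could not be transferred (consistent with the earlier Permission denied / SELinux findings). A small stand-alone helper, my own sketch rather than anything shipped with gluster, for decoding these codes while scanning the logs:

import re

# rsync(1) EXIT VALUES that show up in these logs; anything else should be
# looked up in the rsync man page.
RSYNC_EXIT_CODES = {
    0: "success",
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
}

LINE_RE = re.compile(r"return_code=(\d+)")

def explain(logline):
    # Pull the return_code=N field out of a "Syncer: Sync Time Taken" line.
    m = LINE_RE.search(logline)
    if not m:
        return None
    code = int(m.group(1))
    return RSYNC_EXIT_CODES.get(code, "see rsync(1) EXIT VALUES for code %d" % code)

if __name__ == "__main__":
    sample = ("[2018-09-10 11:11:03.841105] I [master(worker /urd-gds/gluster):1944:syncjob] "
              "Syncer: Sync Time Taken duration=0.1212 num_files=32 job=2 return_code=23")
    print(explain(sample))  # -> partial transfer due to error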




________________________________
From: Kotresh Hiremath Ravishankar <***@redhat.com>
Sent: 3 September 2018 07:58
To: Marcus Pedersén
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Geo-rep had a few important fixes in 4.1.3. Is it possible to upgrade and check whether the issue is still seen?

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi again,

I found another problem on the other master node.

The node toggles Active/Faulty and it is the same error over and over again.


[2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=1226:139955262510912:1535800981.24 method=entry_ops error=GsyncdError
[2018-09-01 11:23:02.94214] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty


I have also found a python error, though I have only seen it once.


[2018-09-01 11:16:45.907660] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524101534, 0)
[2018-09-01 11:16:47.364109] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 362, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939, in syncjob
po = self.sync_engine(pb, self.log_err)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442, in rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000


I attach the logs as well.


Many thanks!


Best regards

Marcus Pedersén




________________________________
From: gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org> <gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org>> on behalf of Marcus Pedersén <***@slu.se<mailto:***@slu.se>>
Sent: 31 August 2018 16:09
To: ***@redhat.com<mailto:***@redhat.com>

Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


I really apologize, third try to make the mail smaller.


/Marcus


________________________________
From: Marcus Pedersén
Sent: 31 August 2018 16:03
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Sorry, resent because the previous mail was too large.


/Marcus

________________________________
From: Marcus Pedersén
Sent: 31 August 2018 15:19
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Hi Kotresh,

Please find attached logs, only logs from today.

The python error was repeated over and over again until I disabled SELinux.

After that the node became active again.

The return code 23 seems to be repeated over and over again.


rsync version 3.1.2


Thanks a lot!


Best regards

Marcus


________________________________
From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Sent: 31 August 2018 11:09
To: Marcus Pedersén
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Could you attach the full logs? Is the same traceback happening repeatedly? It will be helpful if you attach the corresponding mount log as well.
What's the rsync version you are using?

Thanks,
Kotresh HR

On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi all,

I had problems with stopping sync after upgrade to 4.1.2.

I upgraded to 4.1.3 and it ran fine for one day, but now one of the master nodes shows faulty.

Most of the sync jobs have return code 23, how do I resolve this?

I see messages like:

_GMaster: Sucessfully fixed all entry ops with gfid mismatch

Will this resolve error code 23?

There is also a python error.

The python error was a SELinux problem; turning off SELinux made the node go active again.

See log below.


CentOS 7, installed through SIG Gluster (OS updated to latest at the same time)

Master cluster: 2 x (2 + 1) distributed, replicated

Client cluster: 1 x (2 + 1) replicated


Many thanks in advance!


Best regards

Marcus Pedersén



gsyncd.log from Faulty node:

[2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.8099 num_files=57 job=3 return_code=23
[2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0904 num_files=3 job=3 return_code=23
[2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=30069:139655665837888:1535696752.35 method=entry_ops error=OSError
[2018-08-31 06:25:52.562346] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied
[2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.637486] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-31 06:26:03.650330] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-31 06:26:05.296473] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.6457
[2018-08-31 06:26:05.297904] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-31 06:26:06.396939] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0985
[2018-08-31 06:26:06.397691] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-31 06:26:16.816423] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1535696776
[2018-08-31 06:26:16.888772] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-31 06:26:16.892049] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-31 06:26:16.892703] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1525739167, 0) entry_stime=(1525740143, 0) etime=1535696776
[2018-08-31 06:26:17.914803] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1525739167, 0)
[2018-08-31 06:26:18.521718] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1063 num_files=17 job=3 return_code=23
[2018-08-31 06:26:19.260137] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3441 num_files=34 job=1 return_code=23
[2018-08-31 06:26:19.615191] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0923 num_files=7 job=3 return_code=23
[2018-08-31 06:26:19.891227] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1302 num_files=12 job=1 return_code=23
[2018-08-31 06:26:19.922700] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.5024 num_files=50 job=2 return_code=23

[2018-08-31 06:26:21.639342] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=1.5233 num_files=5 job=3 return_code=23
[2018-08-31 06:26:22.12726] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1191 num_files=7 job=1 return_code=23
[2018-08-31 06:26:22.86136] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0731 num_files=4 job=1 return_code=23
[2018-08-31 06:26:22.503290] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0779 num_files=15 job=2 return_code=23
[2018-08-31 06:26:23.214704] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0738 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.251876] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2478 num_files=33 job=2 return_code=23
[2018-08-31 06:26:23.802699] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0873 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.828176] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0758 num_files=3 job=2 return_code=23
[2018-08-31 06:26:23.854063] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2662 num_files=34 job=1 return_code=23
[2018-08-31 06:26:24.403228] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0997 num_files=30 job=3 return_code=23
[2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0965 num_files=8 job=3 return_code=23
[2018-08-31 06:26:25.438527] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0832 num_files=9 job=1 return_code=23
[2018-08-31 06:26:25.447256] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.6180 num_files=86 job=2 return_code=23
[2018-08-31 06:26:25.571913] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0706 num_files=2 job=3 return_code=0
[2018-08-31 06:26:27.21325] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0814 num_files=1 job=1 return_code=23
[2018-08-31 06:26:27.615520] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0933 num_files=13 job=1 return_code=23
[2018-08-31 06:26:27.668323] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2190 num_files=95 job=2 return_code=23
[2018-08-31 06:26:27.740139] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0716 num_files=11 job=2 return_code=23
[2018-08-31 06:26:28.191068] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1167 num_files=38 job=3 return_code=23
[2018-08-31 06:26:28.268213] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0768 num_files=7 job=3 return_code=23
[2018-08-31 06:26:28.317909] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0770 num_files=4 job=2 return_code=23
[2018-08-31 06:26:28.710064] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0932 num_files=23 job=1 return_code=23
[2018-08-31 06:26:28.907250] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0886 num_files=26 job=2 return_code=23
[2018-08-31 06:26:28.976679] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0692 num_files=4 job=2 return_code=23
[2018-08-31 06:26:29.55774] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0788 num_files=9 job=2 return_code=23
[2018-08-31 06:26:29.295576] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=16 job=1 return_code=23
[2018-08-31 06:26:29.665076] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1087 num_files=25 job=2 return_code=23
[2018-08-31 06:26:30.277998] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1122 num_files=40 job=2 return_code=23
[2018-08-31 06:26:31.153105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3822 num_files=74 job=3 return_code=23
[2018-08-31 06:26:31.227639] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0743 num_files=18 job=3 return_code=23
[2018-08-31 06:26:31.302660] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0748 num_files=18 job=3 return_code=23



_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R





--
Thanks and Regards,
Kotresh H R

Milind Changire
2018-09-11 06:56:55 UTC
Permalink
Marcus,
Is it possible to send over the SELinux errors that you encountered before
turning it off?
We could inspect and get the SELinux issues fixed as an aside.
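
If it helps, below is a rough way to pull those denials out, assuming the default auditd setup on CentOS 7 writing to /var/log/audit/audit.log (run it as root; ausearch -m avc gives the same records in a friendlier form). The process names it filters on are my guesses for what is involved in geo-replication; adjust them to whatever you actually see.

# Rough sketch, not a supported tool: print SELinux AVC denial records that
# mention geo-replication related processes. The names below are guesses.
AUDIT_LOG = "/var/log/audit/audit.log"
INTERESTING = ("rsync", "gsyncd", "gluster")

def avc_denials(path=AUDIT_LOG):
    with open(path, errors="replace") as f:
        for line in f:
            if "avc:" in line and "denied" in line:
                if any(name in line for name in INTERESTING):
                    yield line.rstrip()

if __name__ == "__main__":
    for record in avc_denials():
        print(record)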


On Mon, Sep 10, 2018 at 4:43 PM, Marcus Pedersén <***@slu.se>
wrote:

> Hi Kotresh,
>
> I have been running 4.1.3 since the end of August.
>
> Since then data has been synced to the geo side at a couple of hundred GB
> per 24 hours, even with the errors I have reported in this thread.
>
>
> Four days ago all data transfer to the geo side stopped, and the logs repeat
> the same error over and over again (see below).
>
> Both nodes toggle status Active/Faulty.
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus
>
>
> One master node, gsyncd.log:
>
> [2018-09-10 10:53:38.409709] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-10 10:53:47.783914] I [gsyncd(config-get):297:main] <top>: Using
> session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:47.852792] I [gsyncd(status):297:main] <top>: Using
> session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:48.421061] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-09-10 10:53:48.462655] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:48.463366] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-09-10 10:53:48.465905] I [gsyncd(worker /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:48.474558] I [resource(worker /urd-gds/gluster):1377:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-09-10 10:53:50.70219] I [resource(worker /urd-gds/gluster):1424:connect_remote]
> SSH: SSH connection between master and slave established. duration=1.5954
> [2018-09-10 10:53:50.70777] I [resource(worker /urd-gds/gluster):1096:connect]
> GLUSTER: Mounting gluster volume locally...
> [2018-09-10 10:53:51.170597] I [resource(worker /urd-gds/gluster):1119:connect]
> GLUSTER: Mounted gluster volume duration=1.0994
> [2018-09-10 10:53:51.171158] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker]
> <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-09-10 10:53:51.696057] I [gsyncd(config-get):297:main] <top>: Using
> session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:51.764605] I [gsyncd(status):297:main] <top>: Using
> session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 10:53:53.210553] I [master(worker /urd-gds/gluster):1593:register]
> _GMaster: Working dir path=/var/lib/misc/gluster/
> gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-09-10 10:53:53.211148] I [resource(worker /urd-gds/gluster):1282:service_loop]
> GLUSTER: Register time time=1536576833
> [2018-09-10 10:53:53.230945] I [gsyncdstatus(worker
> /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status
> Change status=Active
> [2018-09-10 10:53:53.233444] I [gsyncdstatus(worker
> /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status
> Change status=History Crawl
> [2018-09-10 10:53:53.233632] I [master(worker
> /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1
> stime=(1524272046, 0) entry_stime=(1524271940, 0) etime=1536576833
> [2018-09-10 10:53:53.234951] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1524272046, 0)
> [2018-09-10 10:53:53.762105] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.0856 num_files=1 job=1
> return_code=0
> [2018-09-10 10:53:54.437858] I [master(worker /urd-gds/gluster):1374:process]
> _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0
> RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-09-10 10:53:54.437973] I [master(worker /urd-gds/gluster):1384:process]
> _GMaster: Data/Metadata Time Taken SETA=0 SETX=0
> meta_duration=0.0000 data_duration=1.1979 DATA=1 XATT=0
> [2018-09-10 10:53:54.438153] I [master(worker /urd-gds/gluster):1394:process]
> _GMaster: Batch Completed changelog_end=1524272047
> entry_stime=(1524271940, 0) changelog_start=1524272047
> stime=(1524272046, 0) duration=1.2029 num_changelogs=1
> mode=history_changelog
> [2018-09-10 10:53:54.482408] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1524272046, 0)
> [2018-09-10 10:53:54.583467] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failed call=1844:139973681694528:1536576834.54
> method=entry_ops error=GsyncdError
> [2018-09-10 10:53:54.583585] E [syncdutils(worker
> /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster"
> failed with ENOENT (No such file or directory)
> [2018-09-10 10:53:54.600353] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-10 10:53:55.175978] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-09-10 10:53:55.182988] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-10 10:53:56.24414] I [gsyncd(config-get):297:main] <top>: Using
> session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>
>
>
> Other master node, gsyncd.log:
>
> [2018-09-10 11:10:43.10458] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-10 11:10:53.28702] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-09-10 11:10:53.69638] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 11:10:53.70264] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-09-10 11:10:53.71902] I [gsyncd(worker /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-09-10 11:10:53.80737] I [resource(worker /urd-gds/gluster):1377:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-09-10 11:10:54.621948] I [resource(worker /urd-gds/gluster):1424:connect_remote]
> SSH: SSH connection between master and slave established.
> duration=1.5410
> [2018-09-10 11:10:54.622504] I [resource(worker /urd-gds/gluster):1096:connect]
> GLUSTER: Mounting gluster volume locally...
> [2018-09-10 11:10:55.721349] I [resource(worker /urd-gds/gluster):1119:connect]
> GLUSTER: Mounted gluster volume duration=1.0984
> [2018-09-10 11:10:55.721913] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker]
> <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-09-10 11:10:58.543606] I [master(worker /urd-gds/gluster):1593:register]
> _GMaster: Working dir path=/var/lib/misc/gluster/
> gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-09-10 11:10:58.545701] I [resource(worker /urd-gds/gluster):1282:service_loop]
> GLUSTER: Register time time=1536577858
> [2018-09-10 11:10:58.564208] I [gsyncdstatus(worker
> /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status
> Change status=Active
> [2018-09-10 11:10:58.565689] I [gsyncdstatus(worker
> /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status
> Change status=History Crawl
> [2018-09-10 11:10:58.565876] I [master(worker
> /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1
> stime=(1527128725, 0) entry_stime=(1527128815, 0) etime=1536577858
> [2018-09-10 11:10:59.593652] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1527128725, 0)
> [2018-09-10 11:11:01.755116] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.5233 num_files=103 job=1
> return_code=0
> [2018-09-10 11:11:02.897665] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.6648 num_files=116 job=2
> return_code=0
> [2018-09-10 11:11:03.98150] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.2003 num_files=59 job=2
> return_code=23
> [2018-09-10 11:11:03.219059] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.1207 num_files=16 job=2
> return_code=0
> [2018-09-10 11:11:03.841105] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.1212 num_files=32 job=2
> return_code=23
> [2018-09-10 11:11:04.951658] I [master(worker /urd-gds/gluster):1944:syncjob]
> Syncer: Sync Time Taken duration=0.2160 num_files=24 job=3
> return_code=0
> [2018-09-10 11:11:05.2938] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failed call=2935:140696531339072:1536577864.67
> method=entry_ops error=GsyncdError
> [2018-09-10 11:11:05.3125] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception]
> <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-09-10 11:11:05.17061] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-10 11:11:05.733716] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-09-10 11:11:05.768186] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-10 11:11:15.788830] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-09-10 11:11:15.829871] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>
>
>
>
> ------------------------------
> *From:* Kotresh Hiremath Ravishankar <***@redhat.com>
> *Sent:* 3 September 2018 07:58
>
> *To:* Marcus Pedersén
> *Cc:* gluster-***@gluster.org
> *Subject:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
> Hi Marcus,
>
> Geo-rep had a few important fixes in 4.1.3. Is it possible to upgrade and
> check whether the issue is still seen?
>
> Thanks,
> Kotresh HR
>
> On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén <***@slu.se>
> wrote:
>
>> Hi again,
>>
>> I found another problem on the other master node.
>>
>> The node toggles Active/Faulty and it is the same error over and over
>> again.
>>
>>
>> [2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__]
>> RepceClient: call failed call=1226:139955262510912:1535800981.24
>> method=entry_ops error=GsyncdError
>> [2018-09-01 11:23:02.94214] E [syncdutils(worker
>> /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster"
>> failed with ENOENT (No such file or directory)
>> [2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status]
>> GeorepStatus: Worker Status Change status=Faulty
>>
>>
>> I have also found a python error, though I have only seen it once.
>>
>>
>> [2018-09-01 11:16:45.907660] I [master(worker
>> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
>> stime=(1524101534, 0)
>> [2018-09-01 11:16:47.364109] E [syncdutils(worker
>> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>> 362, in twrap
>> tf(*aargs)
>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939,
>> in syncjob
>> po = self.sync_engine(pb, self.log_err)
>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>> 1442, in rsync
>> rconf.ssh_ctl_args + \
>> AttributeError: 'NoneType' object has no attribute 'split'
>> [2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor:
>> worker died in startup phase brick=/urd-gds/gluster
>> [2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status]
>> GeorepStatus: Worker Status Change status=Faulty
>> [2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>>
>>
>> I attach the logs as well.
>>
>>
>> Many thanks!
>>
>>
>> Best regards
>>
>> Marcus Pedersén
>>
>>
>>
>>
>> ------------------------------
>> *From:* gluster-users-***@gluster.org <gluster-users-***@gluster
>> .org> on behalf of Marcus Pedersén <***@slu.se>
>> *Sent:* 31 August 2018 16:09
>> *To:* ***@redhat.com
>>
>> *Cc:* gluster-***@gluster.org
>> *Subject:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
>> not work Now: Upgraded to 4.1.3 geo node Faulty
>>
>>
>> I really apologize, third try to make the mail smaller.
>>
>>
>> /Marcus
>>
>>
>> ------------------------------
>> *From:* Marcus Pedersén
>> *Sent:* 31 August 2018 16:03
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-***@gluster.org
>> *Subject:* RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
>> not work Now: Upgraded to 4.1.3 geo node Faulty
>>
>>
>> Sorry, resent because the previous mail was too large.
>>
>>
>> /Marcus
>> ------------------------------
>> *From:* Marcus Pedersén
>> *Sent:* 31 August 2018 15:19
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-***@gluster.org
>> *Subject:* RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
>> not work Now: Upgraded to 4.1.3 geo node Faulty
>>
>>
>> Hi Kotresh,
>>
>> Please find attached logs, only logs from today.
>>
>> The python error was repeated over and over again until I disabled
>> SELinux.
>>
>> After that the node became active again.
>>
>> The return code 23 seems to be repeated over and over again.
>>
>>
>> rsync version 3.1.2
>>
>>
>> Thanks a lot!
>>
>>
>> Best regards
>>
>> Marcus
>>
>>
>> ------------------------------
>> *From:* Kotresh Hiremath Ravishankar <***@redhat.com>
>> *Sent:* 31 August 2018 11:09
>> *To:* Marcus Pedersén
>> *Cc:* gluster-***@gluster.org
>> *Subject:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
>> not work Now: Upgraded to 4.1.3 geo node Faulty
>>
>> Hi Marcus,
>>
>> Could you attach the full logs? Is the same traceback happening repeatedly?
>> It will be helpful if you attach the corresponding mount log as well.
>> What's the rsync version you are using?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <***@slu.se
>> > wrote:
>>
>>> Hi all,
>>>
>>> I had problems with stopping sync after upgrade to 4.1.2.
>>>
>>> I upgraded to 4.1.3 and it ran fine for one day, but now one of the
>>> master nodes shows faulty.
>>>
>>> Most of the sync jobs have return code 23, how do I resolve this?
>>>
>>> I see messages like:
>>>
>>> _GMaster: Sucessfully fixed all entry ops with gfid mismatch
>>>
>>> Will this resolve error code 23?
>>>
>>> There is also a python error.
>>>
>>> The python error was a SELinux problem; turning off SELinux made the node
>>> go active again.
>>>
>>> See log below.
>>>
>>>
>>> CentOS 7, installed through SIG Gluster (OS updated to latest at the
>>> same time)
>>>
>>> Master cluster: 2 x (2 + 1) distributed, replicated
>>>
>>> Client cluster: 1 x (2 + 1) replicated
>>>
>>>
>>> Many thanks in advance!
>>>
>>>
>>> Best regards
>>>
>>> Marcus Pedersén
>>>
>>>
>>>
>>> gsyncd.log from Faulty node:
>>>
>>> [2018-08-31 06:25:51.375267] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.8099 num_files=57 job=3 return_code=23
>>> [2018-08-31 06:25:51.465895] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0904 num_files=3 job=3 return_code=23
>>> [2018-08-31 06:25:52.562107] E [repce(worker
>>> /urd-gds/gluster):197:__call__] RepceClient: call failed
>>> call=30069:139655665837888:1535696752.35 method=entry_ops
>>> error=OSError
>>> [2018-08-31 06:25:52.562346] E [syncdutils(worker
>>> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
>>> Traceback (most recent call last):
>>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
>>> in main
>>> func(args)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
>>> in subcmd_worker
>>> local.service_loop(remote)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>>> 1288, in service_loop
>>> g3.crawlwrap(oneshot=True)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615,
>>> in crawlwrap
>>> self.crawl()
>>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545,
>>> in crawl
>>> self.changelogs_batch_process(changes)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445,
>>> in changelogs_batch_process
>>> self.process(batch)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280,
>>> in process
>>> self.process_change(change, done, retry)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179,
>>> in process_change
>>> failures = self.slave.server.entry_ops(entries)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216,
>>> in __call__
>>> return self.ins(self.meth, *a)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198,
>>> in __call__
>>> raise res
>>> OSError: [Errno 13] Permission denied
>>> [2018-08-31 06:25:52.578367] I [repce(agent
>>> /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching
>>> EOF.
>>> [2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor:
>>> worker died in startup phase brick=/urd-gds/gluster
>>> [2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status]
>>> GeorepStatus: Worker Status Change status=Faulty
>>> [2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor:
>>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>>> [2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main]
>>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-08-31 06:26:03.636810] I [gsyncd(worker
>>> /urd-gds/gluster):297:main] <top>: Using session config file
>>> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gd
>>> s-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-08-31 06:26:03.637486] I [changelogagent(agent
>>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>>> [2018-08-31 06:26:03.650330] I [resource(worker
>>> /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection
>>> between master and slave...
>>> [2018-08-31 06:26:05.296473] I [resource(worker
>>> /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between
>>> master and slave established. duration=1.6457
>>> [2018-08-31 06:26:05.297904] I [resource(worker
>>> /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume
>>> locally...
>>> [2018-08-31 06:26:06.396939] I [resource(worker
>>> /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume
>>> duration=1.0985
>>> [2018-08-31 06:26:06.397691] I [subcmds(worker
>>> /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
>>> Acknowledging back to monitor
>>> [2018-08-31 06:26:16.815566] I [master(worker
>>> /urd-gds/gluster):1593:register] _GMaster: Working dir
>>> path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo
>>> -001_urd-gds-volume/urd-gds-gluster
>>> [2018-08-31 06:26:16.816423] I [resource(worker
>>> /urd-gds/gluster):1282:service_loop] GLUSTER: Register time
>>> time=1535696776
>>> [2018-08-31 06:26:16.888772] I [gsyncdstatus(worker
>>> /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status
>>> Change status=Active
>>> [2018-08-31 06:26:16.892049] I [gsyncdstatus(worker
>>> /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl
>>> Status Change status=History Crawl
>>> [2018-08-31 06:26:16.892703] I [master(worker
>>> /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1
>>> stime=(1525739167, 0) entry_stime=(1525740143, 0) etime=1535696776
>>> [2018-08-31 06:26:17.914803] I [master(worker
>>> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
>>> stime=(1525739167, 0)
>>> [2018-08-31 06:26:18.521718] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1063 num_files=17 job=3 return_code=23
>>> [2018-08-31 06:26:19.260137] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.3441 num_files=34 job=1 return_code=23
>>> [2018-08-31 06:26:19.615191] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0923 num_files=7 job=3 return_code=23
>>> [2018-08-31 06:26:19.891227] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1302 num_files=12 job=1 return_code=23
>>> [2018-08-31 06:26:19.922700] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.5024 num_files=50 job=2 return_code=23
>>>
>>> [2018-08-31 06:26:21.639342] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=1.5233 num_files=5 job=3 return_code=23
>>> [2018-08-31 06:26:22.12726] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1191 num_files=7 job=1 return_code=23
>>> [2018-08-31 06:26:22.86136] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0731 num_files=4 job=1 return_code=23
>>> [2018-08-31 06:26:22.503290] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0779 num_files=15 job=2 return_code=23
>>> [2018-08-31 06:26:23.214704] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0738 num_files=9 job=3 return_code=23
>>> [2018-08-31 06:26:23.251876] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.2478 num_files=33 job=2 return_code=23
>>> [2018-08-31 06:26:23.802699] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0873 num_files=9 job=3 return_code=23
>>> [2018-08-31 06:26:23.828176] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0758 num_files=3 job=2 return_code=23
>>> [2018-08-31 06:26:23.854063] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.2662 num_files=34 job=1 return_code=23
>>> [2018-08-31 06:26:24.403228] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0997 num_files=30 job=3 return_code=23
>>> [2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob]
>>> Syncer: Sync Time Taken duration=0.0965 num_files=8 job=3
>>> return_code=23
>>> [2018-08-31 06:26:25.438527] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0832 num_files=9 job=1 return_code=23
>>> [2018-08-31 06:26:25.447256] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.6180 num_files=86 job=2 return_code=23
>>> [2018-08-31 06:26:25.571913] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0706 num_files=2 job=3 return_code=0
>>> [2018-08-31 06:26:27.21325] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0814 num_files=1 job=1 return_code=23
>>> [2018-08-31 06:26:27.615520] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0933 num_files=13 job=1 return_code=23
>>> [2018-08-31 06:26:27.668323] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.2190 num_files=95 job=2 return_code=23
>>> [2018-08-31 06:26:27.740139] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0716 num_files=11 job=2 return_code=23
>>> [2018-08-31 06:26:28.191068] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1167 num_files=38 job=3 return_code=23
>>> [2018-08-31 06:26:28.268213] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0768 num_files=7 job=3 return_code=23
>>> [2018-08-31 06:26:28.317909] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0770 num_files=4 job=2 return_code=23
>>> [2018-08-31 06:26:28.710064] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0932 num_files=23 job=1 return_code=23
>>> [2018-08-31 06:26:28.907250] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0886 num_files=26 job=2 return_code=23
>>> [2018-08-31 06:26:28.976679] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0692 num_files=4 job=2 return_code=23
>>> [2018-08-31 06:26:29.55774] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0788 num_files=9 job=2 return_code=23
>>> [2018-08-31 06:26:29.295576] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0847 num_files=16 job=1 return_code=23
>>> [2018-08-31 06:26:29.665076] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1087 num_files=25 job=2 return_code=23
>>> [2018-08-31 06:26:30.277998] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.1122 num_files=40 job=2 return_code=23
>>> [2018-08-31 06:26:31.153105] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.3822 num_files=74 job=3 return_code=23
>>> [2018-08-31 06:26:31.227639] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0743 num_files=18 job=3 return_code=23
>>> [2018-08-31 06:26:31.302660] I [master(worker
>>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>>> duration=0.0748 num_files=18 job=3 return_code=23
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-***@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-***@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



--
Milind
Marcus Pedersén
2018-09-11 12:02:01 UTC
Permalink
Hi Milind,


I do not know if this will help, but using ausearch on one of the master nodes gives this:


----
time->Tue Sep 11 03:28:56 2018
type=PROCTITLE msg=audit(1536629336.548:1202535): proctitle=2F7573722F7362696E2F676C7573746572667364002D73007572642D6764732D303031002D2D766F6C66696C652D6964007572642D6764732D766F6C756D652E7572642D6764732D3030312E7572642D6764732D676C7573746572002D70002F7661722F72756E2F676C75737465722F766F6C732F7572642D6764732D766F6C
type=SYSCALL msg=audit(1536629336.548:1202535): arch=c000003e syscall=87 success=yes exit=0 a0=7f99acce4d70 a1=7f99acce4ed0 a2=1 a3=6533393436373635 items=0 ppid=1 pid=3582 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusteriotwr12" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1536629336.548:1202535): avc: denied { unlink } for pid=3582 comm="glusteriotwr12" name="67b1d169-14d0-413c-8f76-5676493ea7b8" dev="dm-7" ino=36507351099 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=sock_file
----
time->Tue Sep 11 04:49:29 2018
type=PROCTITLE msg=audit(1536634169.350:1207973): proctitle=2F7573722F7362696E2F676C7573746572667364002D73007572642D6764732D303031002D2D766F6C66696C652D6964007572642D6764732D766F6C756D652E7572642D6764732D3030312E7572642D6764732D676C7573746572002D70002F7661722F72756E2F676C75737465722F766F6C732F7572642D6764732D766F6C
type=SYSCALL msg=audit(1536634169.350:1207973): arch=c000003e syscall=280 success=yes exit=0 a0=ffffffffffffff9c a1=7f99ad6f5400 a2=7f99ad6f54a0 a3=100 items=0 ppid=1 pid=3582 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusteriotwr12" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1536634169.350:1207973): avc: denied { setattr } for pid=3582 comm="glusteriotwr12" name="fe949f56-bc6e-411a-9a40-ad6209fbbd89" dev="dm-7" ino=51546135597 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=sock_file
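For reference, AVC denials like the ones above can usually be turned into a local SELinux policy module instead of disabling SELinux altogether. A minimal sketch, assuming the stock CentOS 7 audit and policycoreutils-python tools are installed (the module name glusterd_georep_local is just an example):

# Describe what the recorded denials would need
ausearch -m avc -c glusteriotwr12 --raw | audit2allow -w
# Build and install a local policy module covering those denials
ausearch -m avc -c glusteriotwr12 --raw | audit2allow -M glusterd_georep_local
semodule -i glusterd_georep_local.pp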



Best Regards

Marcus Pedersén



________________________________
From: Milind Changire <***@redhat.com>
Sent: 11 September 2018 08:56
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Marcus,
Is it possible to send over the SELinux errors that you encountered before turning it off?
We could inspect and get the SELinux issues fixed as an aside.


On Mon, Sep 10, 2018 at 4:43 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi Kotresh,

I have been running 4.1.3 since the end of August.

Since then data has been synced to the geo side at a couple of hundred GB per 24 hours, even with the errors I have reported in this thread.

Four days ago all data transfer to the geo side stopped, and the logs repeat the same error over and over again (see below).

Both nodes toggle between Active and Faulty status.
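One way to keep an eye on the toggling, and to see whether entry failures are accumulating, is the session status output; a sketch using the volume and slave names from this thread (the exact columns vary slightly between releases):

# Per-brick worker state, crawl status and failure counters
gluster volume geo-replication urd-gds-volume urd-gds-geo-001::urd-gds-volume status detail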


Thanks a lot!


Best regards

Marcus


One master node, gsyncd.log:

[2018-09-10 10:53:38.409709] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 10:53:47.783914] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:47.852792] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.421061] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 10:53:48.462655] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.463366] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-09-10 10:53:48.465905] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.474558] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-10 10:53:50.70219] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.5954
[2018-09-10 10:53:50.70777] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-10 10:53:51.170597] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0994
[2018-09-10 10:53:51.171158] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-10 10:53:51.696057] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:51.764605] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:53.210553] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-09-10 10:53:53.211148] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1536576833
[2018-09-10 10:53:53.230945] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-10 10:53:53.233444] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-10 10:53:53.233632] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1524272046, 0) entry_stime=(1524271940, 0) etime=1536576833
[2018-09-10 10:53:53.234951] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524272046, 0)
[2018-09-10 10:53:53.762105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0856 num_files=1 job=1 return_code=0
[2018-09-10 10:53:54.437858] I [master(worker /urd-gds/gluster):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-10 10:53:54.437973] I [master(worker /urd-gds/gluster):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1.1979 DATA=1 XATT=0
[2018-09-10 10:53:54.438153] I [master(worker /urd-gds/gluster):1394:process] _GMaster: Batch Completed changelog_end=1524272047 entry_stime=(1524271940, 0) changelog_start=1524272047 stime=(152\
4272046, 0) duration=1.2029 num_changelogs=1 mode=history_changelog
[2018-09-10 10:53:54.482408] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524272046, 0)
[2018-09-10 10:53:54.583467] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=1844:139973681694528:1536576834.54 method=entry_ops error=GsyncdError
[2018-09-10 10:53:54.583585] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-10 10:53:54.600353] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-10 10:53:55.175978] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-10 10:53:55.182988] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 10:53:56.24414] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf



Other master node, gsyncd.log:

[2018-09-10 11:10:43.10458] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 11:10:53.28702] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 11:10:53.69638] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 11:10:53.70264] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-09-10 11:10:53.71902] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 11:10:53.80737] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-10 11:10:54.621948] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.5410
[2018-09-10 11:10:54.622504] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-10 11:10:55.721349] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0984
[2018-09-10 11:10:55.721913] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-10 11:10:58.543606] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-09-10 11:10:58.545701] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1536577858
[2018-09-10 11:10:58.564208] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-10 11:10:58.565689] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-10 11:10:58.565876] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1527128725, 0) entry_stime=(1527128815, 0) etime=1536577858
[2018-09-10 11:10:59.593652] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1527128725, 0)
[2018-09-10 11:11:01.755116] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.5233 num_files=103 job=1 return_code=0
[2018-09-10 11:11:02.897665] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.6648 num_files=116 job=2 return_code=0
[2018-09-10 11:11:03.98150] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2003 num_files=59 job=2 return_code=23
[2018-09-10 11:11:03.219059] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1207 num_files=16 job=2 return_code=0
[2018-09-10 11:11:03.841105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1212 num_files=32 job=2 return_code=23
[2018-09-10 11:11:04.951658] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2160 num_files=24 job=3 return_code=0
[2018-09-10 11:11:05.2938] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=2935:140696531339072:1536577864.67 method=entry_ops error=GsyncdError
[2018-09-10 11:11:05.3125] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-10 11:11:05.17061] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-10 11:11:05.733716] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-10 11:11:05.768186] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 11:11:15.788830] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-09-10 11:11:15.829871] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
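The repeated 'execution of "gluster" failed with ENOENT' lines above suggest gsyncd cannot find the gluster CLI binary when it tries to repair the failed entry operations. A possible thing to check, assuming the binaries live under /usr/sbin and that these session config options exist in this 4.1 build (volume and slave names taken from this thread):

# Confirm the CLI is where gsyncd expects it, on master and slave nodes
ls -l /usr/sbin/gluster
# If needed, point the session at the right directory
gluster volume geo-replication urd-gds-volume urd-gds-geo-001::urd-gds-volume config gluster-command-dir /usr/sbin/
gluster volume geo-replication urd-gds-volume urd-gds-geo-001::urd-gds-volume config slave-gluster-command-dir /usr/sbin/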




________________________________
From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Sent: 3 September 2018 07:58
To: Marcus Pedersén
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Geo-rep had a few important fixes in 4.1.3. Is it possible to upgrade and check whether the issue is still seen?

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi again,

I found another problem on the other master node.

The node toggles between Active and Faulty, and it is the same error over and over again.


[2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=1226:139955262510912:1535800981.24 method=entry_ops error=GsyncdError
[2018-09-01 11:23:02.94214] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty


I have also found a Python error, though I have only seen it once.


[2018-09-01 11:16:45.907660] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1524101534, 0)
[2018-09-01 11:16:47.364109] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 362, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939, in syncjob
po = self.sync_engine(pb, self.log_err)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442, in rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000


I attach the logs as well.


Many thanks!


Best regards

Marcus Pedersén




________________________________
From: gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org> <gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org>> on behalf of Marcus Pedersén <***@slu.se<mailto:***@slu.se>>
Sent: 31 August 2018 16:09
To: ***@redhat.com<mailto:***@redhat.com>
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


I really apologize, third try to make the mail smaller.


/Marcus


________________________________
From: Marcus Pedersén
Sent: 31 August 2018 16:03
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Sorry, resent due to the mail being too large.


/Marcus

________________________________
From: Marcus Pedersén
Sent: 31 August 2018 15:19
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: RE: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty


Hi Kotresh,

Please find attached logs, only logs from today.

The Python error was repeated over and over again until I disabled SELinux.

After that the node became active again.

The return code 23 seems to be repeated over and over again.


rsync version 3.1.2
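For reference, rsync exit code 23 is documented in rsync(1) as "partial transfer due to error", i.e. some files in those batches were not transferred (often permission, attribute or vanished-file problems on the slave side). Raising the geo-replication log level should make gsyncd record the underlying rsync errors; a sketch, assuming the log-level option is available in this release (set it back to INFO the same way afterwards):

# Temporarily increase gsyncd verbosity for this session
gluster volume geo-replication urd-gds-volume urd-gds-geo-001::urd-gds-volume config log-level DEBUG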


Thanks a lot!


Best regards

Marcus


________________________________
From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Sent: 31 August 2018 11:09
To: Marcus Pedersén
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Could you attach full logs? Is the same traceback happening repeatedly? It would be helpful if you attach the corresponding mount log as well.
What rsync version are you using?

Thanks,
Kotresh HR

On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi all,

I had problems with stopping sync after upgrade to 4.1.2.

I upgraded to 4.1.3 and it ran fine for one day, but now one of the master nodes shows faulty.

Most of the sync jobs have return code 23, how do I resolve this?

I see messages like:

_GMaster: Sucessfully fixed all entry ops with gfid mismatch

Will this resolve error code 23?

There is also a python error.

The python error was a selinux problem, turning off selinux made node go to active again.

See log below.


CentOS 7, installed through SIG Gluster (OS updated to latest at the same time)

Master cluster: 2 x (2 + 1) distributed, replicated

Client cluster: 1 x (2 + 1) replicated


Many thanks in advance!


Best regards

Marcus Pedersén



gsyncd.log from Faulty node:

[2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.8099 num_files=57 job=3 return_code=23
[2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0904 num_files=3 job=3 return_code=23
[2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=30069:139655665837888:1535696752.35 method=entry_ops error=OSError
[2018-08-31 06:25:52.562346] E [syncdutils(worker /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied
[2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.637486] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-31 06:26:03.650330] I [resource(worker /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-31 06:26:05.296473] I [resource(worker /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and slave established. duration=1.6457
[2018-08-31 06:26:05.297904] I [resource(worker /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-31 06:26:06.396939] I [resource(worker /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume duration=1.0985
[2018-08-31 06:26:06.397691] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-31 06:26:16.816423] I [resource(worker /urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1535696776
[2018-08-31 06:26:16.888772] I [gsyncdstatus(worker /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-31 06:26:16.892049] I [gsyncdstatus(worker /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-31 06:26:16.892703] I [master(worker /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1525739167, 0) entry_stime=(1525740143, 0) etime=1535696776
[2018-08-31 06:26:17.914803] I [master(worker /urd-gds/gluster):1536:crawl] _GMaster: slave's time stime=(1525739167, 0)
[2018-08-31 06:26:18.521718] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1063 num_files=17 job=3 return_code=23
[2018-08-31 06:26:19.260137] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3441 num_files=34 job=1 return_code=23
[2018-08-31 06:26:19.615191] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0923 num_files=7 job=3 return_code=23
[2018-08-31 06:26:19.891227] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1302 num_files=12 job=1 return_code=23
[2018-08-31 06:26:19.922700] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.5024 num_files=50 job=2 return_code=23

[2018-08-31 06:26:21.639342] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=1.5233 num_files=5 job=3 return_code=23
[2018-08-31 06:26:22.12726] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1191 num_files=7 job=1 return_code=23
[2018-08-31 06:26:22.86136] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0731 num_files=4 job=1 return_code=23
[2018-08-31 06:26:22.503290] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0779 num_files=15 job=2 return_code=23
[2018-08-31 06:26:23.214704] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0738 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.251876] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2478 num_files=33 job=2 return_code=23
[2018-08-31 06:26:23.802699] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0873 num_files=9 job=3 return_code=23
[2018-08-31 06:26:23.828176] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0758 num_files=3 job=2 return_code=23
[2018-08-31 06:26:23.854063] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2662 num_files=34 job=1 return_code=23
[2018-08-31 06:26:24.403228] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0997 num_files=30 job=3 return_code=23
[2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0965 num_files=8 job=3 return_code=23
[2018-08-31 06:26:25.438527] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0832 num_files=9 job=1 return_code=23
[2018-08-31 06:26:25.447256] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.6180 num_files=86 job=2 return_code=23
[2018-08-31 06:26:25.571913] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0706 num_files=2 job=3 return_code=0
[2018-08-31 06:26:27.21325] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0814 num_files=1 job=1 return_code=23
[2018-08-31 06:26:27.615520] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0933 num_files=13 job=1 return_code=23
[2018-08-31 06:26:27.668323] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.2190 num_files=95 job=2 return_code=23
[2018-08-31 06:26:27.740139] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0716 num_files=11 job=2 return_code=23
[2018-08-31 06:26:28.191068] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1167 num_files=38 job=3 return_code=23
[2018-08-31 06:26:28.268213] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0768 num_files=7 job=3 return_code=23
[2018-08-31 06:26:28.317909] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0770 num_files=4 job=2 return_code=23
[2018-08-31 06:26:28.710064] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0932 num_files=23 job=1 return_code=23
[2018-08-31 06:26:28.907250] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0886 num_files=26 job=2 return_code=23
[2018-08-31 06:26:28.976679] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0692 num_files=4 job=2 return_code=23
[2018-08-31 06:26:29.55774] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0788 num_files=9 job=2 return_code=23
[2018-08-31 06:26:29.295576] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=16 job=1 return_code=23
[2018-08-31 06:26:29.665076] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1087 num_files=25 job=2 return_code=23
[2018-08-31 06:26:30.277998] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.1122 num_files=40 job=2 return_code=23
[2018-08-31 06:26:31.153105] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.3822 num_files=74 job=3 return_code=23
[2018-08-31 06:26:31.227639] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0743 num_files=18 job=3 return_code=23
[2018-08-31 06:26:31.302660] I [master(worker /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken duration=0.0748 num_files=18 job=3 return_code=23



_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R




--
Thanks and Regards,
Kotresh H R


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Milind

