[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Discussion:

Kotte, Christian (Ext)

2018-09-21 14:40:21 UTC

Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[***@master]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_1::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_2::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@interimmaster::glustervol1 N/A Faulty N/A N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but Iâm not sure if the syncdaemon runs under geoaccount!?

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
[***@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldnât investigate much. I needed to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ

Any help is appreciated.

Regards,

Christian Kotte

Kotresh Hiremath Ravishankar

2018-09-22 04:52:37 UTC

Permalink

The problem occured on slave side whose error is propagated to master.
Mostly any traceback with repce involved is related to problem in slave.
Just check few lines above in the log to find the slave node, the crashed
worker is connected to and get geo replication logs to further debug.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <

Post by Kotte, Christian (Ext)
Hi,
Any idea how to troubleshoot this?
New folders and files were created on the master and the replication went
faulty. They were created via Samba.
Version: GlusterFS 4.1.3
MASTER NODE MASTER VOL MASTER BRICK
SLAVE USER
SLAVE SLAVE
NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
[2018-09-21 14:26:38.611479] I [repce(agent
/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
EOF.
worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I
[gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
Change status=Faulty
starting gsyncd worker brick=/bricks/brick1/brick slave_node=
nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent
/bricks/brick1/brick):297:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent
/bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker
/bricks/brick1/brick):297:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker
/bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker
/bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker
/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
locally...
[2018-09-21 14:26:54.120374] I [resource(worker
/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker
/bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker
/bricks/brick1/brick):1593:register] _GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker
/bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
/bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
/bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker
/bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0)
etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker
/bricks/brick1/brick):1536:crawl] _GMaster: slave's time
stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker
/bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
entry ops to_changelog=1537522638 num_changelogs=1
from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker
/bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0
MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker
/bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0
XATT=0
[2018-09-21 14:26:56.198230] I [master(worker
/bricks/brick1/brick):1394:process] _GMaster: Batch Completed
changelog_end=1537522638 entry_stime=(1537537141, 0)
changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333
num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker
/bricks/brick1/brick):1536:crawl] _GMaster: slave's time
stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker
/bricks/brick1/brick):197:__call__] RepceClient: call failed
call=17209:140650361157440:1537540017.21 method=entry_ops
error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in
changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
The permissions look fine. The replication is done via geo user instead of
root. It should be able to read, but Iâm not sure if the syncdaemon runs
under geoaccount!?
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
-> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation
Manager
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
It could be possible that the folder was renamed. I had 3 similar issues
since I migrated to GlusterFS 4.x but couldnât investigate much. I needed
to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ
Any help is appreciated.
Regards,
*Christian Kotte*
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users

Kotte, Christian (Ext)

2018-09-24 14:04:28 UTC

Permalink

Ok. It happens on all slave nodes (and on the interimmaster as well).

Itâs like I assumed. These are the logs of one of the slaves:

gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
return call(*arg)
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder âNew folderâ will be created via Samba and it was renamed by my colleague right away after creation.
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

Creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didnât change anything.

mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <***@novartis.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

The problem occured on slave side whose error is propagated to master. Mostly any traceback with repce involved is related to problem in slave. Just check few lines above in the log to find the slave node, the crashed worker is connected to and get geo replication logs to further debug.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <***@novartis.com<mailto:***@novartis.com>> wrote:
Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[***@master]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_1::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_2::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@interimmaster::glustervol1 N/A Faulty N/A N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net<http://nrchbs-slp2020.nibr.novartis.net>
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but Iâm not sure if the syncdaemon runs under geoaccount!?

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
[***@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldnât investigate much. I needed to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ

Any help is appreciated.

Regards,

Christian Kotte
_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=buld78OSs9O-NEZ-w9vywUcr-bP6_RTbL2pwat-zRIU&s=bKc1d7zoIXuVSLbZS_vD3v4-FJrG2I6T6Dhcq8Qk6Bs&e=>

Kotresh Hiremath Ravishankar

2018-09-24 14:20:07 UTC

Permalink

I think I am get what's happening. The geo-rep session is non-root.
Could you do readlink on brick path mentioned above
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
from a geaccount user and see if you are getting "Permission Denied" errors?

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <

Post by Kotte, Christian (Ext)
Ok. It happens on all slave nodes (and on the interimmaster as well).
gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave
slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave
slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
using the default config
path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave
slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
locally...
[2018-09-24 13:52:38.303920] I [resource(slave
slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave
slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave
slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on
mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e
entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682,
in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
return call(*arg)
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave
slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave
slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
using the default config
path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave
slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
locally...
[2018-09-24 13:52:54.659044] I [resource(slave
slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave
slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
The folder âNew folderâ will be created via Samba and it was renamed by my
colleague right away after creation.
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
/bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39
29d1d60d-1ad6-45fc-87e0-93d478f7331e ->
../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
Creating the folder in
/bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/,
but it didnât change anything.
mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit]
0-glustervol1-client-0: error returned while attempting to connect to
host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit]
0-glustervol1-client-0: error returned while attempting to connect to
host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig]
0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit]
0-glustervol1-client-0: error returned while attempting to connect to
host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data]
(-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a)
[0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16)
[0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91)
[0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006]
[client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed
to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit]
0-glustervol1-client-0: error returned while attempting to connect to
host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046]
Connected to glustervol1-client-0, attached to remote volume
'/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync]
0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005]
[dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory
selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc]
0-fuse: initating unmount of
/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65]
received signum (15), shutting down
Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing
fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
Regards,
Christian
*Date: *Saturday, 22. September 2018 at 06:52
*Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
OSError: [Errno 13] Permission denied
The problem occured on slave side whose error is propagated to master.
Mostly any traceback with repce involved is related to problem in slave.
Just check few lines above in the log to find the slave node, the crashed
worker is connected to and get geo replication logs to further debug.
On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
Hi,
Any idea how to troubleshoot this?
New folders and files were created on the master and the replication went
faulty. They were created via Samba.
Version: GlusterFS 4.1.3
MASTER NODE MASTER VOL MASTER BRICK
SLAVE USER
SLAVE SLAVE
NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount
N/A N/A
[2018-09-21 14:26:38.611479] I [repce(agent
/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
EOF.
worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I
[gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
Change status=Faulty
starting gsyncd worker brick=/bricks/brick1/brick slave_node=
nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent
/bricks/brick1/brick):297:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent
/bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker
/bricks/brick1/brick):297:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker
/bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker
/bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker
/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
locally...
[2018-09-21 14:26:54.120374] I [resource(worker
/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker
/bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker
/bricks/brick1/brick):1593:register] _GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker
/bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
/bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
/bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker
/bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0)
etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker
/bricks/brick1/brick):1536:crawl] _GMaster: slave's time
stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker
/bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
entry ops to_changelog=1537522638 num_changelogs=1
from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker
/bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0
MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker
/bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0
XATT=0
[2018-09-21 14:26:56.198230] I [master(worker
/bricks/brick1/brick):1394:process] _GMaster: Batch Completed
changelog_end=1537522638 entry_stime=(1537537141, 0)
changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333
num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker
/bricks/brick1/brick):1536:crawl] _GMaster: slave's time
stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker
/bricks/brick1/brick):197:__call__] RepceClient: call failed
call=17209:140650361157440:1537540017.21 method=entry_ops
error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in
changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
The permissions look fine. The replication is done via geo user instead of
root. It should be able to read, but Iâm not sure if the syncdaemon runs
under geoaccount!?
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
-> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation
Manager
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
It could be possible that the folder was renamed. I had 3 similar issues
since I migrated to GlusterFS 4.x but couldnât investigate much. I needed
to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ
Any help is appreciated.
Regards,
*Christian Kotte*
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=buld78OSs9O-NEZ-w9vywUcr-bP6_RTbL2pwat-zRIU&s=bKc1d7zoIXuVSLbZS_vD3v4-FJrG2I6T6Dhcq8Qk6Bs&e=>

--
Thanks and Regards,
Kotresh H R

Kotte, Christian (Ext)

2018-09-24 15:10:01 UTC

Permalink

Yeah right. I get permission denied.

[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied

[***@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
total 0
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
[***@slave ~]#

However, the strange thing is that I could replicate new files and folders before. The replication is broken since the âNew folderâ was created.

These are the permissions on a dev/test system:
[***@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
total 3136
drwx------. 44 root root 4096 Aug 22 18:19 00
drwx------. 50 root root 4096 Sep 12 13:14 01
drwx------. 54 root root 4096 Sep 13 11:33 02
drwx------. 59 root root 4096 Aug 22 18:21 03
drwx------. 60 root root 4096 Sep 12 13:14 04
drwx------. 68 root root 4096 Aug 24 12:36 05
drwx------. 56 root root 4096 Aug 22 18:21 06
drwx------. 46 root root 4096 Aug 22 18:21 07
drwx------. 51 root root 4096 Aug 22 18:21 08
drwx------. 42 root root 4096 Aug 22 18:21 09
drwx------. 44 root root 4096 Sep 13 11:16 0a

Iâve configured an AD group, SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?

# ll /mnt/glustervol1/
total 12
drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts
drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
-rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test
drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp

glusterfs_volumes:
[âŠ]
permissions:
mode: "02775"
owner: root
group: "AD+group"
acl_permissions: rw
[âŠ]

# root directory is owned by root.
# set permissions to 'g+s' to automatically set the group to "AD+group"
# permissions of individual files will be set by Samba during creation
- name: Configure volume directory permission 1/2
tags: glusterfs
file:
path: /mnt/{{ item.volume }}
state: directory
mode: "{{ item.permissions.mode }}"
owner: "{{ item.permissions.owner }}"
group: "{{ item.permissions.group }}"
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

# ACL needs to be set to override default umask and grant "AD+group" write permissions
- name: Configure volume directory permission 2/2 (ACL)
tags: glusterfs
acl:
path: /mnt/{{ item.volume }}
default: yes
entity: "{{ item.permissions.group }}"
etype: group
permissions: "{{ item.permissions.acl_permissions }}"
state: present
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <***@novartis.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I think I am get what's happening. The geo-rep session is non-root.
Could you do readlink on brick path mentioned above /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
from a geaccount user and see if you are getting "Permission Denied" errors?

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <***@novartis.com<mailto:***@novartis.com>> wrote:
Ok. It happens on all slave nodes (and on the interimmaster as well).

Itâs like I assumed. These are the logs of one of the slaves:

gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
return call(*arg)
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder âNew folderâ will be created via Samba and it was renamed by my colleague right away after creation.
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

Creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didnât change anything.

mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <***@novartis.com<mailto:***@novartis.com>>
Cc: Gluster Users <gluster-***@gluster.org<mailto:gluster-***@gluster.org>>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

The problem occured on slave side whose error is propagated to master. Mostly any traceback with repce involved is related to problem in slave. Just check few lines above in the log to find the slave node, the crashed worker is connected to and get geo replication logs to further debug.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <***@novartis.com<mailto:***@novartis.com>> wrote:
Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[***@master]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_1::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_2::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@interimmaster::glustervol1 N/A Faulty N/A N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net<http://nrchbs-slp2020.nibr.novartis.net>
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but Iâm not sure if the syncdaemon runs under geoaccount!?

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
[***@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldnât investigate much. I needed to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ

Any help is appreciated.

Regards,

Christian Kotte
_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=buld78OSs9O-NEZ-w9vywUcr-bP6_RTbL2pwat-zRIU&s=bKc1d7zoIXuVSLbZS_vD3v4-FJrG2I6T6Dhcq8Qk6Bs&e=>

--
Thanks and Regards,
Kotresh H R

Kotte, Christian (Ext)

2018-09-24 15:19:57 UTC

Permalink

I donât configure the permissions of /bricks/brick1/brick/.glusterfs. I donât even see it on the local GlusterFS mount.

Not sure why the permissions are configured with S and the AD groupâŠ

Regards,
Christian

From: <gluster-users-***@gluster.org> on behalf of "Kotte, Christian (Ext)" <***@novartis.com>
Date: Monday, 24. September 2018 at 17:10
To: Kotresh Hiremath Ravishankar <***@redhat.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Yeah right. I get permission denied.

[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied

[***@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
total 0
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
[***@slave ~]#

However, the strange thing is that I could replicate new files and folders before. The replication is broken since the âNew folderâ was created.

These are the permissions on a dev/test system:
[***@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
total 3136
drwx------. 44 root root 4096 Aug 22 18:19 00
drwx------. 50 root root 4096 Sep 12 13:14 01
drwx------. 54 root root 4096 Sep 13 11:33 02
drwx------. 59 root root 4096 Aug 22 18:21 03
drwx------. 60 root root 4096 Sep 12 13:14 04
drwx------. 68 root root 4096 Aug 24 12:36 05
drwx------. 56 root root 4096 Aug 22 18:21 06
drwx------. 46 root root 4096 Aug 22 18:21 07
drwx------. 51 root root 4096 Aug 22 18:21 08
drwx------. 42 root root 4096 Aug 22 18:21 09
drwx------. 44 root root 4096 Sep 13 11:16 0a

Iâve configured an AD group, SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?

# ll /mnt/glustervol1/
total 12
drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts
drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
-rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test
drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp

glusterfs_volumes:
[âŠ]
permissions:
mode: "02775"
owner: root
group: "AD+group"
acl_permissions: rw
[âŠ]

# root directory is owned by root.
# set permissions to 'g+s' to automatically set the group to "AD+group"
# permissions of individual files will be set by Samba during creation
- name: Configure volume directory permission 1/2
tags: glusterfs
file:
path: /mnt/{{ item.volume }}
state: directory
mode: "{{ item.permissions.mode }}"
owner: "{{ item.permissions.owner }}"
group: "{{ item.permissions.group }}"
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

# ACL needs to be set to override default umask and grant "AD+group" write permissions
- name: Configure volume directory permission 2/2 (ACL)
tags: glusterfs
acl:
path: /mnt/{{ item.volume }}
default: yes
entity: "{{ item.permissions.group }}"
etype: group
permissions: "{{ item.permissions.acl_permissions }}"
state: present
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <***@novartis.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I think I am get what's happening. The geo-rep session is non-root.
Could you do readlink on brick path mentioned above /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
from a geaccount user and see if you are getting "Permission Denied" errors?

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <***@novartis.com<mailto:***@novartis.com>> wrote:
Ok. It happens on all slave nodes (and on the interimmaster as well).

Itâs like I assumed. These are the logs of one of the slaves:

gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
return call(*arg)
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder âNew folderâ will be created via Samba and it was renamed by my colleague right away after creation.
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

Creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didnât change anything.

mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <***@novartis.com<mailto:***@novartis.com>>
Cc: Gluster Users <gluster-***@gluster.org<mailto:gluster-***@gluster.org>>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

The problem occured on slave side whose error is propagated to master. Mostly any traceback with repce involved is related to problem in slave. Just check few lines above in the log to find the slave node, the crashed worker is connected to and get geo replication logs to further debug.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <***@novartis.com<mailto:***@novartis.com>> wrote:
Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[***@master]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_1::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_2::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@interimmaster::glustervol1 N/A Faulty N/A N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net<http://nrchbs-slp2020.nibr.novartis.net>
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but Iâm not sure if the syncdaemon runs under geoaccount!?

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
[***@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldnât investigate much. I needed to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ

Any help is appreciated.

Regards,

Christian Kotte
_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=buld78OSs9O-NEZ-w9vywUcr-bP6_RTbL2pwat-zRIU&s=bKc1d7zoIXuVSLbZS_vD3v4-FJrG2I6T6Dhcq8Qk6Bs&e=>

--
Thanks and Regards,
Kotresh H R

Kotte, Christian (Ext)

2018-09-25 14:39:54 UTC

Permalink

I changed to replication to use the root user and re-created the replication with âcreate forceâ. Now the files and folders were replicated, and the permission denied, and New folder error disappeared, but old files are not deleted.

Looks like the history crawl is in some kind of a loop:

[***@master ~]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick root ssh://slave_1::glustervol1 slave_1 Active Hybrid Crawl N/A
master glustervol1 /bricks/brick1/brick root ssh://slave_2::glustervol1 slave_2 Active Hybrid Crawl N/A
master glustervol1 /bricks/brick1/brick root ssh://slave_3::glustervol1 slave_3 Active Hybrid Crawl N/A

tail -f /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog
raise ChangelogHistoryNotAvailable()
ChangelogHistoryNotAvailable
[2018-09-25 14:10:44.196011] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=29945:139700517484352:1537884644.19 method=history error=ChangelogHistoryNotAvailable
[2018-09-25 14:10:44.196405] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync
[2018-09-25 14:10:44.221385] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl stime=(0, 0)
[2018-09-25 14:10:44.223382] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
[2018-09-25 14:10:46.225296] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
[2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:31:10.145035] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1212.5316 num_files=1474 job=2 return_code=11
[2018-09-25 14:31:10.152637] E [syncdutils(worker /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock slave_3:/proc/15919/cwd error=11
[2018-09-25 14:31:10.237371] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-25 14:31:10.430820] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3
[2018-09-25 14:31:20.806518] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:31:20.816536] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-25 14:31:20.821574] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:31:20.882128] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-25 14:31:24.169857] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.2873
[2018-09-25 14:31:24.170401] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-25 14:31:25.354633] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1839
[2018-09-25 14:31:25.355073] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-25 14:31:27.439034] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
[2018-09-25 14:31:27.441847] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537885887
[2018-09-25 14:31:27.465053] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-25 14:31:27.471021] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-25 14:31:27.471484] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(0, 0) entry_stime=None etime=1537885887
[2018-09-25 14:31:27.472564] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
num_parallel)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog
raise ChangelogHistoryNotAvailable()
ChangelogHistoryNotAvailable
[2018-09-25 14:31:27.480632] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=31250:140272364406592:1537885887.47 method=history error=ChangelogHistoryNotAvailable
[2018-09-25 14:31:27.480958] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync
[2018-09-25 14:31:27.495117] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl stime=(0, 0)
[2018-09-25 14:31:27.502083] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
[2018-09-25 14:31:29.505284] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887

tail -f /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log
[2018-09-25 13:49:24.141303] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-25 13:49:36.602051] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 13:49:36.629415] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-25 13:49:37.701642] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0718
[2018-09-25 13:49:37.704282] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-25 14:10:27.70952] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-25 14:10:39.632124] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:10:39.650958] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-25 14:10:40.729355] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0781
[2018-09-25 14:10:40.730650] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-25 14:31:10.291064] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-25 14:31:22.802237] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-25 14:31:22.828418] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-25 14:31:23.910206] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0813
[2018-09-25 14:31:23.913369] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

Any ideas how to resolve this without re-creating everything again? Can I reset the changelog history?

Regards,
Christian

From: <gluster-users-***@gluster.org> on behalf of "Kotte, Christian (Ext)" <***@novartis.com>
Date: Monday, 24. September 2018 at 17:20
To: Kotresh Hiremath Ravishankar <***@redhat.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I donât configure the permissions of /bricks/brick1/brick/.glusterfs. I donât even see it on the local GlusterFS mount.

Not sure why the permissions are configured with S and the AD groupâŠ

Regards,
Christian

From: <gluster-users-***@gluster.org> on behalf of "Kotte, Christian (Ext)" <***@novartis.com>
Date: Monday, 24. September 2018 at 17:10
To: Kotresh Hiremath Ravishankar <***@redhat.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Yeah right. I get permission denied.

[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
[***@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied

[***@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
total 0
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
[***@slave ~]#

However, the strange thing is that I could replicate new files and folders before. The replication is broken since the âNew folderâ was created.

These are the permissions on a dev/test system:
[***@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
total 3136
drwx------. 44 root root 4096 Aug 22 18:19 00
drwx------. 50 root root 4096 Sep 12 13:14 01
drwx------. 54 root root 4096 Sep 13 11:33 02
drwx------. 59 root root 4096 Aug 22 18:21 03
drwx------. 60 root root 4096 Sep 12 13:14 04
drwx------. 68 root root 4096 Aug 24 12:36 05
drwx------. 56 root root 4096 Aug 22 18:21 06
drwx------. 46 root root 4096 Aug 22 18:21 07
drwx------. 51 root root 4096 Aug 22 18:21 08
drwx------. 42 root root 4096 Aug 22 18:21 09
drwx------. 44 root root 4096 Sep 13 11:16 0a

Iâve configured an AD group, SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?

# ll /mnt/glustervol1/
total 12
drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts
drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
-rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test
drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp

glusterfs_volumes:
[âŠ]
permissions:
mode: "02775"
owner: root
group: "AD+group"
acl_permissions: rw
[âŠ]

# root directory is owned by root.
# set permissions to 'g+s' to automatically set the group to "AD+group"
# permissions of individual files will be set by Samba during creation
- name: Configure volume directory permission 1/2
tags: glusterfs
file:
path: /mnt/{{ item.volume }}
state: directory
mode: "{{ item.permissions.mode }}"
owner: "{{ item.permissions.owner }}"
group: "{{ item.permissions.group }}"
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

# ACL needs to be set to override default umask and grant "AD+group" write permissions
- name: Configure volume directory permission 2/2 (ACL)
tags: glusterfs
acl:
path: /mnt/{{ item.volume }}
default: yes
entity: "{{ item.permissions.group }}"
etype: group
permissions: "{{ item.permissions.acl_permissions }}"
state: present
with_items: "{{ glusterfs_volumes }}"
loop_control:
label: "{{ item.volume }}"
when: item.permissions is defined

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <***@novartis.com>
Cc: Gluster Users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I think I am get what's happening. The geo-rep session is non-root.
Could you do readlink on brick path mentioned above /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
from a geaccount user and see if you are getting "Permission Denied" errors?

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <***@novartis.com<mailto:***@novartis.com>> wrote:
Ok. It happens on all slave nodes (and on the interimmaster as well).

Itâs like I assumed. These are the logs of one of the slaves:

gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
return call(*arg)
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder âNew folderâ will be created via Samba and it was renamed by my colleague right away after creation.
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[***@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

Creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didnât change anything.

mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian

From: Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <***@novartis.com<mailto:***@novartis.com>>
Cc: Gluster Users <gluster-***@gluster.org<mailto:gluster-***@gluster.org>>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

The problem occured on slave side whose error is propagated to master. Mostly any traceback with repce involved is related to problem in slave. Just check few lines above in the log to find the slave node, the crashed worker is connected to and get geo replication logs to further debug.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <***@novartis.com<mailto:***@novartis.com>> wrote:
Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[***@master]# gluster volume geo-replication status

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_1::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@slave_2::glustervol1 N/A Faulty N/A N/A
master glustervol1 /bricks/brick1/brick geoaccount ssh://***@interimmaster::glustervol1 N/A Faulty N/A N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3<http://nrchbs-slp2020.nibr.novartis.net>
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
self.process(batch)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
failures = self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but Iâm not sure if the syncdaemon runs under geoaccount!?

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[***@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
[***@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldnât investigate much. I needed to completely wipe GlusterFS and geo-repliction to get rid of this errorâŠ

Any help is appreciated.

Regards,

Christian Kotte
_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_mailman_listinfo_gluster-2Dusers&d=DwMFaQ&c=ZbgFmJjg4pdtrnL2HUJUDw&r=faVOd9yfnSYhe2mQhqlDwcpXGm7x8HN1C9wPmFD3694&m=buld78OSs9O-NEZ-w9vywUcr-bP6_RTbL2pwat-zRIU&s=bKc1d7zoIXuVSLbZS_vD3v4-FJrG2I6T6Dhcq8Qk6Bs&e=>

--
Thanks and Regards,
Kotresh H R