Discussion:
[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Anh Vo
2018-07-03 18:54:12 UTC
I am trying to mount a gluster volume over NFS and mount.nfs fails.
Looking at nfs.log I am seeing the entries below.

Heal info does not show the mentioned gfid
(00000000-0000-0000-0000-000000000001) as being in split-brain.
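(For context, the failing mount is along the lines of "sudo mount -t nfs -o
vers=3,nolock gfs-vm000:/gv0 /mnt/gv0"; the mountpoint and options here are
approximate, not our exact command, but gluster's built-in NFS server only
speaks NFSv3, hence vers=3.)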

[2018-07-03 18:16:27.694953] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c3ac3cc5,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:16:28.204685] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c4ac3cc5,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
The message "E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing
STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed.
[Input/output error]" repeated 2 times between [2018-07-03 18:16:27.694903]
and [2018-07-03 18:17:02.310689]
[2018-07-03 18:17:02.310722] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2a6f2526,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:17:02.628990] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing
STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed.
[Input/output error]
[2018-07-03 18:17:02.629023] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2b6f2526,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:17:00.398601] I [MSGID: 108031]
[afr-common.c:2458:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting
local read_child gv0-client-2
[2018-07-03 18:17:01.666671] W [MSGID: 108027]
[afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols
for /
[2018-07-03 18:51:43.509385] W [MSGID: 108027]
[afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols
for /
[2018-07-03 18:51:43.936826] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing
STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed.
[Input/output error]
[2018-07-03 18:51:43.936868] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 19b1731e,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:51:44.278901] W [MSGID: 112199]
[nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 1ab1731e,
FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
Anh Vo
2018-07-03 20:17:18 UTC
Actually we just discovered that the heal info command returns different
things when executed on the different nodes of our 3-replica setup.
When we execute it on node2 we do not see "/" reported as being in
split-brain, but if I execute it on node0 or node1 I am seeing:

***@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
Brick gfs-vm000:/gluster/brick/brick0
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
/ - Is in split-brain

Status: Connected
Number of entries: 2

Brick gfs-vm001:/gluster/brick/brick0
/ - Is in split-brain

<gfid:81289110-867b-42ff-ba3b-1373a187032b>
Status: Connected
Number of entries: 2

Brick gfs-vm002:/gluster/brick/brick0
/ - Is in split-brain

Status: Connected
Number of entries: 1


I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes and
I am seeing that node2 has slightly different xattrs:
node0:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node1:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node2:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000200000000
trusted.afr.gv0-client-1=0x000000000000000200000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

Where do I go from here? Thanks
Ravishankar N
2018-07-04 03:02:12 UTC
Hi,

What version of gluster are you using?

1. The afr xattrs on '/' indicate a meta-data split-brain. You can
resolve it using one of the policies listed in
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/

For example, "|gluster volume heal gv0 split-brain latest-mtime / "
|

2. Is the file corresponding to the other gfid
(81289110-867b-42ff-ba3b-1373a187032b) present in all bricks? What do
the getfattr outputs for this file indicate?

3. As for the discrepancy in output of heal info, is node2 connected to
the other nodes? Does heal info still print the details of all 3 bricks
when you run it on node2 ?
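
To expand on point 1 (the volume and brick names below come from your
getfattr output; treat the commands as a sketch of the documented policies,
not something I have verified against your cluster): each trusted.afr.* value
is three 32-bit counters, for the data, metadata and entry changelogs. On
node0 and node1, trusted.afr.gv0-client-2 = 0x00000000 00000001 00000000
records one pending metadata change against node2, while node2's xattrs blame
node0 and node1 the same way, which is exactly a metadata split-brain on '/'.
The documented policy commands look like:

gluster volume heal gv0 split-brain latest-mtime /
gluster volume heal gv0 split-brain source-brick gfs-vm000:/gluster/brick/brick0 /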
-Ravi
Anh Vo
2018-07-04 15:45:52 UTC
If I run "sudo gluster volume heal gv0 split-brain latest-mtime /" I get
the following:

Lookup failed on /:Invalid argument.
Volume heal failed.

node2 was not connected at that time, because once we connect it to the
system gluster becomes almost unusable after a few minutes and we have many
jobs failing. This morning I reconnected it and ran heal info, and we have
about 30000 entries to heal (15K from gfs-vm000 and 15K from gfs-vm001; 80%
are bare gfids, 20% have file names). It's not feasible for us to check the
individual gfids, so we mostly rely on gluster self-heal to handle them. The
"/" entry is a concern because it prevents us from mounting nfs. We do need
to mount nfs for some of our management tasks because the gluster fuse mount
is much slower than nfs for recursive operations like 'du'.
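(I assume "gluster volume heal gv0 statistics heal-count" is the reasonable
way to watch those pending entries drain, rather than listing them all with
heal info.)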

Do you have any suggestion for healing the metadata on '/' ?

Thanks
Anh
Anh Vo
2018-07-04 15:50:08 UTC
I forgot to mention we're using 3.12.10
Ravishankar N
2018-07-04 16:01:41 UTC
Post by Anh Vo
I forgot to mention we're using 3.12.10
If I run "sudo gluster volume heal gv0 split-brain latest-mtime /"
Lookup failed on /:Invalid argument.
Volume heal failed.
Can you share the glfsheal-<volname>.log on the node where you ran this
failed command?
Post by Anh Vo
Do you have any suggestion for healing the metadata on '/' ?
You can manually delete the afr xattrs on node2 (the third brick) as a workaround:
setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0

This should remove the split-brain on root.
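(You could then re-run "getfattr -d -m . -e hex /gluster/brick/brick0" on
node2 to confirm that only zeroed afr xattrs remain, and run "gluster volume
heal gv0" so self-heal syncs the metadata on '/' back from the other two
bricks; this is just a suggested follow-up.)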

HTH,
Ravi
Anh Vo
2018-07-04 16:19:56 UTC
Output of glfsheal-gv0.log:

[2018-07-04 16:11:05.435680] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk]
0-gv0-client-1: Server lk version = 1
[2018-07-04 16:11:05.436847] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-gv0-client-2: changing port to 49153 (from 0)
[2018-07-04 16:11:05.437722] W [MSGID: 114007]
[client-handshake.c:1190:client_setvolume_cbk]
0-gv0-client-0: failed to find key 'child_up' in the options
[2018-07-04 16:11:05.437744] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk]
0-gv0-client-0: Connected to gv0-client-0, attached to remote volume
'/gluster/brick/brick0'.
[2018-07-04 16:11:05.437755] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk]
0-gv0-client-0: Server and Client lk-version numbers are not same,
reopening the fds
[2018-07-04 16:11:05.531514] I [MSGID: 108002]
[afr-common.c:5312:afr_notify] 0-gv0-replicate-0: Client-quorum is met
[2018-07-04 16:11:05.531550] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk]
0-gv0-client-0: Server lk version = 1
[2018-07-04 16:11:05.532115] I [MSGID: 114057] [client-handshake.c:1478:
select_server_supported_programs] 0-gv0-client-2: Using Program GlusterFS
3.3, Num (1298437), Version (330)
[2018-07-04 16:11:05.537528] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk]
0-gv0-client-2: Connected to gv0-client-2, attached to remote volume
'/gluster/brick/brick0'.
[2018-07-04 16:11:05.537569] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk]
0-gv0-client-2: Server and Client lk-version numbers are not same,
reopening the fds
[2018-07-04 16:11:05.544248] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk]
0-gv0-client-2: Server lk version = 1
[2018-07-04 16:11:05.547665] I [MSGID: 108031]
[afr-common.c:2458:afr_local_discovery_cbk]
0-gv0-replicate-0: selecting local read_child gv0-client-1
[2018-07-04 16:11:05.556948] W [MSGID: 108027]
[afr-common.c:2821:afr_discover_done]
0-gv0-replicate-0: no read subvols for /
[2018-07-04 16:11:05.577751] W [MSGID: 108027]
[afr-common.c:2821:afr_discover_done]
0-gv0-replicate-0: no read subvols for /
[2018-07-04 16:11:05.577839] I [MSGID: 104041]
[glfs-resolve.c:971:__glfs_active_subvol]
0-gv0: switched to graph 6766732d-766d-3030-312d-37373932362d (0)
[2018-07-04 16:11:05.578355] W [MSGID: 114031]
[client-rpc-fops.c:2860:client3_3_lookup_cbk]
0-gv0-client-1: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2018-07-04 16:11:05.579562] W [MSGID: 114031]
[client-rpc-fops.c:2860:client3_3_lookup_cbk]
0-gv0-client-0: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2018-07-04 16:11:05.579776] W [MSGID: 114031]
[client-rpc-fops.c:2860:client3_3_lookup_cbk]
0-gv0-client-2: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000000)
[Invalid argument]

Removing the afr xattrs on node2 did solve the split-brain issue on root.
Thank you!