Discussion:
[Gluster-users] New 3.12.7 possible split-brain on replica 3
mabi
2018-04-09 07:37:07 UTC
Hello,

Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) cluster to 3.12.7, and this morning I got a warning that 9 files on one of my volumes are not synced. Indeed, checking that volume with "gluster volume heal info" shows that the third node (the arbiter node) has 9 files to be healed, but they are not being healed automatically.
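For reference, this is the shape of the heal-state check (a minimal sketch; the volume name myvol-private is taken from the log excerpts below):

```shell
# List files pending heal per brick; here the arbiter brick showed 9 entries
gluster volume heal myvol-private info

# List only entries Gluster itself classifies as split-brain
gluster volume heal myvol-private info split-brain
```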

All nodes were online the whole time and there was no network interruption, so I am wondering whether this is really a split-brain issue or something else.

I found some interesting entries in the client log file (/var/log/glusterfs/myvol-private.log), which I have included below. It looks like a rename went wrong because a directory is not empty.

For your information, I performed the GlusterFS upgrade offline, and the upgrade itself went smoothly.

What can I do to fix this issue?

Best regards,
Mabi


[2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-myvol-private-dht: renaming /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip (hash=myvol-private-replicate-0/cache=<nul>)
[2018-04-09 06:58:53.692440] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
[2018-04-09 06:58:53.714129] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote operation failed. Path: <gfid:13880e8c-13da-442f-8180-fa40b6f5327c> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.714161] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote operation failed. Path: <gfid:13880e8c-13da-442f-8180-fa40b6f5327c> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.715638] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
[2018-04-09 06:58:53.750372] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.757677] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
[2018-04-09 06:58:53.775939] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.776237] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.781762] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. sources=[2] sinks=0 1
[2018-04-09 06:58:53.796950] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
[2018-04-09 06:58:53.812682] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.879382] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
[2018-04-09 06:58:53.881514] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
[2018-04-09 06:58:53.890073] W [MSGID: 108027] [afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for (null)
mabi
2018-04-09 07:49:26 UTC
Here are also the corresponding log entries from a brick log file on one of the gluster nodes:

[2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available]
[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory]

Hope that helps to pinpoint the issue.

Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
mabi
2018-04-09 11:06:04 UTC
As suggested on this mailing list in the past, I now ran stat and getfattr on one of the problematic files on all nodes, and finally a stat on the FUSE mount directly. The output is below:
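The commands used were of this shape (a sketch, run as root on each node; the brick paths are the ones shown in the outputs below, abbreviated here as a placeholder):

```shell
# On each brick (NODE1/NODE2 under /data/..., NODE3 under /srv/glusterfs/...)
stat /data/myvol-private/brick/path/to/problematicfile
getfattr -d -m . /data/myvol-private/brick/path/to/problematicfile

# And once on the FUSE mount for comparison
stat /mnt/myvol-private/path/to/problematicfile
```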

NODE1:

STAT:
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==

NODE2:

STAT:
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==

NODE3:
STAT:
File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 404058268 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
Birth: -

GETFATTR:
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
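The 0s-prefixed values above are base64-encoded. As a quick sketch for interpreting the arbiter's trusted.afr values (AFR stores three network-byte-order 32-bit pending counters: data, metadata, entry), assuming nothing beyond the values shown above:

```python
import base64
import struct

def decode_afr(value: str) -> dict:
    """Decode a getfattr base64 value ("0s" prefix) into AFR's three
    network-byte-order 32-bit pending counters."""
    raw = base64.b64decode(value[2:] if value.startswith("0s") else value)
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Values from NODE3 above: the arbiter records a pending *data* operation
# against both data bricks, i.e. it claims to be the heal source.
print(decode_afr("0sAAAAAQAAAAAAAAAA"))  # {'data': 1, 'metadata': 0, 'entry': 0}
print(decode_afr("0sAAAAAAAAAAAAAAAA"))  # all zero: trusted.afr.dirty is clean
```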

CLIENT GLUSTER MOUNT:
STAT:
File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
Size: 0 Blocks: 0 IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371 Links: 1
Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -
Ravishankar N
2018-04-09 11:24:39 UTC
Post by mabi
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
Looks like you hit the bug of the arbiter becoming the source (BZ 1482064), fixed
by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
launch a heal; that should fix it. But the file seems to have no content
on both data bricks either, so you might want to check whether that was
expected.
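A minimal sketch of that fix, assuming the arbiter brick path from the stat output earlier in the thread (the file path is a placeholder) and the xattr names from the getfattr output; this needs root on the arbiter node and operates on the brick directly, never on the FUSE mount:

```shell
# On NODE3 (the arbiter), remove the stale AFR pending xattrs from the brick copy
setfattr -x trusted.afr.myvol-private-client-0 \
    /srv/glusterfs/myvol-private/brick/path/to/problematicfile
setfattr -x trusted.afr.myvol-private-client-1 \
    /srv/glusterfs/myvol-private/brick/path/to/problematicfile

# Then trigger a heal so the data bricks become the sources again
gluster volume heal myvol-private
```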

-Ravi
mabi
2018-04-09 11:39:41 UTC
Thanks Ravi for your answer.

Stupid question, but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?
​​

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
​​
Post by mabi
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
Birth: -
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 404058268 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
Birth: -
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
launch heal, that should fix it. But the file seems to have no content
on both data bricks as well, so you might want to check if that was
expected.
-Ravi
Post by mabi
trusted.bit-rot.version=0sAgAAAAAAAABavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
Size: 0 Blocks: 0 IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371 Links: 1
Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
[2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available]
[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory]
Hope that helps to find out the issue.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hello,
Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) cluster to 3.12.7 and this morning I got a warning that 9 files on one of my volumes are not synced. Ineeded checking that volume with a "volume heal info" shows that the third node (the arbitrer node) has 9 files to be healed but are not being healed automatically.
All nodes were always online and there was no network interruption so I am wondering if this might not really be a split-brain issue but something else.
I found some interesting log entries on the client log file (/var/log/glusterfs/myvol-private.log) which I have included below in this mail. It looks like some renaming has gone wrong because a directory is not empty.
For your information I have upgraded my GlusterFS in offline mode and the upgrade went smoothly.
What can I do to fix that issue?
Best regards,
Mabi
[2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-myvol-private-dht: renaming /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip (hash=myvol-private-replicate-0/cache=<nul>)
[2018-04-09 06:58:53.692440] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
[2018-04-09 06:58:53.714129] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.714161] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.715638] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
[2018-04-09 06:58:53.750372] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.757677] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
[2018-04-09 06:58:53.775939] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.776237] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.781762] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. sources=[2] sinks=0 1
[2018-04-09 06:58:53.796950] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
[2018-04-09 06:58:53.812682] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.879382] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
[2018-04-09 06:58:53.881514] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
[2018-04-09 06:58:53.890073] W [MSGID: 108027] [afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for (null)
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2018-04-09 11:46:37 UTC
Permalink
Post by mabi
Thanks Ravi for your answer.
Stupid question but how do I delete the trusted.afr xattrs on this brick?
And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?
Sorry I should have been clearer. Yes the brick on the 3rd node.

`setfattr -x trusted.afr.myvol-private-client-0 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

`setfattr -x trusted.afr.myvol-private-client-1 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

After doing this for all files, run `gluster volume heal <volname>`.

HTH,
Ravi
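[Editor's note: if there are many affected files, the procedure above can be scripted. The sketch below only prints the setfattr commands for review rather than running them; the brick root and the two client names are taken from this thread and must be adjusted to your own volume (`gluster volume info <volname>` shows the client indices).]

```shell
#!/bin/sh
# Dry-run helper: print (do not run) the setfattr commands that clear the
# AFR changelog xattrs for one problem file on a brick. Review the output
# before executing anything on the arbiter brick.
# BRICK and the client names are examples from this thread.
BRICK=/data/myvol-private/brick

clear_afr_cmds() {
    # $1 = path of the problem file relative to the brick root
    for client in myvol-private-client-0 myvol-private-client-1; do
        printf 'setfattr -x trusted.afr.%s %s/%s\n' "$client" "$BRICK" "$1"
    done
}

# Feed it each path reported by `gluster volume heal <volname> info`
# (the path below is a hypothetical example):
clear_afr_cmds "dir_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile"
# After running the reviewed commands on the brick, trigger healing:
#   gluster volume heal myvol-private
```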
Post by mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Post by mabi
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
Birth: -
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 404058268 Links: 2
Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
Birth: -
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
launch heal, that should fix it. But the file seems to have no content
on both data bricks as well, so you might want to check if that was
expected.
-Ravi
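[Editor's note: the trusted.afr values in the getfattr listing above can be decoded by hand. The "0s" prefix in getfattr output marks a base64-encoded value, and a trusted.afr.* value is 12 bytes holding three big-endian 32-bit counters: pending data, metadata, and entry operations. A quick sketch, assuming GNU `od` (coreutils 8.23+ for `--endian`):]

```shell
# Decode a trusted.afr.* xattr value as printed by getfattr ("0s" = base64).
# The 12 bytes are three network-byte-order 32-bit counters:
# pending data / metadata / entry operations, in that order.
decode_afr() {
    printf '%s' "${1#0s}" | base64 -d | od -An -tu4 --endian=big
}

decode_afr 0sAAAAAQAAAAAAAAAA   # the arbiter's value above: 1 0 0 (one pending data op)
decode_afr 0sAAAAAAAAAAAAAAAA   # trusted.afr.dirty above: 0 0 0 (clean)
```

A nonzero first counter on the arbiter, with zeros on the data bricks, matches the "arbiter becoming source" symptom described here.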
Post by mabi
trusted.bit-rot.version=0sAgAAAAAAAABavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==
File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
Size: 0 Blocks: 0 IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371 Links: 1
Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
Birth: -
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
[2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available]
[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory]
Hope that helps to find out the issue.
mabi
2018-04-09 12:10:04 UTC
Permalink
Thanks again, that worked and I now have no more unsynced files.

You mentioned that this bug has been fixed in 3.13; would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release, and as such I would not like to have to upgrade to 3.13.


Ravishankar N
2018-04-09 12:18:19 UTC
Permalink
Post by mabi
Again thanks that worked and I have now no more unsynched files.
You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13.
I don't think there will be another 3.12 release. Adding Karthik to see
if this is something he can back-port to 3.10.12, which is likely to be
out by the end of the month.
-Ravi
Dmitry Melekhov
2018-04-09 12:24:55 UTC
Permalink
Post by Ravishankar N
Post by mabi
Again thanks that worked and I have now no more unsynched files.
You mentioned that this bug has been fixed in 3.13, would it be
possible to backport it to 3.12? I am asking because 3.13 is not a
long-term release and as such I would not like to have to upgrade to
3.13.
I don't think there will be another 3.12 release.
Why not? It is LTS, right?
Ravishankar N
2018-04-09 12:31:17 UTC
Permalink
Post by Dmitry Melekhov
Post by Ravishankar N
I don't think there will be another 3.12 release.
Why not? It is LTS, right?
My bad. Just checked the schedule [1], and you are right. It is LTM.

[1] https://www.gluster.org/release-schedule/
mabi
2018-04-09 14:22:05 UTC
Permalink
Thanks for clarifying that point. Does this mean that the fix for this bug will get backported to the next 3.12 release?

mabi
2018-05-15 07:08:19 UTC
Permalink
Dear all,

Last Friday I upgraded my replica 3 GlusterFS cluster (and clients) from 3.12.7 to 3.12.9 in order to fix this bug, but unfortunately I notice that I still have exactly the same problem as initially posted in this thread.

It looks like this bug is not resolved, as I just now got 3 unsynced files on my arbiter node, exactly as before the upgrade. This problem has been occurring ever since I upgraded to 3.12.7...

Thank you very much in advance for your advice.

Best regards,
Mabi

Ravishankar N
2018-05-15 08:52:46 UTC
Permalink
Post by mabi
Dear all,
I have upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 3.12.7 to 3.12.9 in order to fix this bug but unfortunately I notice that I still have exactly the same problem as initially posted in this thread.
It looks like this bug is not resolved as I just got right now 3 unsynched files on my arbiter node like I used to do before upgrading. This problem started since I upgraded to 3.12.7...
Could you provide the stat and 'getfattr -d -m . -e hex
/brick/path/to/file' outputs of one of these files, and also of the
corresponding parent directory, from all 3 bricks?
Thanks,
Ravi
mabi
2018-05-15 11:56:44 UTC
Permalink
Thank you, Ravi, for your fast answer. As requested, you will find below the "stat" and "getfattr" output of one of the files and of its parent directory from all three nodes of my cluster.

NODE 1:

File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
Birth: -

File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
Birth: -

# file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001

# file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
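[Editor's note] When reading these dumps, the trusted.gfid2path.* value is, as far as I understand the brick layout, just the ASCII string "<parent-gfid>/<basename>" printed in hex by getfattr, so it can be decoded to cross-check which parent a file entry belongs to. A minimal sketch (the sample value and the name key.bin are hypothetical, not taken from the output above):

```python
def decode_gfid2path(hexval: str) -> tuple:
    """Split a trusted.gfid2path.* value (hex, as printed by
    `getfattr -e hex`) into (parent-gfid, basename). The raw value
    is the ASCII string "<parent-gfid>/<basename>"."""
    raw = bytes.fromhex(hexval.removeprefix("0x")).decode()
    pgfid, _, name = raw.partition("/")
    return pgfid, name

# Hypothetical value for a file "key.bin" under the directory gfid above:
val = "0x" + "8fa15dbd-cd5c-4900-b889-0fe7fce46a13/key.bin".encode().hex()
print(decode_gfid2path(val))
# ('8fa15dbd-cd5c-4900-b889-0fe7fce46a13', 'key.bin')
```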


NODE 2:

File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
Birth: -

File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
Birth: -

# file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001

# file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001


NODE 3 (arbiter):

File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 271434295 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
Birth: -

File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: ca11h/51729d Inode: 271434288 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
Birth: -

# file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.myvolume-private-client-0=0x000000010000000000000000
trusted.afr.myvolume-private-client-1=0x000000010000000000000000
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001

# file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.afr.myvolume-private-client-0=0x000000000000000000000000
trusted.afr.myvolume-private-client-1=0x000000000000000000000000
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
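[Editor's note] For readers comparing the three dumps: the trusted.afr.<volume>-client-N values are AFR pending-changelog counters, commonly read as three big-endian 32-bit integers counting pending data, metadata and entry operations. A small sketch decoding the value the arbiter holds against both data bricks above, assuming that standard layout:

```python
import struct

def decode_afr(hexval: str) -> dict:
    """Decode a trusted.afr.* pending-changelog value (hex string as
    printed by `getfattr -e hex`) into its three big-endian 32-bit
    counters: pending data, metadata and entry operations."""
    raw = bytes.fromhex(hexval.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# The value the arbiter holds for both data bricks in the dump above:
print(decode_afr("0x000000010000000000000000"))
# {'data': 1, 'metadata': 0, 'entry': 0}
```

A non-zero data counter on the arbiter blaming both data bricks is consistent with this file showing up in heal info without getting healed.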



Ravishankar N
2018-05-17 05:00:58 UTC
Permalink
Hi mabi,
Some questions:
-Did you by any chance change the cluster.quorum-type option from the
default values?
-Is filename.shareKey supposed to be an empty file? It looks like the file
was fallocated with the keep-size option but never written to. (On the 2
data bricks, the stat output shows Size = 0 but non-zero Blocks, and yet a
'regular empty file'.)
-Do you have some sort of a reproducer/ steps that you perform when the
issue occurs? Please also share the logs from all 3 nodes and the client(s).
Thanks,
Ravi
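[Editor's note] Ravi's "fallocated with the keep-size option" observation can be demonstrated outside Gluster. This Linux-only sketch, calling fallocate(2) via ctypes purely as an illustration (it is not what Nextcloud or Gluster actually executes), should reproduce the stat signature above on filesystems that support fallocate: Size 0 with non-zero Blocks:

```python
import ctypes
import ctypes.util
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01  # allocate blocks without changing st_size

def fallocate_keep_size(nbytes: int):
    """Preallocate nbytes on a temp file with keep-size semantics and
    return (st_size, st_blocks). Linux-only sketch."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_long, ctypes.c_long]
    fd, path = tempfile.mkstemp()
    try:
        if libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, nbytes) != 0:
            # e.g. a filesystem without fallocate support
            print("fallocate failed:", os.strerror(ctypes.get_errno()))
        st = os.stat(path)
        return st.st_size, st.st_blocks
    finally:
        os.close(fd)
        os.unlink(path)

size, blocks = fallocate_keep_size(4096)
# On ext4/XFS this reports size 0 with non-zero blocks: a
# "regular empty file" that still occupies disk blocks.
print(size, blocks)
```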
mabi
2018-05-17 21:07:51 UTC
Permalink
Hi Ravi,

Please find below the answers to your questions:

1) I have never touched the cluster.quorum-type option. Currently it is set as follows for this volume:

Option Value
------ -----
cluster.quorum-type none

2) The .shareKey files are not supposed to be empty. They should be 512 bytes in size and contain binary data (a PGP secret subkey). I am not in a position to say why this one is only 0 bytes in this specific case, nor whether it is the fault of the software (Nextcloud) or of GlusterFS. I can just say that I have another file server, a simple NFS server with another Nextcloud installation, and there I have never seen any 0-byte .shareKey files being created.

3) It seems to be quite random, and I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time, but I guess uploading files or moving files around. Basically, I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).

Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log files of all 3 nodes. Let me know if you need any other log files. Also, if you know of any "log file sanitizer" tool which can replace sensitive file names with random ones in log files, I would like to use it, as right now I have to do that manually.
​​
NODE 1 brick log:

[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]

NODE 2 brick log:

[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]

NODE 3 (arbiter) brick log:

[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
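[Editor's note] On the "Found stale gfid handle" warnings above: the paths under .glusterfs are derived purely from the gfid, as .glusterfs/<first 2 hex chars>/<next 2>/<full gfid>, which makes it easy to map a gfid from the logs back to its backend handle on a brick. A small sketch of that mapping (this derivation rule is my understanding of the brick layout, not something stated in this thread):

```python
def gfid_backend_path(gfid: str) -> str:
    """Return the brick-internal handle path for a gfid:
    .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>."""
    g = gfid.lower()
    return f".glusterfs/{g[:2]}/{g[2:4]}/{g}"

# The gfid from the arbiter's first "stale gfid handle" warning above:
print(gfid_backend_path("f065a5e7-ac06-445f-add0-83acf8ce4155"))
# .glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155
```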


Best regards,
Mabi

mabi
2018-05-23 07:17:24 UTC
Permalink
Hello,

I just wanted to ask whether you have had time to look into this bug I am encountering, and whether there is anything else I can do?

For now, in order to get rid of these 3 unsynced files, shall I use the same method that was suggested to me earlier in this thread?

Thanks,
Mabi


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
​​
Hi Ravi,
Please fine below the answers to your questions
Option Value
cluster.quorum-type none
2) The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in a position to say why it is in this specific case only 0 bytes and if it is the fault of the software (Nextcloud) or GlusterFS. I can just say here that I have another file server which is a simple NFS server with another Nextcloud installation and there I never saw any 0 bytes .shareKey files being created.
3) It seems to be quite random and I am not the person who uses the Nextcloud software so I can't say what it was doing at that specific time but I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application where I have it mounted using a fuse mount (mount -t glusterfs).
Regarding the logs I have attached the mount log file from the client and below are the relevant log entries from the brick log file of all 3 nodes. Let me know if you need any other log files. Also if you know any "log file sanitizer tool" which can replace sensitive file names with random file names in log files that would like to use it as right now I have to do that manually.
[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
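As an aside, since mabi asks about a log sanitizer: no standard tool is referenced in this thread, but a minimal sketch (hypothetical script, names and regex are assumptions) that replaces path components in such brick logs with stable pseudonyms could look like this:

```python
import hashlib
import re

def sanitize(line: str) -> str:
    """Replace each component of absolute paths in a log line with a stable
    pseudonym, so the same file name always maps to the same token."""
    def repl(m: re.Match) -> str:
        parts = m.group(0).split("/")
        return "/".join(
            "d" + hashlib.sha1(p.encode()).hexdigest()[:8] if p else ""
            for p in parts
        )
    # Assumption: paths in these logs contain no spaces, commas or ']'.
    return re.sub(r"/[^\s,\]]+", repl, line)

line = "[2018-05-15 06:54:20] E opendir failed on /data/brick/cloud/admin/secret.zip"
print(sanitize(line))
```

Timestamps and message text are preserved; only path components are rewritten, and the mapping is deterministic so correlated entries stay correlated.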
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Hi mabi,
-Did you by any chance change the cluster.quorum-type option from the
default values?
-Is filename.shareKey supposed to be an empty file? It looks like the file
was fallocated with the keep-size option but never written to. (On the 2
data bricks, stat output shows Size = 0 but non-zero Blocks, yet a
'regular empty file'.)
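The fallocate hypothesis is easy to reproduce locally; a quick sketch (assumes the util-linux `fallocate` tool and a filesystem that supports preallocation):

```shell
# Preallocate 512 bytes without changing st_size: the "Size = 0 but
# non-zero Blocks, regular empty file" pattern seen on the data bricks.
tmp=$(mktemp)
fallocate --keep-size -l 512 "$tmp"
stat -c 'Size: %s  Blocks: %b' "$tmp"
rm -f "$tmp"
```

With `--keep-size` the blocks are reserved but the reported file size stays 0, which is exactly what the stat outputs in this thread show.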
-Do you have some sort of a reproducer/ steps that you perform when the
issue occurs? Please also share the logs from all 3 nodes and the client(s).
Thanks,
Ravi
Post by mabi
Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 271434295 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
Birth: -
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: ca11h/51729d Inode: 271434288 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
Birth: -
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
=================================================================================================================================
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.myvolume-private-client-0=0x000000010000000000000000
trusted.afr.myvolume-private-client-1=0x000000010000000000000000
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
================================================================================================================
trusted.afr.myvolume-private-client-0=0x000000000000000000000000
trusted.afr.myvolume-private-client-1=0x000000000000000000000000
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
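For readers interpreting the trusted.afr values above: each value packs three big-endian 32-bit pending-operation counters (data, metadata, entry), which is AFR's changelog format. The arbiter's entries for client-0 and client-1 therefore record one pending data operation blamed on each data brick. A small decoding sketch:

```python
import struct

def decode_afr(hexval: str):
    """Decode a trusted.afr.* xattr value into its three big-endian u32
    pending-operation counters: (data, metadata, entry)."""
    raw = bytes.fromhex(hexval[2:] if hexval.startswith("0x") else hexval)
    return struct.unpack(">III", raw[:12])

# The arbiter's value for each data brick in the output above:
print(decode_afr("0x000000010000000000000000"))  # one pending data op
```

A clean (healed) file shows all-zero counters, as in the directory's xattrs.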
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Post by mabi
Dear all,
I have upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 3.12.7 to 3.12.9 in order to fix this bug but unfortunately I notice that I still have exactly the same problem as initially posted in this thread.
It looks like this bug is not resolved, as I just now got 3 unsynced files on my arbiter node, like before the upgrade. This problem started when I upgraded to 3.12.7...
Could you provide the 'stat' and 'getfattr -d -m . -e hex
/brick/path/to/file' outputs of one of these files and also of the
corresponding parent directory from all 3 bricks?
Thanks,
Ravi
Post by mabi
Thank you very much in advance for your advice.
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Post by mabi
Post by mabi
Again thanks that worked and I have now no more unsynched files.
You mentioned that this bug has been fixed in 3.13, would it be
possible to backport it to 3.12? I am asking because 3.13 is not a
long-term release and as such I would not like to have to upgrade to
3.13.
I don't think there will be another 3.12 release.
Why not? It is LTS, right?
My bad. Just checked the schedule [1], and you are right. It is LTM.
[1] https://www.gluster.org/release-schedule/
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2018-05-23 07:25:01 UTC
Permalink
Post by mabi
Hello,
I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do?
For now, in order to get rid of these 3 unsynced files, shall I use the same method that was suggested earlier in this thread?
Sorry Mabi, I haven't had a chance to dig deeper into this. The
workaround of resetting xattrs should be fine though.
Thanks,
Ravi
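For reference, the xattr-reset workaround Ravi refers to is typically along these lines (a hedged sketch only, not a command sequence given in this thread: the brick path is a placeholder, the client names are taken from the getfattr outputs above, and it must be run as root on the arbiter brick):

```shell
# Hypothetical sketch: clear the arbiter's pending-operation counters for
# both data bricks on the affected file, then let self-heal re-evaluate.
F=/srv/glusterfs/myvolume-private/brick/path/to/filename.shareKey  # placeholder path
setfattr -n trusted.afr.myvolume-private-client-0 -v 0x000000000000000000000000 "$F"
setfattr -n trusted.afr.myvolume-private-client-1 -v 0x000000000000000000000000 "$F"
# Trigger a heal and verify the file no longer appears as pending:
gluster volume heal myvolume-private
gluster volume heal myvolume-private info
```

Since this touches brick-internal metadata directly, it should only be used when the heal daemon cannot resolve the entries itself, and the heal info output should be re-checked afterwards.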
Post by mabi
Thanks,
Mabi

mabi
2018-05-25 15:55:18 UTC
Permalink
Thanks Ravi. Let me know when you have time to have a look. It happens around once or twice per week, but today it was 24 files in one go which are unsynced and for which I need to manually reset the xattrs on the arbiter node.

By the way, on this volume I use quotas which I set on specific directories. I don't know if this is relevant or not, but I thought I would mention it.
​​

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
​​
Post by mabi
Hello,
I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do?
For now in order to get rid of these 3 unsynched files shall I do the same method that was suggested to me in this thread?
Sorry Mabi,  I haven't had a chance to dig deeper into this. The
workaround of resetting xattrs should be fine though.
Thanks,
Ravi
Post by mabi
Thanks,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hi Ravi,
Please fine below the answers to your questions
Option Value
cluster.quorum-type none
2. The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in a position to say why it is in this specific case only 0 bytes and if it is the fault of the software (Nextcloud) or GlusterFS. I can just say here that I have another file server which is a simple NFS server with another Nextcloud installation and there I never saw any 0 bytes .shareKey files being created.
3. It seems to be quite random and I am not the person who uses the Nextcloud software so I can't say what it was doing at that specific time but I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application where I have it mounted using a fuse mount (mount -t glusterfs).
Regarding the logs I have attached the mount log file from the client and below are the relevant log entries from the brick log file of all 3 nodes. Let me know if you need any other log files. Also if you know any "log file sanitizer tool" which can replace sensitive file names with random file names in log files that would like to use it as right now I have to do that manually.
[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Hi mabi,
-Did you by any chance change the cluster.quorum-type option from the
default values?
-Is filename.shareKey supposed to be any empty file? Looks like the file
was fallocated with the keep-size option but never written to. (On the 2
data bricks, stat output shows Size =0, but non zero Blocks and yet a
'regular empty file').
-Do you have some sort of a reproducer/ steps that you perform when the
issue occurs? Please also share the logs from all 3 nodes and the client(s).
Thanks,
Ravi
Post by mabi
Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 271434295 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
Birth: -
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: ca11h/51729d Inode: 271434288 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
Birth: -
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
=================================================================================================================================
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.myvolume-private-client-0=0x000000010000000000000000
trusted.afr.myvolume-private-client-1=0x000000010000000000000000
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
================================================================================================================
trusted.afr.myvolume-private-client-0=0x000000000000000000000000
trusted.afr.myvolume-private-client-1=0x000000000000000000000000
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
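As an aside, the trusted.gfid2path value in the dump above is nothing more than the hex encoding of "<parent-gfid>/<basename>", so it can be decoded with a few lines of plain Python (no Gluster tooling needed) to see which parent directory and file name a brick associates with a gfid. A minimal sketch:

```python
def decode_gfid2path(value: str) -> tuple:
    """Decode a trusted.gfid2path.* xattr: it is simply the hex
    encoding of '<parent-gfid>/<basename>'."""
    hexstr = value[2:] if value.startswith("0x") else value
    raw = bytes.fromhex(hexstr).decode("utf-8")
    parent_gfid, _, basename = raw.partition("/")
    return parent_gfid, basename

# Value copied from the getfattr dump above:
hexval = ("0x38666131356462642d636435632d343930302d623838392d"
          "3066653766636534366131332f6e6361646d696e6973747261"
          "746f722e73686172654b6579")
parent, name = decode_gfid2path(hexval)
print(parent, name)
```

Note that the decoded parent gfid matches the trusted.gfid of the OC_DEFAULT_MODULE directory in the same dump.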
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Post by mabi
Dear all,
Last Friday I upgraded my replica 3 GlusterFS cluster (and clients) from 3.12.7 to 3.12.9 in order to fix this bug, but unfortunately I still have exactly the same problem as initially posted in this thread.
It looks like this bug is not resolved, as I just got 3 unsynced files on my arbiter node again, just like before the upgrade. This problem has been occurring since I upgraded to 3.12.7...
Could you provide the stat and 'getfattr -d -m . -e hex
brick/path/to/file' outputs of one of these files, and also of the
corresponding parent directory, from all 3 bricks?
Thanks,
Ravi
Post by mabi
Thank you very much in advance for your advice.
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Post by mabi
Again thanks, that worked, and I now have no more unsynced files.
You mentioned that this bug has been fixed in 3.13, would it be
possible to backport it to 3.12? I am asking because 3.13 is not a
long-term release and as such I would not like to have to upgrade to
3.13.
I don't think there will be another 3.12 release.
Why not? It is LTS, right?
My bad. Just checked the schedule [1], and you are right. It is LTM.
[1] https://www.gluster.org/release-schedule/
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
mabi
2018-06-22 14:44:35 UTC
Permalink
Hi,

Now that this issue has happened a few times, I have noticed a few things which might be helpful for debugging:

- This problem happens, though only rarely and at random, when files are uploaded via a cloud app called Nextcloud, which encrypts the files itself on the server side (PHP code).
- It does not seem to happen with a Nextcloud installation that does not have server-side encryption enabled.
- When this happens, both the first and second node of the replica show around 120k context switches and 25k interrupts, and the arbiter node around 30k context switches and 20k interrupts. No node is overloaded, there is no I/O wait, and there are no network issues or disconnections.
- All of the problematic files to heal have a space in one of their sub-directory names (which might be totally irrelevant).

If that's of any use: my two replica nodes are Debian 8 physical servers with ZFS as the file system for the bricks, and the arbiter is a Debian 9 virtual machine with XFS as the file system for the brick. To mount the volume I use a GlusterFS FUSE mount on the web server which runs Nextcloud.

Regards,
M.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Thanks Ravi. Let me know when you have time to have a look. It happens around once or twice per week, but today it was 24 files in one go that were unsynced and for which I had to manually reset the xattrs on the arbiter node.
By the way, on this volume I use quotas which I set on specific directories; I don't know if this is relevant or not, but I thought I would mention it.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Post by mabi
Hello,
I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do?
For now, in order to get rid of these 3 unsynced files, shall I apply the same method that was suggested to me earlier in this thread?
Sorry Mabi, I haven't had a chance to dig deeper into this. The
workaround of resetting xattrs should be fine though.
Thanks,
Ravi
Post by mabi
Thanks,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hi Ravi,
Please find below the answers to your questions:
Option Value
cluster.quorum-type none
2. The .shareKey files are not supposed to be empty. They should be 512 bytes and contain binary data (a PGP secret sub-key). I am not in a position to say why this specific file is 0 bytes, or whether it is the fault of the software (Nextcloud) or of GlusterFS. I can only say that I have another file server, a simple NFS server with another Nextcloud installation, and there I have never seen any 0-byte .shareKey files being created.
3. It seems to be quite random. I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time, but I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).
Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log file of all 3 nodes. Let me know if you need any other log files. Also, if you know of any log-file sanitizer tool which can replace sensitive file names with random ones, I would like to use it, as right now I have to do that manually.
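(As far as I know nothing like that ships with Gluster, but a small script can do a stable, pseudonymous rename of path components. A minimal sketch in Python; the brick prefix below is an assumption and would need to be adapted to the real layout:)

```python
import hashlib
import re

# Assumption: everything below this prefix is potentially sensitive.
BRICK_ROOT = "/data/myvol-private/brick/"

def _hash_component(component: str) -> str:
    # Stable 8-char pseudonym, so the same name maps to the same token
    # throughout the whole log file.
    return hashlib.sha1(component.encode()).hexdigest()[:8]

def sanitize_line(line: str) -> str:
    def repl(match):
        tail = match.group(0)[len(BRICK_ROOT):]
        return BRICK_ROOT + "/".join(_hash_component(c)
                                     for c in tail.split("/") if c)
    # Note: paths containing spaces (as some here do) would need a
    # smarter pattern than this one.
    return re.sub(re.escape(BRICK_ROOT) + r"[^\s\]]*", repl, line)

print(sanitize_line(
    "opendir failed on /data/myvol-private/brick/cloud/data/admin/secretdir"
    " [No such file or directory]"))
```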
[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Hi mabi,
-Did you by any chance change the cluster.quorum-type option from the
default values?
-Is filename.shareKey supposed to be an empty file? It looks like the file
was fallocated with the keep-size option but never written to. (On the 2
data bricks, stat output shows Size = 0 but non-zero Blocks, and yet a
'regular empty file'.)
-Do you have a reproducer / steps that you perform when the
issue occurs? Please also share the logs from all 3 nodes and the client(s).
Thanks,
Ravi
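Ravi's fallocate observation can be reproduced independently of Gluster: on Linux, fallocate(2) with FALLOC_FL_KEEP_SIZE allocates blocks without changing the reported file size, which produces exactly this "Size: 0, non-zero Blocks, regular empty file" stat signature. A Linux-only sketch using ctypes (glibc assumed; some filesystems do not support it, in which case the function returns None):

```python
import ctypes
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01  # from <linux/falloc.h>

def keep_size_demo(length: int = 4096):
    """fallocate() with FALLOC_FL_KEEP_SIZE, then stat the file.
    Returns (st_size, st_blocks), or None if unsupported here."""
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_long, ctypes.c_long]
    fd, path = tempfile.mkstemp()
    try:
        if libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, length) != 0:
            return None  # e.g. EOPNOTSUPP on this filesystem
        st = os.stat(path)
        return st.st_size, st.st_blocks  # size stays 0, blocks do not
    finally:
        os.close(fd)
        os.unlink(path)

print(keep_size_demo())
```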
Post by mabi
Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 271434295 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
Birth: -
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: ca11h/51729d Inode: 271434288 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
Birth: -
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
=================================================================================================================================
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.myvolume-private-client-0=0x000000010000000000000000
trusted.afr.myvolume-private-client-1=0x000000010000000000000000
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
================================================================================================================
trusted.afr.myvolume-private-client-0=0x000000000000000000000000
trusted.afr.myvolume-private-client-1=0x000000000000000000000000
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
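For reference, the trusted.afr.* values in the arbiter's dump above are conventionally interpreted as three big-endian 32-bit counters of pending data, metadata and entry operations, so 0x000000010000000000000000 against both myvolume-private-client-0 and -client-1 means the arbiter blames one pending data operation on each data brick. A small decoding sketch in Python:

```python
import struct

def decode_afr_pending(value: str) -> dict:
    """Decode a trusted.afr.* xattr: three big-endian 32-bit
    counters for pending data, metadata and entry operations."""
    hexstr = value[2:] if value.startswith("0x") else value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hexstr))
    return {"data": data, "metadata": metadata, "entry": entry}

# The arbiter's view of the two data bricks, from the dump above:
print(decode_afr_pending("0x000000010000000000000000"))
```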
mabi
2018-07-04 13:21:34 UTC
Permalink
Hello,

I just wanted to let you know that last week I upgraded my two replica nodes from Debian 8 to Debian 9, so now all my 3 nodes (including the arbiter) are running Debian 9 with a Linux 4 kernel.

Unfortunately I still have the exact same issue. Another detail I might not have mentioned yet is that I have quotas enabled on this volume; I don't really know if that is relevant, but who knows...

As a reminder, here is what happens on the client side, which has the volume mounted via FUSE (taken earlier today from the /var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this specific case only one single file had this issue.

[2018-07-04 08:23:49.314252] E [MSGID: 109089] [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: failed to open the fd (0x7fccb00a5120, flags=0100000) on file /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey @ myvol-replicate-0 [Input/output error]
[2018-07-04 08:23:49.328712] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 55916791: TRUNCATE() /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey => -1 (Input/output error)

Best regards,
M.

Ravishankar N
2018-07-04 15:41:20 UTC
Permalink
Hi mabi, there are a couple of AFR patches from master that I'm
currently backporting to the 3.12 branch:

afr: heal gfids when file is not present on all bricks
afr: don't update readables if inode refresh failed on all children
afr: fix bug-1363721.t failure
afr: add quorum checks in pre-op
afr: don't treat all cases all bricks being blamed as split-brain
afr: capture the correct errno in post-op quorum check
afr: add quorum checks in post-op

Many of these help make the transaction code more robust by fixing
various corner cases. It would be great if you could wait for the next
3.12 minor release (3.12.12?), upgrade to that build, and see if the
issues go away.
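Once that release is out, the upgrade could be verified along these lines (a sketch; "myvol-private" is the volume name used elsewhere in this thread):

```shell
# Confirm the running version after upgrading all nodes:
gluster --version

# Re-check for files pending heal and for actual split-brain entries:
gluster volume heal myvol-private info
gluster volume heal myvol-private info split-brain
```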

Note: CC'ing Karthik and Jiffin for their help in reviewing and merging
the backports for the above patches.

Thanks,
Ravi
Post by mabi
Hello,
I just wanted to let you know that last week I upgraded my two replica nodes from Debian 8 to Debian 9, so now all 3 of my nodes (including the arbiter) are running Debian 9 with a Linux 4.x kernel.
Unfortunately I still have the exact same issue. Another detail I might not have mentioned yet is that I have quotas enabled on this volume; I don't really know if that is relevant, but who knows...
As a reminder, here is what happens on the client side which has the volume mounted via FUSE (taken earlier today from the /var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this specific case only one single file had this issue.
[2018-07-04 08:23:49.328712] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 55916791: TRUNCATE() /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey => -1 (Input/output error)
Best regards,
M.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Hi,
- This problem happens, though only rarely and randomly, when files are uploaded via a cloud app called Nextcloud, which encrypts the files itself on the server side (PHP code).
- It does not seem to happen with Nextcloud installations which do not have server-side encryption enabled.
- When this happens, the first and second nodes of the replica both show 120k context switches and 25k interrupts, and the arbiter node 30k context switches/20k interrupts. No nodes are overloaded, there is no iowait, and there are no network issues or disconnections.
- All of the problematic files to heal have spaces in one of their sub-directories (might be totally irrelevant).
If that's of any use: my two replica nodes are Debian 8 physical servers with ZFS as the file system for the bricks, and the arbiter is a Debian 9 virtual machine with XFS as the file system for its brick. To mount the volume I use a GlusterFS FUSE mount on the web server which runs Nextcloud.
Regards,
M.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Thanks Ravi. Let me know when you have time to have a look. It happens around once or twice per week, but today it was 24 files in one go that were unsynced and for which I needed to manually reset the xattrs on the arbiter node.
By the way, on this volume I use quotas which I set on specific directories; I don't know if this is relevant, but I thought I would mention it.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Post by mabi
Hello,
I just wanted to ask if you have had time to look into this bug I am encountering, and if there is anything else I can do?
For now, in order to get rid of these 3 unsynced files, shall I use the same method that was suggested to me earlier in this thread?
Sorry Mabi, I haven't had a chance to dig deeper into this. The
workaround of resetting xattrs should be fine though.
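For the record, the xattr-reset workaround referred to here is, as I understand it, along these lines. This is only a sketch, not an official procedure: the volume name, client-N suffixes, and brick path are taken from examples elsewhere in this thread, and the file path is a placeholder.

```shell
# On the ARBITER's brick only: clear the pending AFR xattrs that blame
# the two data bricks for the stuck file, then let self-heal re-sync it.
BRICK=/srv/glusterfs/myvol-private/brick
FILE="$BRICK/path/to/stuck/file"

setfattr -x trusted.afr.myvol-private-client-0 "$FILE"
setfattr -x trusted.afr.myvol-private-client-1 "$FILE"

# Kick off a heal and verify the file is gone from the pending list:
gluster volume heal myvol-private
gluster volume heal myvol-private info
```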
Thanks,
Ravi
Post by mabi
Thanks,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hi Ravi,
Please find below the answers to your questions:
1. Option: cluster.quorum-type    Value: none
2. The .shareKey files are not supposed to be empty. They should be 512 bytes in size and contain binary data (a PGP secret sub-key). I am not in a position to say why in this specific case it is only 0 bytes, nor whether it is the fault of the software (Nextcloud) or of GlusterFS. I can only say that I have another file server, a simple NFS server with another Nextcloud installation, and there I have never seen any 0-byte .shareKey files being created.
3. It seems to be quite random, and I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time; I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).
Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log files of all 3 nodes. Let me know if you need any other log files. Also, if you know of any "log file sanitizer" tool which can replace sensitive file names with random names in log files, I would like to use it, as right now I have to do that manually.
[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
Best regards,
Mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Ravishankar N
Hi mabi,
- Did you by any chance change the cluster.quorum-type option from its
default value?
- Is filename.shareKey supposed to be an empty file? It looks like the
file was fallocated with the keep-size option but never written to. (On
the 2 data bricks, stat shows Size = 0 but a non-zero Blocks count, and
yet a 'regular empty file'.)
- Do you have some sort of reproducer / steps that you perform when the
issue occurs? Please also share the logs from all 3 nodes and the client(s).
Thanks,
Ravi
Post by mabi
Thank you Ravi for your fast answer. As requested, you will find below the "stat" and "getfattr" output of one of the files and of its parent directory from all three nodes of my cluster.
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
Size: 0 Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
Birth: -
File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
Size: 8 Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
Birth: -
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
========================================================================================================================
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
=======================================================================================================
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: ca11h/51729d Inode: 271434295 Links: 2
Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
Birth: -
File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: ca11h/51729d Inode: 271434288 Links: 2
Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
Birth: -
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
=================================================================================================================================
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.myvolume-private-client-0=0x000000010000000000000000
trusted.afr.myvolume-private-client-1=0x000000010000000000000000
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
================================================================================================================
trusted.afr.myvolume-private-client-0=0x000000000000000000000000
trusted.afr.myvolume-private-client-1=0x000000000000000000000000
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
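As an aside, the trusted.afr.* values in the listings above encode three big-endian 32-bit pending-operation counters (data, metadata, entry). A quick sketch to decode them (the helper name is made up; requires bash for the substring expansion):

```shell
# Decode a trusted.afr.* xattr value into its three pending counters.
decode_afr() {
    v=${1#0x}                       # strip the leading 0x
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x${v:0:8}" "0x${v:8:8}" "0x${v:16:8}"
}

# The arbiter blames both data bricks for one pending data operation:
decode_afr 0x000000010000000000000000   # data=1 metadata=0 entry=0
```

Read this way, the arbiter's trusted.afr.myvolume-private-client-0/1 values above say one data operation is still pending against each data brick, which matches the file showing up as needing heal.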
mabi
2018-07-05 07:19:09 UTC
Permalink
Dear Ravi,

Thank you for your mail and info.

This is great news. If these patches make it into 3.12.12, I will upgrade asap. Could someone confirm if these patches do not make it into 3.12.12? In that case I would rather wait for the following release. I was already told on this list that 3.12.9 should have fixed this issue, but unfortunately it didn't.

Best regards,
Mabi
