Discussion:
[Gluster-users] Files from one brick missing from readdir
Hans Henrik Happe
2018-07-09 07:21:12 UTC
Hi,

After an upgrade from 3.7 -> 3.10 -> 3.12.9 that seemed to go smoothly,
we have experienced missing files and dirs when listing directories.

We are using a distributed setup with 20 bricks (no redundancy from
glusterfs).

The dirs and files can be referenced directly, but do not show up in
listings (readdir, i.e. ls). Renaming them works, but they still do
not show up.

The first time we discovered this, we noticed that the files slowly
reappeared until finally all were there. After that we started a
fix-layout, which is still running (5 million dirs). Once it finishes,
we plan to compare the brick files to the mounted fs.
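
A comparison of brick contents against the mounted fs could be sketched roughly as below. This is only an illustration, not a tool from the thread: the function name and the example paths are hypothetical, and it assumes the script runs somewhere that can read both the FUSE mount and the brick directories. It compares names only and skips the brick-internal .glusterfs directory.

```python
import os

# Brick-internal entry that is never expected in a client listing.
INTERNAL_NAMES = {".glusterfs"}


def missing_from_mount(mount_dir, brick_dirs):
    """Return names found in at least one brick directory but absent
    from the listing of the same directory on the mounted volume."""
    listed = set(os.listdir(mount_dir))
    on_bricks = set()
    for brick_dir in brick_dirs:
        if not os.path.isdir(brick_dir):
            continue  # skip bricks where this directory is not present
        on_bricks.update(os.listdir(brick_dir))
    return sorted(on_bricks - listed - INTERNAL_NAMES)


# Hypothetical usage (paths are placeholders, not taken from the thread):
# missing_from_mount("/mnt/vol/backup",
#                    ["/bricks/b1/backup", "/bricks/b2/backup"])
```

Because only names are compared, a name reported here just means readdir on the mount did not return it; it says nothing about why.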

Yesterday we again discovered some missing files in a dir. After some
poking around we found that all missing files were located on the same
brick.

Comparing the directory xattrs did not give us a clue:


Brick with missing files:

# getfattr -m . -d -e hex backup
# file: backup
trusted.gfid=0x8613f6e0317141918b42d8c8063ffbce
trusted.glusterfs.dht=0x0000000100000000b2169fa3bf11b4e7
trusted.glusterfs.quota.6e0ab807-6eed-4af1-92b8-0db3ca7a19e0.contri.1=0x00000000d65e340000000000000000750000000000000001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x00000000d65e340000000000000000750000000000000001

Other brick:

# getfattr -m . -d -e hex backup
# file: backup
trusted.gfid=0x8613f6e0317141918b42d8c8063ffbce
trusted.glusterfs.dht=0x0000000100000000bf11b4e8cbe5e488
trusted.glusterfs.quota.6e0ab807-6eed-4af1-92b8-0db3ca7a19e0.contri.1=0x00000000b03aa80000000000000000700000000000000001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x00000000b03aa80000000000000000700000000000000001
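
To make the two trusted.glusterfs.dht values easier to compare, here is a minimal decoding sketch. It assumes the value is four 32-bit words in network byte order with the hash range start and stop in the last two words; the first two words are left uninterpreted, since their meaning is not discussed in this thread. The function name is made up for illustration.

```python
import struct


def decode_dht_layout(hex_value):
    """Split a 16-byte trusted.glusterfs.dht value into four big-endian
    32-bit words. Only the last two (start, stop) are interpreted."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    word0, word1, start, stop = struct.unpack(">IIII", raw)
    return {"word0": word0, "word1": word1, "start": start, "stop": stop}


# Values copied from the two getfattr dumps above.
brick_a = decode_dht_layout("0x0000000100000000b2169fa3bf11b4e7")
brick_b = decode_dht_layout("0x0000000100000000bf11b4e8cbe5e488")

# Two ranges are adjacent when one stop + 1 equals the other start.
adjacent = brick_a["stop"] + 1 == brick_b["start"]
print(hex(brick_a["start"]), hex(brick_a["stop"]))
print(hex(brick_b["start"]), hex(brick_b["stop"]))
print("adjacent:", adjacent)
```

Under that assumption the two bricks hold neighbouring, non-overlapping hash ranges (0xbf11b4e7 is followed directly by 0xbf11b4e8), which fits the observation that the directory xattrs themselves look unremarkable.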


Has anyone experienced this, or does anyone have a clue as to what might be wrong?

Cheers,
Hans Henrik
Nithya Balachandran
2018-07-09 08:00:23 UTC
Hi Hans,

Another user has reported something similar and we are still debugging this.

Would you mind taking a tcpdump of the client while listing the directory
from a FUSE client and sending it to me? Please use
tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22


Also, please send the output of gluster volume info and gluster volume get
<volname> all.

Thanks,
Nithya
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
Niels de Vos
2018-07-09 08:50:39 UTC
Last week I came across a patch that fixes a bug in the FUSE kernel
module:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6cdd51404b7ac12dd95173ddfc548c59ecf037f

This was included in linux-4.14 but has been marked for backporting to
older kernels as well. If the description matches the experienced
behaviour, running with an updated (or patched) kernel may help.

Niels
Nithya Balachandran
2018-07-09 09:28:23 UTC
This is interesting but it won't help in this case. The issue is with
Gluster.
Nithya Balachandran
2018-07-09 09:12:03 UTC
Thanks Hans. What are the names of the "missing" files?

Regards,
Nithya
Nithya Balachandran
2018-07-09 09:19:49 UTC
Or even better, the brick on which those files exist and the gluster volume
status output for the volume.

Thanks,
Nithya
Nithya Balachandran
2018-07-09 09:25:25 UTC
Hi Hans,

Never mind - I found it. It looks like the same problem as reported by the
other user. In both cases, this is a pure distribute volume.

See packet 154. The iatt is null for all entries. It looks like a .glusterfs
gfid link is missing on that brick.
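
For readers who want to check for such a link themselves, the sketch below builds the path where a gfid entry is normally kept under a brick's .glusterfs directory (first two hex characters, next two, then the full UUID) and tests whether it exists. It is not the recovery procedure mentioned in this message, which was not posted to the list. The function names and the brick path in the usage comment are hypothetical, and the thread does not say which gfid's link was missing; the gfid in the comment is simply the one of the backup directory from the original post.

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Build <brick>/.glusterfs/<aa>/<bb>/<uuid> for a gfid given as hex,
    e.g. the trusted.gfid value printed by getfattr -e hex."""
    text = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    gfid = str(uuid.UUID(hex=text))
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def gfid_link_exists(brick_root, gfid_hex):
    """True if the gfid entry is present on the brick. lexists() is used
    so that a dangling symlink still counts as present."""
    return os.path.lexists(gfid_backend_path(brick_root, gfid_hex))


# Hypothetical usage, run on the brick server:
# gfid_link_exists("/bricks/b1", "0x8613f6e0317141918b42d8c8063ffbce")
```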

Would you prefer that I send the steps to recover in a private email?

Regards,
Nithya
Hans Henrik Happe
2018-07-09 11:00:57 UTC
Hi Nithya,

Now, we have the same situation as we did the last time. It fixed itself.

Do you have any insight into what might trigger a fix? It must be
related to use of the dirs, but it's almost 24 hours since we started
poking around in that path.

According to the rebalance log, the dir has not been touched.

Cheers,
Hans Henrik