Discussion:
[Gluster-users] Bug with hardlink limitation in 3.12.13 ?
Reiner Keller
2018-08-31 11:15:16 UTC
Hello,

I got an unexpected "No space left on device" error yesterday on my new
gluster volume, caused by too many hardlinks.
This happened while I was replicating from the old gluster servers to the
new ones with "rsync -aAHXxv ..." - each running the latest version 3.12.13
(for changing the volume schema from 2x2 to 3x1 with quorum, on a fresh
Debian Stretch setup instead of Jessie).
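For reference, the replication command had roughly this shape (source and
target paths are placeholders; the -H flag is what carries the hardlinks
across):

    rsync -aAHXxv /mnt/old-gluster/ /mnt/new-gluster/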

When I deduplicated the volume around half a year ago with "rdfind",
hardlinking was working fine (I think glusterfs was around version 3.12.8 -
3.12.10 at that time?).

My search through the documentation found only the parameter
"storage.max-hardlinks", with a default of 100, for version 4.0.
I checked my gluster 3.12.13, but there the parameter is not yet
implemented.
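(For anyone already on 4.0+, it should be checkable/tunable roughly like
this, with "myvol" as a placeholder volume name:)

    gluster volume get myvol storage.max-hardlinks
    gluster volume set myvol storage.max-hardlinks 200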

I tested/verified this by running my small test both directly on the
underlying ext4 filesystem of the brick and on the gluster volume backed by
the same ext4 filesystem:

Test line for it:
    mkdir test; cd test; echo "hello" > test
    for I in $(seq 1 100); do ln test test-$I; done

* on the ext4 fs directly (old brick: xfs) I could create 100 hardlinks
without problems (from the documentation I found, ext has a compiled-in
limit of 65,000 hardlinks - see the quick check below)
* on the actual GlusterFS volume (same on my old and new gluster volumes) I
could now only create up to 45 hardlinks
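The compiled-in limit can be queried through pathconf, e.g. with getconf
(mount points are placeholders):

    getconf LINK_MAX /bricks/ext4-brick     # 65000 on ext4
    getconf LINK_MAX /bricks/xfs-brick      # far higher on xfs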

But from the deduplication around 6 months ago I can still find e.g. files
with 240 hardlinks, and there is no problem using these referenced files
(caused by multiple languages / multiple uploads per language, a cloned
production/staging system, ...).

My current workaround is to fall back to duplicated content, but it would be
great if this could be fixed in one of the next versions ;)

(Saltstack does not yet manage to set up glusterfs 4.0 peers/volumes
successfully; something in the output of the "gluster --xml --mode=script"
call must be off, but I haven't spotted any differences so far.)

Bests


Reiner
Shyam Ranganathan
2018-08-31 11:59:09 UTC
Post by Reiner Keller
Hello,
I got an unexpected "No space left on device" error yesterday on my new
gluster volume, caused by too many hardlinks.
This happened while I was replicating from the old gluster servers to the
new ones with "rsync -aAHXxv ..." - each running the latest version 3.12.13
(for changing the volume schema from 2x2 to 3x1 with quorum, on a fresh
Debian Stretch setup instead of Jessie).
I suspect you have hit this:
https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5

I further suspect your older setup was 3.10 based and not 3.12 based.

There is an additional feature added in 3.12 that stores GFID to path
conversion details using xattrs (see "GFID to path" in
https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features
)

Because of this, the xattr storage limit can be reached/breached on
ext4-based bricks.

To check if you are facing an issue similar to the one in the bug referenced
above, I would check whether the brick logs throw up the no-space error on a
gfid2path set failure.
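A quick way to look for that on a brick node might be something like this
(the log path assumes the usual /var/log/glusterfs layout; adjust to your
installation):

    grep -i gfid2path /var/log/glusterfs/bricks/*.log | grep -i 'no space'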

To get around the problem, I would suggest using xfs as the backing FS for
the brick (considering you have close to 250-odd hardlinks to a file). I
would not attempt to disable the gfid2path feature, as it is useful for
getting to the real file given just a GFID and is already part of the core
on-disk Gluster metadata (it can be shut off, but I would refrain from that).
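(The gfid2path entries live in the trusted xattr namespace of the files on
the bricks and can be listed there; the brick path below is a placeholder:)

    getfattr -d -m 'trusted.gfid2path' -e text /data/brick1/path/to/file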
Post by Reiner Keller
When I deduplicated the volume around half a year ago with "rdfind",
hardlinking was working fine (I think glusterfs was around version 3.12.8 -
3.12.10 at that time?).
My search through the documentation found only the parameter
"storage.max-hardlinks", with a default of 100, for version 4.0.
I checked my gluster 3.12.13, but there the parameter is not yet
implemented.
I tested/verified this by running my small test both directly on the
underlying ext4 filesystem of the brick and on the gluster volume backed by
the same ext4 filesystem:
    mkdir test; cd test; echo "hello" > test
    for I in $(seq 1 100); do ln test test-$I; done
* on the ext4 fs directly (old brick: xfs) I could create 100 hardlinks
without problems (from the documentation I found, ext has a compiled-in
limit of 65,000 hardlinks)
* on the actual GlusterFS volume (same on my old and new gluster volumes) I
could now only create up to 45 hardlinks
But from the deduplication around 6 months ago I can still find e.g. files
with 240 hardlinks, and there is no problem using these referenced files
(caused by multiple languages / multiple uploads per language, a cloned
production/staging system, ...).
My current workaround is to fall back to duplicated content, but it would be
great if this could be fixed in one of the next versions ;)
(Saltstack does not yet manage to set up glusterfs 4.0 peers/volumes
successfully; something in the output of the "gluster --xml --mode=script"
call must be off, but I haven't spotted any differences so far.)
Bests
Reiner
Reiner Keller
2018-08-31 17:06:42 UTC
Hello,
Post by Shyam Ranganathan
https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5
I further suspect your older setup was 3.10 based and not 3.12 based.
There is an additional feature added in 3.12 that stores GFID to path
conversion details using xattrs (see "GFID to path" in
https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features
)
Because of this, the xattr storage limit can be reached/breached on
ext4-based bricks.
To check if you are facing an issue similar to the one in the bug referenced
above, I would check whether the brick logs throw up the no-space error on a
gfid2path set failure.
Thanks for the hint.

From the log output (= no gfid2path errors) it does not seem to be that
problem, although the old gluster volume was set up with version 3.10.x (or
even 3.8.x, I think).

As I wrote, I could reproduce it on the new ext4 and on the old xfs gluster
volumes with version 3.12.13, while it ran fine with ~ 3.12.8 (half a year
ago) without problems.

But I just saw that my old main volume wasn't/isn't xfs but also ext4.
Digging into the logs I could see that in January I was still running
3.10.8 / 3.10.9 and first switched to 3.12.9 / the 3.12 branch in April.

Looking at the entry sizes/differences, your suggestion would fit:

    https://manpages.debian.org/testing/manpages/xattr.7.en.html or
    http://man7.org/linux/man-pages/man5/attr.5.html

In the current ext2, ext3, and ext4 filesystem implementations, the
total bytes used by the names and values of all of a file's extended
attributes must fit in a single filesystem block (1024, 2048 or 4096
bytes, depending on the block size specified when the filesystem was
created).
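The block size of an existing ext4 brick can be checked e.g. with tune2fs
(the device name is a placeholder):

    tune2fs -l /dev/sdb1 | grep 'Block size'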

and indeed I can see differences depending on the volume's filesystem setup:

* with the ext4 "defaults" setup I got the error after 44 successful links:

/etc/mke2fs.conf:

[defaults]
        base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
        default_mntopts = acl,user_xattr
        enable_periodic_fsck = 0
        blocksize = 4096
        inode_size = 256
        inode_ratio = 16384

[fs_types]
        ext3 = {
                features = has_journal
        }
        ext4 = {
                features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize
                inode_size = 256
        }
...

* with the ext4 "small" setup (where I set inode_size back to 256 when I
formatted it) I could create only 10 links successfully:

        small = {
                blocksize = 1024
                inode_size = 128        # in my volume's case: 256
                inode_ratio = 4096
        }
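(Such a filesystem would be created roughly like this - the device is a
placeholder, -T selects the fs_types profile from mke2fs.conf and -I
overrides the inode size:)

    mkfs.ext4 -T small -I 256 /dev/sdc1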

which would match the block-size limitation - here on the default ext4 fs:

# attr -l test
Attribute "gfid2path.3951a8fec4234683" has a 41 byte value for test
Attribute "gfid" has a 16 byte value for test
Attribute "afr.dirty" has a 12 byte value for test
Attribute "gfid2path.003214300fcd4d34" has a 44 byte value for test
...
Attribute "gfid2path.fe4d3e4d0bc31351" has a 44 byte value for test
# attr -l test | grep gfid2path | wc -l
46

41 + 16 + 12 + 45 * 44 = 2049 (+ 256 inode_size + ???  )  <= 4096

with a 1k block size I got only:

# attr -l test
Attribute "gfid2path.7a3f0fa0e8f7eba3" has a 41 byte value for test
Attribute "gfid" has a 16 byte value for test
Attribute "afr.dirty" has a 12 byte value for test
Attribute "gfid2path.13e24c98a492d7f1" has a 43 byte value for test
Attribute "gfid2path.1efa5641f9785d6c" has a 43 byte value for test
Attribute "gfid2path.551dfafc5d4a7bda" has a 43 byte value for test
Attribute "gfid2path.578dc56f20801437" has a 43 byte value for test
Attribute "gfid2path.8e983883502e3c57" has a 43 byte value for test
Attribute "gfid2path.94b700e1c7f156e3" has a 43 byte value for test
Attribute "gfid2path.cbeb1108f9a34dac" has a 43 byte value for test
Attribute "gfid2path.cd6ba60f624abc2b" has a 43 byte value for test
Attribute "gfid2path.dbf95647d59cd047" has a 43 byte value for test
Attribute "gfid2path.ec6198adc227befe" has a 44 byte value for test

41 + 16 + 12 + 9 * 43 + 44 = 500 (+ 256 inode_size + ???) <= 1024

whatever the remaining unaccounted space is used for.
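My best guess for that remainder (an assumption on my part, based on the
ext4 on-disk xattr layout in fs/ext4/xattr.h: a 32-byte block header, plus
per attribute a 16-byte entry header and the name, with name and value each
padded to 4-byte boundaries):

    # rough estimate of how many gfid2path xattrs fit into one 4096-byte block
    block=4096; header=32
    name=26                # length of "gfid2path." + 16 hex digits
    value=44               # typical value size seen above
    per_entry=$(( (16 + name + 3) / 4 * 4 + (value + 3) / 4 * 4 ))
    echo $(( (block - header) / per_entry ))   # -> 46, matching the ~45-46 links observed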


But in the log I can only see this error, which is not very helpful (tested
here on another volume with ext4 "defaults" settings):

[2018-08-31 13:21:11.306022] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-0: remote operation failed: (/test/test-45
-> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.306420] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-2: remote operation failed: (/test/test-45
-> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.306466] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-1: remote operation failed: (/test/test-45
-> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.307452] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 23122: LINK() /test/test-46 => -1 (No space left
on device)
[2018-08-31 13:21:11.339428] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-0: remote operation failed: (/test/test-45
-> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.339991] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-1: remote operation failed: (/test/test-45
-> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.340039] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-2: remote operation failed: (/test/test-45
-> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.341036] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 23125: LINK() /test/test-47 => -1 (No space left
on device)
...
[2018-08-31 13:21:12.097966] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-0: remote operation failed: (/test/test-45
-> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.098326] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-1: remote operation failed: (/test/test-45
-> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.098412] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk]
0-staging-prudsys-client-2: remote operation failed: (/test/test-45
-> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.101533] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 23285: LINK() /test/test-100 => -1 (No space left
on device)
[2018-08-31 13:32:48.613484] I [MSGID: 109063]
[dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found
anomalies in (null) (gfid = 1923da4d-9661-4d53-84d6-7d196276a0fc).
Holes=1 overlaps=0
[2018-08-31 13:32:48.613529] I [MSGID: 109063]
[dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found
anomalies in (null) (gfid = a04f8ab2-5b7a-490c-a3a6-71d9899295fa).
Holes=1 overlaps=0
[2018-08-31 13:32:48.613556] I [MSGID: 109063]
[dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found
anomalies in (null) (gfid = 6d5ed713-7cff-4cf9-bb57-197a217051db).
Holes=1 overlaps=0

The same log output with the old ext4 filesystem:

[2018-08-31 14:06:05.882886] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2:
remote operation failed: (/test/test-45 -> /test/test-46) [No space
left on device]
[2018-08-31 14:06:05.883427] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3:
remote operation failed: (/test/test-45 -> /test/test-46) [No space
left on device]
[2018-08-31 14:06:05.884821] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 15575982: LINK() /test/test-46 => -1 (No space
left on device)
[2018-08-31 14:06:05.901852] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2:
remote operation failed: (/test/test-45 -> /test/test-47) [No space
left on device]
[2018-08-31 14:06:05.902410] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3:
remote operation failed: (/test/test-45 -> /test/test-47) [No space
left on device]
[2018-08-31 14:06:05.903968] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 15575985: LINK() /test/test-47 => -1 (No space
left on device)
...
[2018-08-31 14:06:06.727908] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2:
remote operation failed: (/test/test-45 -> /test/test-100) [No space
left on device]
[2018-08-31 14:06:06.728409] W [MSGID: 114031]
[client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3:
remote operation failed: (/test/test-45 -> /test/test-100) [No space
left on device]
[2018-08-31 14:06:06.729631] W [fuse-bridge.c:540:fuse_entry_cbk]
0-glusterfs-fuse: 15576145: LINK() /test/test-100 => -1 (No space
left on device)

and there are no more log lines referencing my test - I cannot see the
gfid2path errors you mentioned, but the error does seem related to the
inode/block size as shown above.

Also interesting, as you mentioned: with the current 3.12.13 version on
another "old" GlusterFS volume with an xfs backend it works fine.
Post by Shyam Ranganathan
To check if you are facing similar issue to the one in the bug provided
above, I would check if the brick logs throw up the no space error on a
gfid2path set failure.
Is there some parameter to get more detailed error logging? From the docs it
looks like the defaults are already sensible:

https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/

* diagnostics.brick-log-level - changes the log-level of the bricks.
  Default: INFO (DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE)
* diagnostics.client-log-level - changes the log-level of the clients.
  Default: INFO (DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE)
* diagnostics.latency-measurement - statistics related to the latency of
  each operation would be tracked. Default: Off (On/Off)
* diagnostics.dump-fd-stats - statistics related to file-operations would
  be tracked. Default: Off (On)
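(If more detail were ever needed, the level can presumably be raised per
volume, with "myvol" as a placeholder volume name:)

    gluster volume set myvol diagnostics.brick-log-level DEBUG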
Post by Shyam Ranganathan
To get around the problem, I would suggest using xfs as the backing FS for
the brick (considering you have close to 250-odd hardlinks to a file). I
would not attempt to disable the gfid2path feature, as it is useful for
getting to the real file given just a GFID and is already part of the core
on-disk Gluster metadata (it can be shut off, but I would refrain from that).
Since only some tens of GB of small files are duplicated like this, it's
much easier to simply use duplicated content again, and perhaps I can also
nudge people into cleaning up unneeded files.
Post by Shyam Ranganathan
Post by Reiner Keller
My search through the documentation found only the parameter
"storage.max-hardlinks", with a default of 100, for version 4.0.
I checked my gluster 3.12.13, but there the parameter is not yet
implemented.
If this problem is backend-filesystem related, it would be good to document
for 4.0 as well that the storage.max-hardlinks parameter only takes effect
if the backend is e.g. xfs or otherwise has enough inode space for it
(ideally with a reference / short example of how to calculate it)?


Thanks and nice weekend


Reiner
Shyam Ranganathan
2018-09-10 16:32:46 UTC
Post by Reiner Keller
Hello,
Post by Shyam Ranganathan
https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5
I further suspect your older setup was 3.10 based and not 3.12 based.
There is an additional feature added in 3.12 that stores GFID to path
conversion details using xattrs (see "GFID to path" in
https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features
)
Because of this, the xattr storage limit can be reached/breached on
ext4-based bricks.
To check if you are facing an issue similar to the one in the bug referenced
above, I would check whether the brick logs throw up the no-space error on a
gfid2path set failure.
Thanks for the hint.
From the log output (= no gfid2path errors) it does not seem to be that
problem, although the old gluster volume was set up with version 3.10.x (or
even 3.8.x, I think).
As I wrote, I could reproduce it on the new ext4 and on the old xfs gluster
volumes with version 3.12.13, while it ran fine with ~ 3.12.8 (half a year
ago) without problems.
But I just saw that my old main volume wasn't/isn't xfs but also ext4.
Digging into the logs I could see that in January I was still running
3.10.8 / 3.10.9 and first switched to 3.12.9 / the 3.12 branch in April.
    https://manpages.debian.org/testing/manpages/xattr.7.en.html or
    http://man7.org/linux/man-pages/man5/attr.5.html
In the current ext2, ext3, and ext4 filesystem implementations, the
total bytes used by the names and values of all of a file's extended
attributes must fit in a single filesystem block (1024, 2048 or 4096
bytes, depending on the block size specified when the filesystem was
created).
<huge snip>

So in short, the inode size limits in ext4 impact the hard link counts that
can be created in Gluster, and that is the limitation you hit - would that
be a correct summary?
Post by Reiner Keller
Post by Shyam Ranganathan
To check if you are facing similar issue to the one in the bug provided
above, I would check if the brick logs throw up the no space error on a
gfid2path set failure.
Is there some parameter to get more detailed error logging ? But from
The error logs posted are from the client (FUSE mount); the gfid2path log
lines I was mentioning appear in the brick logs.

There is no further logging level that needs to be changed to see the said
errors, as these are logged at warning level and above.
Post by Reiner Keller
Post by Shyam Ranganathan
Post by Reiner Keller
My search through the documentation found only the parameter
"storage.max-hardlinks", with a default of 100, for version 4.0.
I checked my gluster 3.12.13, but there the parameter is not yet
implemented.
If this problem is backend-filesystem related, it would be good to document
for 4.0 as well that the storage.max-hardlinks parameter only takes effect
if the backend is e.g. xfs or otherwise has enough inode space for it
(ideally with a reference / short example of how to calculate it)?
Fair point; I have raised a github issue for this here [1]
(contributions welcome :) ).

Regards,
Shyam

[1] Gluster documentation github issue for hardlink and ext4
limitations: https://github.com/gluster/glusterdocs/issues/418
