Discussion:
Shard file size (gluster 3.7.5)
Lindsay Mathieson
2015-11-01 00:59:44 UTC
I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.

I've noticed some peculiarities with reported file sizes on the gluster mount, but I seem to recall this is a known issue with shards?

The source file is sparse: nominal size 64GB, real size 25GB. However, the underlying storage is ZFS with lz4 compression, which reduces it to 16GB on disk.

No Shard:
ls -lh : 64 GB
du -h : 25 GB

4MB Shard:
ls -lh : 144 GB
du -h : 21 MB

512MB Shard:
ls -lh : 72 GB
du -h : 765 MB


A du -sh of the .shard directory shows 16GB for all datastores.

Is this a known bug with sharding? Will it be fixed eventually?

Sent from Mail for Windows 10
Lindsay Mathieson
2015-11-01 10:24:49 UTC
More concerningly, the files copied to the gluster shard datastores have a different md5sum from the original. That's a fatal flaw.
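For what it's worth, a quick way to check both size and checksum after a copy (the paths here are only examples, based on ones mentioned later in this thread):

SRC=/mnt/pve/pxsphere/images/301/vm-301-disk-1.qcow2   # example source image
DST=/mnt/pve/gluster3/images/301/vm-301-disk-2.qcow2   # example copy on the gluster mount
stat -c '%s %n' "$SRC" "$DST"   # logical sizes should match
md5sum "$SRC" "$DST"            # checksums should match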

Sent from Mail for Windows 10



From: Lindsay Mathieson
Sent: Sunday, 1 November 2015 10:59 AM
To: gluster-users
Subject: Shard file size (gluster 3.7.5)


I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.

I've noticed some peculiarities with reported file sizes on the gluster mount, but I seem to recall this is a known issue with shards?

The source file is sparse: nominal size 64GB, real size 25GB. However, the underlying storage is ZFS with lz4 compression, which reduces it to 16GB on disk.

No Shard:
ls -lh     : 64 GB
du -h      : 25 GB

4MB Shard:
ls -lh     : 144 GB
du -h      : 21 MB

512MB Shard:
ls -lh     : 72 GB
du -h      : 765 MB


A du -sh of the .shard directory shows 16GB for all datastores.

Is this a known bug with sharding? Will it be fixed eventually?

Sent from Mail for Windows 10
Krutika Dhananjay
2015-11-02 08:49:22 UTC
Could you share
(1) the output of 'getfattr -d -m . -e hex <path>' where <path> represents the path to the original file from the brick where it resides
(2) the size of the file as seen from the mount point around the time when (1) is taken
(3) output of 'gluster volume info'

-Krutika

----- Original Message -----
Sent: Sunday, November 1, 2015 6:29:44 AM
Subject: [Gluster-users] Shard file size (gluster 3.7.5)
I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.
I've noticed some peculiarities with reported file sizes on the gluster mount,
but I seem to recall this is a known issue with shards?
The source file is sparse: nominal size 64GB, real size 25GB. However, the underlying
storage is ZFS with lz4 compression, which reduces it to 16GB on disk.
ls -lh : 64 GB
du -h : 25 GB
ls -lh : 144 GB
du -h : 21 MB
ls -lh : 72 GB
du -h : 765 MB
A du -sh of the .shard directory shows 16GB for all datastores.
Is this a known bug with sharding? Will it be fixed eventually?
Sent from Mail for Windows 10
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Lindsay Mathieson
2015-11-02 11:57:51 UTC
Post by Krutika Dhananjay
Could you share
(1) the output of 'getfattr -d -m . -e hex <path>' where <path> represents
the path to the original file from the brick where it resides
(2) the size of the file as seen from the mount point around the time
when (1) is taken
(3) output of 'gluster volume info'
Hope this helps

(1)
getfattr -d -m . -e hex /zfs_vm/datastore3/images/301/vm-301-disk-2.qcow2
getfattr: Removing leading '/' from absolute path names
# file: zfs_vm/datastore3/images/301/vm-301-disk-2.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000563732fc000ee087
trusted.gfid=0x25621772b50340ab87e22c7e5e36bf00
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000001711c6000000000000000000000000000000f60d570000000000000000
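As an aside, those xattrs can be decoded by hand. Assuming the file-size value is four big-endian 64-bit words, with the logical size in the first word and the 512-byte block count in the third (which matches the numbers below), a quick bash check looks like:

# block size: 0x20000000 bytes = 512MB shards
echo $(( 16#20000000 ))                  # 536870912
# file-size xattr: word 1 = logical size, word 3 = 512-byte block count
echo $(( 16#0000001711c60000 ))          # 99082436608 bytes - matches the ls -l size below
echo $(( 16#0000000000f60d57 * 512 ))    # 8256138752 bytes (~7.7G) - matches the du/ls 'total' below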

(2) the size of the file as seen from the mount point around the time when
(1) is taken

# Bytes
cd /mnt/pve/gluster3/images/301
ls -l
total 8062636
-rw-r--r-- 1 root root 99082436608 Nov 2 21:48 vm-301-disk-2.qcow2

# Human Readable :)
ls -lh
total 7.7G
-rw-r--r-- 1 root root 93G Nov 2 21:48 vm-301-disk-2.qcow2


# Original file it was rsync copied from
ls -l /mnt/pve/pxsphere/images/301/vm-301-disk-1.qcow2
-rw-r--r-- 1 root root 27746172928 Oct 29 18:09
/mnt/pve/pxsphere/images/301/vm-301-disk-1.qcow2

(3) output of 'gluster volume info'
# Volume Info

gluster volume info datastore3

Volume Name: datastore3
Type: Replicate
Volume ID: def21ef7-37b5-4f44-a2cd-8e722fc40b24
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/zfs_vm/datastore3
Brick2: vnb.proxmox.softlog:/glusterdata/datastore3
Brick3: vng.proxmox.softlog:/glusterdata/datastore3
Options Reconfigured:
performance.io-thread-count: 32
performance.write-behind-window-size: 128MB
performance.cache-size: 1GB
performance.cache-refresh-timeout: 4
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: on
performance.write-behind: on
cluster.self-heal-window-size: 256
server.event-threads: 4
client.event-threads: 4
cluster.quorum-type: auto
features.shard-block-size: 512MB
features.shard: on
performance.readdir-ahead: on
cluster.server-quorum-ratio: 51%
***@vna:/mnt/pve/gluster3/images/301#
--
Lindsay
Lindsay Mathieson
2015-11-03 01:56:41 UTC
I can reproduce this 100% reliably, just by copying files onto a gluster
volume. The reported file size is always larger, sometimes radically so. If I
copy the file again, the reported size is different each time.

Using cmp, I found that the file contents match up to the size of the
original file.
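Something along these lines, with placeholder paths (GNU cmp's -n option limits the comparison to the first N bytes):

SRC=/path/to/original.qcow2            # placeholder - the original source image
DST=/path/to/gluster/mount/copy.qcow2  # placeholder - the copy on the gluster mount
cmp -n "$(stat -c %s "$SRC")" "$SRC" "$DST" && echo "contents identical up to the source size"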

The md5sums probably differ because of the different file sizes.
Post by Krutika Dhananjay
Could you share
(1) the output of 'getfattr -d -m . -e hex <path>' where <path> represents
the path to the original file from the brick where it resides
(2) the size of the file as seen from the mount point around the time
when (1) is taken
(3) output of 'gluster volume info'
-Krutika
------------------------------
*Sent: *Sunday, November 1, 2015 6:29:44 AM
*Subject: *[Gluster-users] Shard file size (gluster 3.7.5)
I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.
I've noticed some peculiarities with reported file sizes on the gluster
mount, but I seem to recall this is a known issue with shards?
The source file is sparse: nominal size 64GB, real size 25GB. However, the
underlying storage is ZFS with lz4 compression, which reduces it to 16GB on disk.
ls -lh : 64 GB
du -h : 25 GB
ls -lh : 144 GB
du -h : 21 MB
ls -lh : 72 GB
du -h : 765 MB
A du -sh of the .shard directory shows 16GB for all datastores.
Is this a known bug with sharding? Will it be fixed eventually?
Sent from Mail for Windows 10
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Lindsay
Krutika Dhananjay
2015-11-03 04:22:59 UTC
Could you try this again with performance.strict-write-ordering set to 'off'?

# gluster volume set <VOL> performance.strict-write-ordering off

-Krutika

----- Original Message -----
Sent: Tuesday, November 3, 2015 7:26:41 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
I can reproduce this 100% reliably, just by copying files onto a gluster
volume. The reported file size is always larger, sometimes radically so. If I
copy the file again, the reported size is different each time.
Using cmp, I found that the file contents match up to the size of the
original file.
The md5sums probably differ because of the different file sizes.
Post by Krutika Dhananjay
Could you share
(1) the output of 'getfattr -d -m . -e hex <path>' where <path> represents
the path to the original file from the brick where it resides
(2) the size of the file as seen from the mount point around the time when (1) is taken
(3) output of 'gluster volume info'
-Krutika
Sent: Sunday, November 1, 2015 6:29:44 AM
Subject: [Gluster-users] Shard file size (gluster 3.7.5)
I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.
I've noticed some peculiarities with reported file sizes on the gluster mount,
but I seem to recall this is a known issue with shards?
The source file is sparse: nominal size 64GB, real size 25GB. However, the underlying
storage is ZFS with lz4 compression, which reduces it to 16GB on disk.
ls -lh : 64 GB
du -h : 25 GB
ls -lh : 144 GB
du -h : 21 MB
ls -lh : 72 GB
du -h : 765 MB
A du -sh of the .shard directory shows 16GB for all datastores.
Is this a known bug with sharding? Will it be fixed eventually?
Sent from Mail for Windows 10
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Lindsay
Krutika Dhananjay
2015-11-03 04:28:39 UTC
Correction. The option needs to be enabled and not disabled.

# gluster volume set <VOL> performance.strict-write-ordering on

-Krutika

----- Original Message -----
Sent: Tuesday, November 3, 2015 9:52:59 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Could you try this again with performance.strict-write-ordering set to 'off'?
# gluster volume set <VOL> performance.strict-write-ordering off
-Krutika
----- Original Message -----
Sent: Tuesday, November 3, 2015 7:26:41 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
I can reproduce this 100% reliably, just by copying files onto a gluster
volume. The reported file size is always larger, sometimes radically so. If I
copy the file again, the reported size is different each time.
Using cmp, I found that the file contents match up to the size of the
original file.
The md5sums probably differ because of the different file sizes.
Post by Krutika Dhananjay
Could you share
(1) the output of 'getfattr -d -m . -e hex <path>' where <path> represents
the path to the original file from the brick where it resides
(2) the size of the file as seen from the mount point around the time
when
(1) is taken
(3) output of 'gluster volume info'
-Krutika
Sent: Sunday, November 1, 2015 6:29:44 AM
Subject: [Gluster-users] Shard file size (gluster 3.7.5)
I've upgraded my cluster to Debian Jessie, so I'm able to natively test 3.7.5.
I've noticed some peculiarities with reported file sizes on the gluster
mount, but I seem to recall this is a known issue with shards?
The source file is sparse: nominal size 64GB, real size 25GB. However, the underlying
storage is ZFS with lz4 compression, which reduces it to 16GB on disk.
ls -lh : 64 GB
du -h : 25 GB
ls -lh : 144 GB
du -h : 21 MB
ls -lh : 72 GB
du -h : 765 MB
A du -sh of the .shard directory shows 16GB for all datastores.
Is this a known bug with sharding? Will it be fixed eventually?
Sent from Mail for Windows 10
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Lindsay
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Lindsay Mathieson
2015-11-03 04:36:29 UTC
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
Good timing! I'd just started to test it :)
--
Lindsay
Lindsay Mathieson
2015-11-03 04:51:45 UTC
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
That seems to have made the difference: an exact match in size (bytes) to the
source file. I'll do an md5sum as well.
--
Lindsay
Lindsay Mathieson
2015-11-03 04:57:25 UTC
md5sum matches too:

Volume Name: datastore3
Type: Replicate
Volume ID: def21ef7-37b5-4f44-a2cd-8e722fc40b24
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/zfs_vm/datastore3
Brick2: vnb.proxmox.softlog:/glusterdata/datastore3
Brick3: vng.proxmox.softlog:/glusterdata/datastore3
Options Reconfigured:
performance.strict-write-ordering: on
features.shard-block-size: 256MB
features.shard: on
cluster.quorum-type: auto
performance.readdir-ahead: off
performance.cache-size: 1GB
performance.write-behind: on
performance.write-behind-window-size: 128MB
performance.io-thread-count: 32
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: off
performance.cache-refresh-timeout: 4
cluster.server-quorum-ratio: 51%
Post by Lindsay Mathieson
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
That seems to have made the difference: an exact match in size (bytes) to the
source file. I'll do an md5sum as well.
--
Lindsay
--
Lindsay
Lindsay Mathieson
2015-11-03 05:02:30 UTC
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
Disk usage is still out, though:

du -h vm-301-disk-1.qcow2
302M vm-301-disk-1.qcow2

Should be 16GB
--
Lindsay
Lindsay Mathieson
2015-11-03 05:07:55 UTC
And when I migrated the running VM to another server, that image was
immediately corrupted again.

When I looked at the mount, the file size was reported as 256MB - it should
have been 25GB.
Post by Lindsay Mathieson
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
du -h vm-301-disk-1.qcow2
302M vm-301-disk-1.qcow2
Should be 16GB
--
Lindsay
--
Lindsay
Lindsay Mathieson
2015-11-03 05:10:05 UTC
Post by Lindsay Mathieson
And when I migrated the running VM to another server, that image was
immediately corrupted again.
When I looked at the mount, the file size was reported as 256MB - it should
have been 25GB.
NB - the .shard directory still had what looked like the correct amount of
data (25GB)
--
Lindsay
Lindsay Mathieson
2015-11-03 05:11:44 UTC
Post by Lindsay Mathieson
NB - the .shard directory still had what looked like the correct amount of
data (25GB)
Sorry for the serial posting ... but when I deleted the file via its mount,
the data was left behind in the .shard directory. Perhaps the link between
the source file and the shards is getting broken?
--
Lindsay
Krutika Dhananjay
2015-11-03 10:52:27 UTC
Was this the same file that had the incorrect size? If it was, then that is quite likely.

-Krutika
----- Original Message -----
Sent: Tuesday, November 3, 2015 10:41:44 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Lindsay Mathieson
NB - the .shard directory still had what looked like the correct amount of
data (25GB)
Sorry for the serial posting ... but when I deleted the file via its mount,
the data was left behind in the .shard directory. Perhaps the link between
the source file and the shards is getting broken?
--
Lindsay
Krutika Dhananjay
2015-11-03 11:06:04 UTC
OK. Could you share the xattr values of this image file?
# getfattr -d -m . -e hex <path-to-the-file-in-the-backend>

-Krutika
----- Original Message -----
Sent: Tuesday, November 3, 2015 10:32:30 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
Correction. The option needs to be enabled and not disabled.
# gluster volume set <VOL> performance.strict-write-ordering on
du -h vm-301-disk-1.qcow2
302M vm-301-disk-1.qcow2
Should be 16GB
--
Lindsay
Lindsay Mathieson
2015-11-03 11:07:58 UTC
Post by Krutika Dhananjay
OK. Could you share the xattr values of this image file?
# getfattr -d -m . -e hex <path-to-the-file-in-the-backend>
Can do, but it will take me half an hour or so to recreate the circumstances.

And in answer to your earlier question, the file was the VM image that got
corrupted when I live-migrated its VM from one node to another.
--
Lindsay
Lindsay Mathieson
2015-11-03 11:46:52 UTC
Post by Krutika Dhananjay
OK. Could you share the xattr values of this image file?
# getfattr -d -m . -e hex <path-to-the-file-in-the-backend>
gluster volume set datastore3 performance.strict-write-ordering on

# rsync file to gluster mount

# Size matches src file
ls -l vm-301-disk-1.qcow2
-rw-r--r-- 1 root root 27746172928 Nov 3 21:40 vm-301-disk-1.qcow2

# Disk usage is way too small, should be around 16GB
du vm-301-disk-1.qcow2
390730 vm-301-disk-1.qcow2

# Attributes of backend file
getfattr -d -m . -e hex
/glusterdata/datastore3/images/301/vm-301-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: glusterdata/datastore3/images/301/vm-301-disk-1.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000056386f9400013a25
trusted.gfid=0xc633865ef6ca4010a47cfc80d6c35331
trusted.glusterfs.shard.block-size=0x0000000010000000
trusted.glusterfs.shard.file-size=0x0000000675cd0000000000000000000000000000000bec940000000000000000
--
Lindsay
Krutika Dhananjay
2015-11-04 10:45:58 UTC
The block count in the xattr doesn't amount to 16GB of used space.
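To put numbers on that (assuming the third 64-bit word of the file-size xattr is a 512-byte block count, which lines up with the du output):

echo $(( 16#0000000675cd0000 ))          # 27746172928 bytes - the logical size, matches ls -l
echo $(( 16#00000000000bec94 * 512 ))    # 400107520 bytes (~381MB) - matches the du output of 390730 KB, nowhere near 16GB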

Is this consistently reproducible? If it is, then could you share the steps? That would help me recreate this in-house and debug it.

-Krutika

----- Original Message -----
Sent: Tuesday, November 3, 2015 5:16:52 PM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
OK. Could you share the xattr values of this image file?
# getfattr -d -m . -e hex <path-to-the-file-in-the-backend>
gluster volume set datastore3 performance.strict-write-ordering on
# rsync file to gluster mount
# Size matches src file
ls -l vm-301-disk-1.qcow2
-rw-r--r-- 1 root root 27746172928 Nov 3 21:40 vm-301-disk-1.qcow2
# Disk usage is way too small, should be around 16GB
du vm-301-disk-1.qcow2
390730 vm-301-disk-1.qcow2
# Attributes of backend file
getfattr -d -m . -e hex
/glusterdata/datastore3/images/301/vm-301-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: glusterdata/datastore3/images/301/vm-301-disk-1.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000056386f9400013a25
trusted.gfid=0xc633865ef6ca4010a47cfc80d6c35331
trusted.glusterfs.shard.block-size=0x0000000010000000
trusted.glusterfs.shard.file-size=0x0000000675cd0000000000000000000000000000000bec940000000000000000
--
Lindsay
Lindsay Mathieson
2015-11-04 12:37:10 UTC
Post by Krutika Dhananjay
The block count in the xattr doesn't amount to 16GB of used space.
Is this consistently reproducible? If it is, then could you share the
steps? That would help me recreate this in-house and debug it.
100% of the time for me, and all I have to do is copy or create a file on
the gluster mount.

My bricks are all sitting on ZFS filesystems with compression enabled;
maybe that is confusing things? I'll try a test with compression off.

In the meantime, here are the steps and results for a from-scratch volume I
created (datastore3) with just one file.

***@vnb:/mnt/pve/gluster3# gluster volume info

Volume Name: datastore3
Type: Replicate
Volume ID: 96acb55b-b3c2-4940-b642-221dd1b88617
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/zfs_vm/datastore3
Brick2: vnb.proxmox.softlog:/glusterdata/datastore3
Brick3: vng.proxmox.softlog:/glusterdata/datastore3
Options Reconfigured:
performance.io-thread-count: 32
performance.write-behind-window-size: 128MB
performance.cache-size: 1GB
performance.cache-refresh-timeout: 4
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: on
performance.write-behind: on
cluster.self-heal-window-size: 256
server.event-threads: 4
client.event-threads: 4
cluster.quorum-type: auto
features.shard-block-size: 512MB
features.shard: on
performance.readdir-ahead: on
cluster.server-quorum-ratio: 51%

***@vnb:/mnt/pve/gluster3# dd if=/dev/sda of=test.bin bs=1MB count=8192
8192+0 records in
8192+0 records out
8192000000 bytes (8.2 GB) copied, 79.5335 s, 1

ls -l
total 289925
drwxr-xr-x 2 root root 2 Nov 4 22:24 images
-rw-r--r-- 1 root root 72357920896 Nov 4 22:26 test.bin

ls -lh
total 284M
drwxr-xr-x 2 root root 2 Nov 4 22:24 images
-rw-r--r-- 1 root root 68G Nov 4 22:26 test.bin

du test.bin
289924 test.bin

du /glusterdata/datastore3/.shard/
2231508 /glusterdata/datastore3/.shard/


getfattr -d -m . -e hex /glusterdata/datastore3/test.bin
getfattr: Removing leading '/' from absolute path names
# file: glusterdata/datastore3/test.bin
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005639f915000f2b76
trusted.gfid=0xa1ecf4c8ab0a4ecc8bd8d4f3affe0bfb
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x00000010d8de40800000000000000000000000000008d9080000000000000000
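Decoding that xattr the same way as the earlier one (assuming word 1 = logical size, word 3 = 512-byte block count):

echo $(( 16#00000010d8de4080 ))          # 72357920896 bytes - matches the inflated ls -l size above
echo $(( 16#000000000008d908 * 512 ))    # 296882176 bytes (~283MB) - matches the du of 289924 above, far less than the ~2.2GB actually under .shard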
--
Lindsay
Lindsay Mathieson
2015-11-04 12:52:19 UTC
Post by Lindsay Mathieson
My bricks are all sitting on ZFS filesystems with compression enabled,
maube that is confusing things? I'll try a test with compression off.
Nope, same issue with compression off
--
Lindsay
Krutika Dhananjay
2015-11-04 15:09:53 UTC
Ah! It's the same issue. Just saw your volume info output. Enabling strict-write-ordering should ensure both size and disk usage are accurate.

-Krutika

----- Original Message -----
Sent: Wednesday, November 4, 2015 6:07:10 PM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
The block count in the xattr doesn't amount to 16GB of used space.
Is this consistently reproducible? If it is, then could you share the
steps?
That would help me recreate this in-house and debug it.
100% of the time for me, and all I have to do is copy or create a file on the
gluster mount.
My bricks are all sitting on ZFS filesystems with compression enabled; maybe
that is confusing things? I'll try a test with compression off.
In the meantime, here are the steps and results for a from-scratch volume I
created (datastore3) with just one file.
Volume Name: datastore3
Type: Replicate
Volume ID: 96acb55b-b3c2-4940-b642-221dd1b88617
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: vna.proxmox.softlog:/zfs_vm/datastore3
Brick2: vnb.proxmox.softlog:/glusterdata/datastore3
Brick3: vng.proxmox.softlog:/glusterdata/datastore3
performance.io-thread-count: 32
performance.write-behind-window-size: 128MB
performance.cache-size: 1GB
performance.cache-refresh-timeout: 4
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: on
performance.write-behind: on
cluster.self-heal-window-size: 256
server.event-threads: 4
client.event-threads: 4
cluster.quorum-type: auto
features.shard-block-size: 512MB
features.shard: on
performance.readdir-ahead: on
cluster.server-quorum-ratio: 51%
8192+0 records in
8192+0 records out
8192000000 bytes (8.2 GB) copied, 79.5335 s, 1
ls -l
total 289925
drwxr-xr-x 2 root root 2 Nov 4 22:24 images
-rw-r--r-- 1 root root 72357920896 Nov 4 22:26 test.bin
ls -lh
total 284M
drwxr-xr-x 2 root root 2 Nov 4 22:24 images
-rw-r--r-- 1 root root 68G Nov 4 22:26 test.bin
du test.bin
289924 test.bin
du /glusterdata/datastore3/.shard/
2231508 /glusterdata/datastore3/.shard/
getfattr -d -m . -e hex /glusterdata/datastore3/test.bin
getfattr: Removing leading '/' from absolute path names
# file: glusterdata/datastore3/test.bin
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005639f915000f2b76
trusted.gfid=0xa1ecf4c8ab0a4ecc8bd8d4f3affe0bfb
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x00000010d8de40800000000000000000000000000008d9080000000000000000
--
Lindsay
Lindsay Mathieson
2015-11-04 21:34:51 UTC
Post by Krutika Dhananjay
Ah! It's the same issue. Just saw your volume info output. Enabling
strict-write-ordering should ensure both size and disk usage are accurate.
Tested it - nope :( The size is accurate (27746172928 bytes), but the disk usage
is wildly inaccurate (698787).

I have compression disabled on the underlying storage now.
--
Lindsay
Krutika Dhananjay
2015-11-05 11:19:34 UTC
OK. I am not sure what it is that we're doing differently. I tried the steps you shared and here's what I got:

[***@dhcp35-215 bricks]# gluster volume info

Volume Name: rep
Type: Replicate
Volume ID: 3fd45a4b-0d02-4a44-b74a-41592d48e102
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: kdhananjay:/bricks/1
Brick2: kdhananjay:/bricks/2
Brick3: kdhananjay:/bricks/3
Options Reconfigured:
performance.strict-write-ordering: on
features.shard: on
features.shard-block-size: 512MB
cluster.quorum-type: auto
client.event-threads: 4
server.event-threads: 4
cluster.self-heal-window-size: 256
performance.write-behind: on
nfs.enable-ino32: on
nfs.addr-namelookup: off
nfs.disable: on
performance.cache-refresh-timeout: 4
performance.cache-size: 1GB
performance.write-behind-window-size: 128MB
performance.io-thread-count: 32
performance.readdir-ahead: on

[***@dhcp35-215 mnt]# gluster volume set rep strict-write-ordering on
volume set: success
[***@dhcp35-215 mnt]# dd if=/dev/sda of=test.bin bs=1MB count=8192
8192+0 records in
8192+0 records out
8192000000 bytes (8.2 GB) copied, 133.754 s, 61.2 MB/s
[***@dhcp35-215 mnt]# ls -l
total 8000000
-rw-r--r--. 1 root root 8192000000 Nov 5 16:40 test.bin
[***@dhcp35-215 mnt]# ls -lh
total 7.7G
-rw-r--r--. 1 root root 7.7G Nov 5 16:40 test.bin
[***@dhcp35-215 mnt]# du test.bin
8000000 test.bin

[***@dhcp35-215 bricks]# du /bricks/1/.shard/
7475780 /bricks/1/.shard/
[***@dhcp35-215 bricks]# du /bricks/1/
.glusterfs/ .shard/ test.bin .trashcan/
[***@dhcp35-215 bricks]# du /bricks/1/test.bin
524292 /bricks/1/test.bin

Just to be sure, did you rerun the test on the already broken file (test.bin) which was written to when strict-write-ordering had been off?
Or did you try the new test with strict-write-ordering on a brand new file?

-Krutika

----- Original Message -----
Sent: Thursday, November 5, 2015 3:04:51 AM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
Ah! It's the same issue. Just saw your volume info output. Enabling
strict-write-ordering should ensure both size and disk usage are accurate.
Tested it - nope :( The size is accurate (27746172928 bytes), but the disk usage
is wildly inaccurate (698787).
I have compression disabled on the underlying storage now.
--
Lindsay
Lindsay Mathieson
2015-11-05 13:46:06 UTC
Post by Krutika Dhananjay
Just to be sure, did you rerun the test on the already broken file
(test.bin) which was written to when strict-write-ordering had been off?
Or did you try the new test with strict-write-ordering on a brand new file?
Very strange. I tried it on new files and even went to the extent of
deleting the datastore and bricks, then recreating.

One oddity on my system - I have two prefs that I cannot reset
- cluster.server-quorum-ratio
- performance.readdir-ahead

Though I wouldn't have thought they made a difference. I might try cleaning
gluster and all its config files off the systems and *really* starting from
scratch.
--
Lindsay
Krutika Dhananjay
2015-11-06 07:22:44 UTC
Sure. So far I've just been able to figure that GlusterFS counts blocks in multiples of 512B while XFS seems to count them in multiples of 4.0KB.
Let me again try creating sparse files on xfs, sharded and non-sharded gluster volumes and compare the results. I'll let you know what I find.
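If it helps to compare, each layer's accounting can be checked directly - st_blocks as reported by stat is normally in 512-byte units, regardless of the filesystem's allocation block size (the path below is just a placeholder):

stat -c 'size=%s bytes, allocated=%b blocks of %B bytes' /path/to/file
# allocated bytes = %b * %B (normally 512-byte units on Linux), independent of
# whether the filesystem allocates in 4KB blocks (xfs/ext4) or larger records (zfs)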

-Krutika
----- Original Message -----
Sent: Thursday, November 5, 2015 7:16:06 PM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
Just to be sure, did you rerun the test on the already broken file
(test.bin)
which was written to when strict-write-ordering had been off?
Or did you try the new test with strict-write-ordering on a brand new file?
Very strange. I tried it on new files and even went to the extent of deleting
the datastore and bricks, then recreating.
One oddity on my system - I have two prefs that I cannot reset
- cluster.server-quorum-ratio
- performance.readdir-ahead
Though I wouldn't have thought they made a difference. I might try cleaning
gluster and all its config files off the systems and *really* starting from
scratch.
--
Lindsay
Lindsay Mathieson
2015-11-06 07:33:53 UTC
Post by Krutika Dhananjay
Sure. So far I've just been able to figure that GlusterFS counts blocks in
multiples of 512B while XFS seems to count them in multiples of 4.0KB.
Let me again try creating sparse files on xfs, sharded and non-sharded
gluster volumes and compare the results. I'll let you know what I find.
Yes, that could complicate things. ZFS has record sizes available up to a 1MB
max. I'll try the same tests with XFS & EXT4 as well.
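For reference, the relevant ZFS settings can be checked per dataset (the dataset name below is a guess based on the brick path):

zfs get recordsize,compression,compressratio zfs_vm/datastore3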
--
Lindsay
Lindsay Mathieson
2015-11-06 14:28:42 UTC
Post by Krutika Dhananjay
Sure. So far I've just been able to figure that GlusterFS counts blocks in
multiples of 512B while XFS seems to count them in multiples of 4.0KB.
Let me again try creating sparse files on xfs, sharded and non-sharded
gluster volumes and compare the results. I'll let you know what I find.
I repeated the tests with a single gluster brick on an ext4 partition - disk
usage (du) and file size were exactly right.
--
Lindsay
Krutika Dhananjay
2015-11-09 11:49:45 UTC
You are right!

So this is what I ran on the volume with bricks on an ext4 partition: dd if=/dev/urandom of=file bs=1024 seek=3072 count=2048 conv=notrunc

with shard-block-size being 4M. As you can see, the command creates a sparse 5M file with holes in the first 3M.
This means the first block file will be mostly sparse, while the second block file (the one holding the last 1M of data) looks like a normal file.

[***@calvin mnt]# dd if=/dev/urandom of=file bs=1024 seek=3072 count=2048 conv=notrunc
2048+0 records in
2048+0 records out
2097152 bytes (2.1 MB) copied, 0.875319 s, 2.4 MB/s
[***@calvin mnt]# du file
2052 file

... while on the volume with xfs bricks, the number reads 3012.
I added trace logs to see what is happening in the latter case. The posix translator in gluster seems to return more blocks than were actually written if the file is sparse (in this case, for the first block file).
And for the second block file, it is returning 1 block per 512 bytes of data written. Sharding relies on the values returned by the posix translator to keep an account of the size and block count.

I will need some more time to understand why this is so. I will let you know as soon as I've figured it out.

Thanks for the report.

-Krutika

----- Original Message -----
Sent: Friday, November 6, 2015 7:58:42 PM
Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)
Post by Krutika Dhananjay
Sure. So far I've just been able to figure that GlusterFS counts blocks in
multiples of 512B while XFS seems to count them in multiples of 4.0KB.
Let me again try creating sparse files on xfs, sharded and non-sharded
gluster volumes and compare the results. I'll let you know what I find.
I repeated the tests with a single gluster brick on an ext4 partition - disk
usage (du) and file size were exactly right.
--
Lindsay