Anastasia Belyaeva
2018-04-11 19:03:46 UTC
Hello everybody!
I have 3 gluster servers (*gluster 3.12.6, CentOS 7.2*; these are actually
virtual machines located on 3 separate physical XenServer 7.1 hosts).
They are all connected via an InfiniBand network. iperf3 shows around *23
Gbit/s of network bandwidth* between any two of them.
Each server has 3 HDDs combined into a *3-way striped thin pool (LVM2)* with a
logical volume created on top of it, formatted with *xfs*. Gluster top reports
the following throughput:
list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput *631.82 MBps* time 3.3989 secs
Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput *566.96 MBps* time 3.7877 secs
Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput *546.65 MBps* time 3.9285 secs
list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput *539.60 MBps* time 3.9798 secs
Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput *580.07 MBps* time 3.7021 secs
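As a rough sanity check on the figures above, the following sketch converts the iperf3 result to MB/s and compares it with the per-brick write throughput that gluster top reported for r3vol. It assumes decimal units (1 Gbit/s = 125 MB/s) and that "MBps" in the gluster output means megabytes per second; all names in the code are illustrative.

```python
# Figures quoted in this post; the unit interpretation is an assumption.
NETWORK_GBIT_S = 23.0  # iperf3 between any two servers

# gluster top write throughput for the r3vol bricks (MBps)
R3VOL_BRICK_MBPS = {
    "fsnode2": 631.82,
    "fsnode6": 566.96,
    "fsnode4": 546.65,
}


def gbit_to_mbyte_per_s(gbit_s: float) -> float:
    """Convert Gbit/s to MB/s using decimal units (1 Gbit = 125 MB)."""
    return gbit_s * 1000.0 / 8.0


network_mbps = gbit_to_mbyte_per_s(NETWORK_GBIT_S)
slowest_brick = min(R3VOL_BRICK_MBPS, key=R3VOL_BRICK_MBPS.get)
slowest_mbps = R3VOL_BRICK_MBPS[slowest_brick]

print(f"network line rate  : {network_mbps:.0f} MB/s")
print(f"slowest r3vol brick: {slowest_brick} at {slowest_mbps:.2f} MB/s")
print(f"brick / network    : {slowest_mbps / network_mbps:.0%}")
```

Under these assumptions the bricks, not the network, set the lower of the two limits, at roughly 547 MB/s for the slowest brick.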
There are two *pure replicated* volumes ('replica 2' and 'replica 3'). The
'replica 2' volume is for testing purposes only.
Volume Name: r2vol
Type: Replicate
Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
nfs.disable: on
Volume Name: r3vol
Type: Replicate
Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
nfs.disable: on
The *client* is also gluster 3.12.6, a CentOS 7.3 virtual machine, using a
*FUSE mount*:
gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
*The problem* is that there is a significant performance loss with smaller
block sizes. For example:
*4K block size*
[replica 3 volume]
***@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 11.2207 s, *95.7 MB/s*
[replica 2 volume]
***@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 12.0149 s, *89.4 MB/s*
*512K block size*
[replica 3 volume]
***@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 5.27207 s, *204 MB/s*
[replica 2 volume]
***@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 4.22321 s, *254 MB/s*
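To put the dd results above in per-request terms, here is a small sketch that recomputes the throughput and the average wall-clock time per write() call from the block sizes, record counts and durations printed by dd. Nothing is measured independently; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DdRun:
    label: str
    block_size: int   # bytes per write() issued by dd
    count: int        # number of records written
    seconds: float    # elapsed time reported by dd

    @property
    def total_bytes(self) -> int:
        return self.block_size * self.count

    @property
    def mb_per_s(self) -> float:
        # dd reports decimal megabytes (10**6 bytes) per second
        return self.total_bytes / self.seconds / 1e6

    @property
    def usec_per_write(self) -> float:
        # average wall-clock time per write() call, in microseconds
        return self.seconds / self.count * 1e6


RUNS = [
    DdRun("r3 4K", 4096, 262144, 11.2207),
    DdRun("r2 4K", 4096, 262144, 12.0149),
    DdRun("r3 512K", 512 * 1024, 2048, 5.27207),
    DdRun("r2 512K", 512 * 1024, 2048, 4.22321),
]

for run in RUNS:
    print(f"{run.label:8s} {run.mb_per_s:6.1f} MB/s "
          f"{run.usec_per_write:9.1f} us per write")
```

On the replica 3 volume this works out to roughly 43 us per 4K write versus roughly 2.6 ms per 512K write, i.e. about 60 times longer for 128 times as much data, which is consistent with a fixed per-call cost dominating at small block sizes.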
With a bigger block size it's still not where I expect it to be, but at least
it starts to make some sense.
I've been trying to solve this for a very long time with no luck.
I've already tried both kernel tuning (different 'tuned' profiles and the
ones recommended in the "Linux Kernel Tuning" section) and tweaking gluster
volume options, including
write-behind/flush-behind/write-behind-window-size.
The latter, to my surprise, didn't make any difference. At first I
thought it was a buffering issue, but it turns out it does buffer writes,
just not very efficiently (at least that's what it looks like in the *gluster
profile output*):
...
Cleared stats.
of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Block Size:      4096b+     8192b+     16384b+
No. of Reads:    0          0          0
No. of Writes:   1576       4173       19605
Block Size:      32768b+    65536b+    131072b+
No. of Reads:    0          0          0
No. of Writes:   7777       1847       657
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
---------   -----------   -----------   -----------   ------------   ----
0.00        0.00 us       0.00 us       0.00 us       1              RELEASE
0.00        18.00 us      18.00 us      18.00 us      1              STATFS
0.00        20.50 us      11.00 us      30.00 us      2              FLUSH
0.00        22.50 us      17.00 us      28.00 us      2              FINODELK
0.01        76.50 us      65.00 us      88.00 us      2              FXATTROP
0.01        177.00 us     177.00 us     177.00 us     1              CREATE
0.02        56.14 us      23.00 us      128.00 us     7              LOOKUP
0.02        259.00 us     20.00 us      498.00 us     2              ENTRYLK
99.94       59.23 us      17.00 us      10914.00 us   35635          WRITE
Duration: 38 seconds
Data Read: 0 bytes
Data Written: 1073741824 bytes
Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Block Size:      4096b+     8192b+     16384b+
No. of Reads:    0          0          0
No. of Writes:   1576       4173       19605
Block Size:      32768b+    65536b+    131072b+
No. of Reads:    0          0          0
No. of Writes:   7777       1847       657
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
---------   -----------   -----------   -----------   ------------   ----
0.00        0.00 us       0.00 us       0.00 us       1              RELEASE
0.00        33.00 us      33.00 us      33.00 us      1              STATFS
0.00        22.50 us      13.00 us      32.00 us      2              ENTRYLK
0.00        32.00 us      26.00 us      38.00 us      2              FLUSH
0.01        47.50 us      16.00 us      79.00 us      2              FINODELK
0.01        157.00 us     157.00 us     157.00 us     1              CREATE
0.01        92.00 us      70.00 us      114.00 us     2              FXATTROP
0.03        72.57 us      39.00 us      121.00 us     7              LOOKUP
99.94       47.97 us      15.00 us      1598.00 us    35635          WRITE
Duration: 38 seconds
Data Read: 0 bytes
Data Written: 1073741824 bytes
Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Block Size:      4096b+     8192b+     16384b+
No. of Reads:    0          0          0
No. of Writes:   1576       4173       19605
Block Size:      32768b+    65536b+    131072b+
No. of Reads:    0          0          0
No. of Writes:   7777       1847       657
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
---------   -----------   -----------   -----------   ------------   ----
0.00        0.00 us       0.00 us       0.00 us       1              RELEASE
0.00        58.00 us      58.00 us      58.00 us      1              STATFS
0.00        38.00 us      38.00 us      38.00 us      2              ENTRYLK
0.01        59.00 us      32.00 us      86.00 us      2              FLUSH
0.01        81.00 us      33.00 us      129.00 us     2              FINODELK
0.01        91.50 us      73.00 us      110.00 us     2              FXATTROP
0.01        239.00 us     239.00 us     239.00 us     1              CREATE
0.04        103.14 us     63.00 us      210.00 us     7              LOOKUP
99.92       52.99 us      16.00 us      11289.00 us   35635          WRITE
Duration: 38 seconds
Data Read: 0 bytes
Data Written: 1073741824 bytes
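One way to read the profile output above is to work out how large the writes arriving at a brick actually were, and how much of the dd run the brick spent inside the WRITE fop. The sketch below does that arithmetic for fsnode2 using only numbers from this post. Treating "call count times average latency" as the total time in WRITE is a simplification, since calls may overlap; the variable names are illustrative.

```python
# Numbers copied from the fsnode2 profile and the dd run in this post.
DATA_WRITTEN_BYTES = 1073741824
WRITE_CALLS = 35635
WRITE_AVG_LATENCY_US = 59.23
DD_ELAPSED_S = 10.9743
DD_BLOCK_SIZE = 4096

# Write-size histogram reported for the brick: bucket lower bound -> count
WRITE_HISTOGRAM = {
    4096: 1576, 8192: 4173, 16384: 19605,
    32768: 7777, 65536: 1847, 131072: 657,
}

# The buckets should account for every WRITE call in the fop table.
assert sum(WRITE_HISTOGRAM.values()) == WRITE_CALLS

avg_write_bytes = DATA_WRITTEN_BYTES / WRITE_CALLS
coalescing = avg_write_bytes / DD_BLOCK_SIZE
# Simplification: assumes WRITE calls do not overlap in time.
write_time_s = WRITE_CALLS * WRITE_AVG_LATENCY_US / 1e6
busy_share = write_time_s / DD_ELAPSED_S

print(f"average write at brick: {avg_write_bytes / 1024:.1f} KiB")
print(f"4K writes merged per brick write: {coalescing:.1f}")
print(f"time in WRITE: {write_time_s:.2f} s ({busy_share:.0%} of the dd run)")
```

On these figures the average write reaching the brick is about 29 KiB, i.e. roughly seven 4K dd writes merged together, and the brick spends about 2.1 s of the 11 s run in WRITE. That suggests, but does not prove, that most of the elapsed time is spent outside the brick's WRITE path.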
At this point I've officially run out of ideas about where to look next. So any
help, suggestions or pointers are highly appreciated!
--
Best regards,
Anastasia Belyaeva