Discussion: [Gluster-users] Finding performance bottlenecks
Tony Hoyle
2018-04-30 12:14:15 UTC
Hi

I'm trying to set up a 3-node gluster and am hitting huge performance
bottlenecks.

The 3 servers are connected over 10Gb networking and glusterfs is set to
create a 3-node replica.

With a single VM performance was poor, but I could have lived with it.

I tried to stress it by putting copies of a bunch of VMs on the servers
and seeing what happened with parallel nodes... network load never broke
13Mbps and disk load peaked at under 1Mbps. VMs were so slow that
services timed out during boot, causing failures.

Checked the network with iperf and it reached 9.7Gb/s, so the hardware is
fine... it just seems that glusterfs isn't using it for some reason.

gluster volume top gv0 read-perf shows 0Mbps for all files, although I'm
not sure whether the command is working.
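If it matters, the bs/count form of the same command is supposed to make
the bricks perform an actual read test rather than report observed
throughput; as far as I can tell from the CLI help it looks like this
(block size and count are only example values):

gluster volume top gv0 read-perf                                      # observed per-file read throughput
gluster volume top gv0 read-perf bs 1048576 count 1024 list-cnt 10    # brick-side read test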

There's probably a magic setting somewhere, but I've spent a couple of
days trying to find it now...

Tony

stats:
Block Size:            512b+      1024b+      2048b+
No. of Reads:              0           2           0
No. of Writes:            40         141         399

Block Size:           4096b+      8192b+     16384b+
No. of Reads:            173          24           4
No. of Writes:         18351        5049        2478

Block Size:          32768b+     65536b+    131072b+
No. of Reads:             12         113           0
No. of Writes:          1640         648         200

Block Size:         262144b+     524288b+  1048576b+
No. of Reads:              0           0           0
No. of Writes:           329          55         139

Block Size:        2097152b+
No. of Reads:              0
No. of Writes:             1
%-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
---------   -----------    -----------     -----------   ------------   ---------
     0.00        0.00 us        0.00 us         0.00 us             41   RELEASE
     0.00        0.00 us        0.00 us         0.00 us              6   RELEASEDIR
     0.00        3.43 us        2.65 us         4.10 us              6   OPENDIR
     0.00      217.85 us      217.85 us       217.85 us              1   SETATTR
     0.00       66.38 us       49.47 us        80.57 us              4   SEEK
     0.00      394.18 us      394.18 us       394.18 us              1   FTRUNCATE
     0.00      116.68 us       29.88 us       186.25 us             16   GETXATTR
     0.00      397.32 us      267.18 us       540.38 us             10   XATTROP
     0.00      553.09 us      244.97 us      1242.98 us             12   READDIR
     0.00      201.60 us       69.61 us       744.71 us             41   OPEN
     0.00      734.96 us       75.05 us     37399.38 us            328   READ
     0.01     1750.65 us       33.99 us    750562.48 us            591   LOOKUP
     0.02     2972.84 us       30.72 us    788018.47 us            496   STATFS
     0.03    10951.33 us       35.36 us    695155.13 us            166   STAT
     0.42     2574.98 us      208.73 us   1710282.73 us          11877   FXATTROP
     2.80      609.20 us      468.51 us    321422.91 us         333946   RCHECKSUM
     5.04      548.76 us       14.83 us   76288179.46 us        668188   INODELK
    18.46   149940.70 us       13.59 us   79966278.04 us          8949   FINODELK
    20.04   395073.91 us       84.99 us    3835355.67 us          3688   FSYNC
    53.17   131171.66 us       85.76 us    3838020.34 us         29470   WRITE
     0.00        0.00 us        0.00 us         0.00 us           7238   UPCALL
     0.00        0.00 us        0.00 us         0.00 us           7238   CI_IATT

Duration: 1655 seconds
Data Read: 8804864 bytes
Data Written: 612756480 bytes

config:
Volume Name: gv0
Type: Replicate
Volume ID: a0b6635a-ae48-491b-834a-08e849e87642
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: barbelith10:/tank/vmdata/gv0
Brick2: rommel10:/tank/vmdata/gv0
Brick3: panzer10:/tank/vmdata/gv0
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.cache-invalidation: on
nfs.disable: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
Thing
2018-05-01 01:27:03 UTC
Hi,

So is it KVM or VMware as the host(s)? I basically have the same setup, i.e.
3 x 1TB "raid1" nodes and VMs, but 1Gb networking. I did notice that with
VMware using NFS, disk was pretty slow (40% of a single disk), but this was
over 1Gb networking which was clearly saturating. Hence I am moving to KVM to
use glusterfs, hoping for better performance and bonding; it will be
interesting to see which host type runs faster.

Which operating system is gluster on?

Did you do iperf between all nodes?
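e.g. something along these lines between every pair of nodes, in both
directions (this assumes iperf3; classic iperf is similar):

iperf3 -s                           # on one node
iperf3 -c <peer-hostname> -P 4      # from each of the other nodes
iperf3 -c <peer-hostname> -P 4 -R   # same test in the reverse direction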
Darrell Budic
2018-05-01 14:47:14 UTC
I see you’re using ZFS; what does the pool look like? Did you set compression, relatime, xattr, & acltype? What versions of ZFS & gluster? What kind of CPUs/memory on the servers, and any ZFS tuning?
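Something like this would show the usual suspects - the dataset name below is
only a guess from your brick paths, adjust to your layout:

zpool status tank                                             # pool / vdev layout
zfs get compression,atime,relatime,xattr,acltype tank/vmdata  # current settings
# values people typically use for gluster bricks on ZFS:
zfs set compression=lz4 tank/vmdata
zfs set atime=off tank/vmdata
zfs set xattr=sa tank/vmdata
zfs set acltype=posixacl tank/vmdata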

How are you mounting the storage volumes? Are you using jumbo frames? Are the VMs also on these servers, or on different hosts? If on different hosts, how are they connected?
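A quick way to check the jumbo-frame side, if you're using it (interface name
is only an example):

ip link show dev eth0            # look for "mtu 9000"
ping -M do -s 8972 <peer-host>   # 8972 + ICMP/IP headers = 9000; fails if anything on the path is smaller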

Lots of variables to look at, can you give us more info on your whole setup?
Tony Hoyle
2018-05-01 09:38:38 UTC
Post by Thing
Hi,
So is it KVM or VMware as the host(s)? I basically have the same setup, i.e.
3 x 1TB "raid1" nodes and VMs, but 1Gb networking. I did notice that with
VMware using NFS, disk was pretty slow (40% of a single disk), but this was
over 1Gb networking which was clearly saturating. Hence I am moving to KVM to
use glusterfs, hoping for better performance and bonding; it will be
interesting to see which host type runs faster.
1Gb will always be the bottleneck in that situation - that's going to
max out at the speed of a single disk or lower. You need at minimum to
bond interfaces, and preferably go to 10Gb, to do that.

Our NFS actually ends up faster than local disk because the read speed
of the raid is faster than the read speed of the local disk.
Post by Thing
Which operating system is gluster on?
Debian Linux. Supermicro motherboards, 24 core i7 with 128GB of RAM on
the VM hosts.
Post by Thing
Did you do iperf between all nodes?
Yes, around 9.7Gb/s

It doesn't appear to be raw read speed but iowait. Under NFS load with
multiple VMs I get an iowait of around 0.3%. Under gluster it's never less
than 10%, and glusterfsd is often at the top of the CPU usage. This causes
a load average of ~12 compared to 3 over NFS, and absolutely kills VMs,
esp. Windows ones - one machine I set booting was still booting
30 minutes later!
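In case anyone wants to compare like for like, the sort of thing I'm watching
while the VMs boot is nothing fancy:

iostat -x 5                       # per-device utilisation and await times
top -d 5                          # iowait, load average, glusterfsd at the top
gluster volume profile gv0 info   # per-brick fop latencies (latency-measurement is already on)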

Tony
Vincent Royer
2018-05-03 18:58:03 UTC
It worries me how many threads talk about low performance. I'm about to
build out a replica 3 setup and run oVirt with a bunch of Windows VMs.

Are the issues Tony is experiencing "normal" for Gluster? Does anyone here
have a system with Windows VMs and good performance?

*Vincent Royer*
*778-825-1057*


<http://www.epicenergy.ca/>
*SUSTAINABLE MOBILE ENERGY SOLUTIONS*
Darrell Budic
2018-05-03 21:24:53 UTC
Tony’s performance sounds significantly subpar from my experience. I did some testing with gluster 3.12 and oVirt 3.9 on my running production cluster when I enabled gfapi, and even my pre numbers are significantly better than what Tony is reporting:

———————————————————
Before using gfapi:

# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s

# hdparm -tT /dev/vda

/dev/vda:
Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/sec
Timing buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec

#bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u root

Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
pre-glapi 8G 196245 30 105331 15 962775 49 1638 34
Latency 1578ms 1383ms 201ms 301ms

Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
pre-glapi 8G 155937 27 102899 14 1030285 54 1763 45
Latency 694ms 1333ms 114ms 229ms

(Note: sequential reads seem to have been influenced by caching somewhere.)

After switching to gfapi:

# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s

# hdparm -tT /dev/vda

/dev/vda:
Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/sec
Timing buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec

#bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u root

Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
glapi 8G 359100 59 185289 24 489575 31 2079 67
Latency 160ms 355ms 36041us 185ms

Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
glapi 8G 341307 57 180546 24 472572 35 2655 61
Latency 153ms 394ms 101ms 116ms

So excellent improvement in write throughput, but the significant improvement in latency is what was most noticed by users. Anecdotal reports of 2x+ performance improvements, with one remarking that it’s like having dedicated disks :)
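"Enabled gfapi" above means, roughly, the following on the engine host -
hedged, since this assumes oVirt's LibgfApiSupported engine-config knob and
you should check the exact flag and behaviour for your oVirt version:

engine-config -s LibgfApiSupported=true   # on the oVirt engine host
systemctl restart ovirt-engine
# running VMs only pick up gfapi disk access after a full shut down / start,
# not a reboot from inside the guest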

This system is on my production cluster, so it’s not getting exclusive disk access, but this VM is not doing anything else itself. The cluster is 3 Xeon E5-2609 v3 @ 1.90GHz servers w/ 64G RAM and SATA2 disks; 2 with 9x spindles each, 1 with 8x slightly faster disks (all spinners). Using ZFS stripes with lz4 compression and 10G connectivity to 8 hosts. Running gluster 3.12.3 at the moment. The cluster itself has about 70 running VMs in varying states of switching to gfapi use, but my main SQL servers are using their own volumes and not competing for this one. These have not yet had the Spectre/Meltdown patches applied.

This will be skewed because I forced it to not steal all the RAM on the server (reads will certainly be cached), but it gives an idea of what it can do disk-wise, on the volume used above:
# bonnie++ -d . -s 8G -n 0 -m zfs-server -f -b -u root -r 4096
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zfs-server 8G 604940 79 510410 87 1393862 99 3164 91
Latency 99545us 100ms 247us 152ms

Just for fun, from one of the servers, showing base load and this testing: [graph attachment not preserved in this archive]


Ben Turner
2018-05-07 14:55:38 UTC
----- Original Message -----
Sent: Thursday, May 3, 2018 5:24:53 PM
Subject: Re: [Gluster-users] Finding performance bottlenecks
Tony’s performance sounds significantly subpar from my experience. I did
some testing with gluster 3.12 and oVirt 3.9 on my running production
cluster when I enabled gfapi, and even my pre numbers are significantly
better than what Tony is reporting:
———————————————————
Before using gfapi:
# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s
This is nowhere near what I would expect. With VMs I am able to saturate a 10G interface if I run enough IOs from enough VMs and use LVM striping (8 files / PVs) inside the VMs. So that's 1200 MB/s of aggregate throughput, and each VM will do 200-300+ MB/sec writes, 300-400+ MB/sec reads.

I have seen this issue before though; once it was resolved by an upgrade of oVirt, another time I fixed the alignment of the RAID / LVM / XFS stack. There is one instance I haven't figured out yet :/ I want to build on a fresh HW stack. Make sure you have everything aligned in the storage stack, writeback cache on the RAID controller, jumbo frames, the gluster VM group set, and a random-IO tuned profile (see the sketch just below for the last two). If you want to tinker with LVM striping inside the VM I have had success with that as well.
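A rough sketch of those last two items - the virt group file ships with
gluster, and the tuned profile name varies by distro (rhgs-random-io comes
with RHGS; throughput-performance is the usual stand-in elsewhere):

gluster volume set gv0 group virt          # applies the /var/lib/glusterd/groups/virt option set
tuned-adm profile throughput-performance   # on the storage nodes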

Also note:

Using urandom will significantly lower perf; it depends on how fast your CPU can create random data. Try /dev/zero or FIO / IOzone / smallfile - https://github.com/bengland2/smallfile - which will eliminate CPU as a bottleneck.
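For example, a quick fio run that takes the CPU's RNG out of the picture -
all parameters below are only examples, tune numjobs / iodepth / size to
your environment:

fio --name=vmtest --directory=/path/to/gluster/mount --rw=randwrite \
    --bs=4k --size=1G --numjobs=4 --iodepth=16 \
    --direct=1 --ioengine=libaio --group_reporting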

Also remember VMs are a heavy random-IO workload; you need IOPS on your disks to see good perf. Also, since gluster doesn't have a metadata (MD) server, that metadata lives in xattrs on the files themselves. This is a bit of a double-edged sword, as those take IOPS as well, and if the backend is not properly aligned it can double or triple the IOPS overhead on the small reads and writes that gluster uses in place of an MD server.
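If you're curious, you can see that metadata directly on a brick - the path
below is only a placeholder, and it has to be a brick path, not the fuse
mount:

getfattr -d -m . -e hex /tank/vmdata/gv0/<some-vm-image-file>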

HTH

-b