Discussion:
Very slow performance on Sharded GlusterFS
g***@gencgiyen.com
2017-06-30 10:58:43 UTC
Hi,



I have 2 nodes with 20 bricks in total (10+10).



First test:



2 Nodes with Distributed - Striped - Replicated (2 x 2)

10GbE Speed between nodes



"dd" performance: 400mb/s and higher

Downloading a large file from internet and directly to the gluster:
250-300mb/s



Now the same test without striping but with sharding. The results are the same whether I
set the shard size to 4MB or 32MB. (Again 2x replica here.)



"dd" performance: 70 MB/s

Download directly onto the Gluster mount: 60 MB/s



Now, if we run this test twice at the same time (two dd runs or two downloads in
parallel), it drops below 25 MB/s each, or slower.



I thought sharding would be at least equal, or perhaps a little slower, but these
results are terribly slow.



I tried tuning (cache, window-size, etc.). Nothing helps.
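(For reference, "cache" and "window-size" tuning would typically mean volume options along the following lines; the exact option names and values here are illustrative guesses, not necessarily the ones actually tried:)

# gluster volume set testvol performance.cache-size 256MB
# gluster volume set testvol performance.write-behind-window-size 4MB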



GlusterFS 3.11 on Debian 9. The kernel is also tuned. Disks are XFS, 4TB
each.



Is there any tweak/tuning out there to make it faster?



Or is this expected behavior? If it is, it is unacceptable: it is so slow that I
cannot use this in production.



The reason I use shard instead of stripe is that I would like to eliminate the
problem of files bigger than the brick size.



Thanks,

Gencer.
Krutika Dhananjay
2017-06-30 11:50:04 UTC
Could you please provide the volume-info output?

-Krutika
g***@gencgiyen.com
2017-06-30 12:03:14 UTC
Hi Krutika,



Sure, here is volume info:



***@sr-09-loc-50-14-18:/# gluster volume info testvol



Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on



-Gencer.



Krutika Dhananjay
2017-06-30 12:49:58 UTC
Just noticed that the way you have configured your brick order during
volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to
512MB? Could you try that?

If it doesn't help, could you share the volume-profile output for both the
tests (separate)?

Here's what you do:
1. Start profile before starting your test - it could be dd or it could be
file download.
# gluster volume profile <VOL> start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile <VOL> info` and
redirect its output to a tmp file.

4. Stop profile
# gluster volume profile <VOL> stop

And attach the volume-profile output file that you saved at a temporary
location in step 3.
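Putting those steps together, a minimal end-to-end sketch, assuming the volume name testvol from this thread, the FUSE mount at /mnt, and a dd write test (the /tmp filename is arbitrary):

# gluster volume profile testvol start
# dd if=/dev/zero of=/mnt/testfile bs=1G count=5
# gluster volume profile testvol info > /tmp/testvol-profile.txt
# gluster volume profile testvol stop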

-Krutika
g***@gencgiyen.com
2017-06-30 13:50:47 UTC
I already tried 512MB, but I re-tried it now and the results are the same. Both without tuning:



Stripe 2 replica 2: dd performs at ~250 MB/s, but shard gives 77 MB/s.



I attached two logs (shard and stripe logs)



Note: I also noticed that you said “order”. Do you mean that when we create the volume we have to specify a particular order for the bricks? I thought Gluster handled that (and did the math) itself.



Gencer



Gandalf Corvotempesta
2017-06-30 17:19:01 UTC
On 30 Jun 2017 at 3:51 PM, <***@gencgiyen.com> wrote:

Note: I also noticed that you said “order”. Do you mean that when we create the
volume we have to specify a particular order for the bricks? I thought Gluster
handled that (and did the math) itself.

Yes, you have to specify the exact order
Gluster is not flexible in this way and doesn't help you at all.
g***@gencgiyen.com
2017-07-01 06:49:23 UTC
I made the changes (one brick from the 09 server, then its replica from the 10 server, continuing in that order) and re-tested. Nothing changed; it is still slow (exactly the same result).
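For reference, a sketch of that interleaved ordering on the command line: with replica 2, each consecutive pair of bricks forms one replica set, so the hosts have to alternate. Only the first two pairs are shown here; the full command appears later in the thread.

sudo gluster volume create testvol replica 2 \
  sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 \
  sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 \
  [bricks 3 through 10 in the same alternating pattern] force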



-Gencer.



g***@gencgiyen.com
2017-07-03 08:47:46 UTC
Hi,



I want to give an update on this. I also tested READ speed. It seems the sharded volume has a lower read speed than the striped volume.
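(The thread does not show how the read test was run; a typical sketch would be something like the following, dropping the page cache first so dd actually reads from Gluster rather than from memory:)

# sync; echo 3 > /proc/sys/vm/drop_caches
# dd if=/mnt/testfile of=/dev/null bs=1M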



This machine has 24 cores and 64GB of RAM, so I really don't think this is caused by an underpowered system. Stripe is conceptually similar to shard, just with a fixed chunk size based on the stripe count / file size. Hence, I would expect at least the same speed, or maybe a little slower. What I get is 5-10x slower.



I played with Gluster's thread counts, performance tweaks, caches, etc. Nothing helped. In fact, the performance tweaks that I applied for stripe make the sharded volume much, much worse. The default values are better on the sharded volume.



-Gencer.



g***@gencgiyen.com
2017-07-03 15:12:05 UTC
Hi Krutika,



Have you been able to look at my profiles? Do you have any clue, idea or suggestion?



Thanks,

-Gencer



Krutika Dhananjay
2017-07-04 06:39:20 UTC
Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
seems to have witnessed all the IO. No other bricks witnessed any write
operations. This is unacceptable for a volume that has 8 other replica
sets. Why didn't the shards get distributed across all of these sets?

2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
brick 5 is spending 99% of its time in the FINODELK fop, when the fop that
should have dominated its profile is in fact WRITE.

Could you throw some more light on your setup from gluster standpoint?
* For instance, are you using two different gluster volumes to gather these
numbers - one distributed-replicated-striped and another
distributed-replicated-sharded? Or are you merely converting a single
volume from one type to another?

* And if there are indeed two volumes, could you share both their `volume
info` outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from
the mount point of this volume before converting it?

* What size did the test file grow to?

* Are these attached profiles from the dd runs or from the file download test?

-Krutika
g***@gencgiyen.com
2017-07-04 07:25:54 UTC
Hi Krutika,



Thank you so much for your reply. Let me answer them all:



1. I have no idea why it did not get distributed over all bricks.
2. Hm.. This is really weird.



And the others:



No, I use only one volume. When I tested the sharded and striped volumes, I manually stopped the volume, deleted it, purged the data (the data inside the bricks/disks) and re-created it using this command:



sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10 force



and of course after that I execute "volume start". If sharding is to be used, I enable that feature BEFORE I start the sharded volume, and then mount it.
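In other words, roughly this sequence after the create command (a sketch; the mount command and its options are an assumption, since they are not shown in the thread):

# gluster volume set testvol features.shard on
# gluster volume set testvol features.shard-block-size 32MB
# gluster volume start testvol
# mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt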



I tried converting from one to the other, but then I saw that the documentation says a clean volume is better. So I tried the clean method. Still the same performance.



The test file grows from 1GB to 5GB, and the tests are dd. See this example:



dd if=/dev/zero of=/mnt/testfile bs=1G count=5

5+0 records in

5+0 records out

5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
dd if=/dev/zero of=/mnt/testfile bs=5G count=1
This also gives the same result (bs and count reversed).





This run also generated a profile, which I have attached to this e-mail.



Is there anything else that I can try? I am open to all kinds of suggestions.



Thanks,

Gencer.



Krutika Dhananjay
2017-07-04 15:33:18 UTC
Thanks. I think reusing the same volume was the cause of the lack of IO
distribution.
The latest profile output looks much more realistic and in line with what I
would expect.

Let me analyse the numbers a bit and get back.

-Krutika
Krutika Dhananjay
2017-07-06 00:30:24 UTC
What if you disabled eager-lock and ran your test again on the sharded
configuration, along with the profile output?

# gluster volume set <VOL> cluster.eager-lock off
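For the volume in this thread that would be the following; the second line is only an optional check that the option took effect:

# gluster volume set testvol cluster.eager-lock off
# gluster volume get testvol cluster.eager-lock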

-Krutika
g***@gencgiyen.com
2017-07-06 07:27:12 UTC
Hi Krutika,



After that setting:



$ dd if=/dev/zero of=/mnt/ddfile bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.7351 s, 91.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile2 bs=2G count=1

0+1 records in

0+1 records out

2147479552 bytes (2.1 GB, 2.0 GiB) copied, 23.7351 s, 90.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile3 bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1202 s, 88.6 MB/s



$ dd if=/dev/zero of=/mnt/ddfile4 bs=1G count=2

2+0 records in

2+0 records out

2147483648 bytes (2.1 GB, 2.0 GiB) copied, 24.7695 s, 86.7 MB/s



I see improvements (from 70-75 MB/s to 90-100 MB/s) after the eager-lock off setting. I am also monitoring the bandwidth between the two nodes and see up to 102 MB/s.



Is there anything I can do to optimize further, or is this the last stop?



Note: I deleted all the files again, reformatted, re-created the volume with shard enabled, and then mounted it. I tried 16MB, 32MB and 512MB shard sizes; the results are equal.



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Thursday, July 6, 2017 3:30 AM
To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



What if you disabled eager lock and run your test again on the sharded configuration along with the profile output?

# gluster volume set <VOL> cluster.eager-lock off

-Krutika



On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <***@redhat.com <mailto:***@redhat.com> > wrote:

Thanks. I think reusing the same volume was the cause of lack of IO distribution.

The latest profile output looks much more realistic and in line with i would expect.

Let me analyse the numbers a bit and get back.



-Krutika



On Tue, Jul 4, 2017 at 12:55 PM, <***@gencgiyen.com <mailto:***@gencgiyen.com> > wrote:

Hi Krutika,



Thank you so much for myour reply. Let me answer all:



1. I have no idea why it did not get distributed over all bricks.
2. Hm.. This is really weird.



And others;



No. I use only one volume. When I tested sharded and striped volumes, I manually stopped volume, deleted volume, purged data (data inside of bricks/disks) and re-create by using this command:



sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10 force



and of course after that volume start executed. If shard enabled, I enable that feature BEFORE I start the sharded volume than mount.



I tried converting from one to another but then I saw documentation says clean voluje should be better. So I tried clean method. Still same performance.



Testfile grows from 1GB to 5GB. And tests are dd. See this example:



dd if=/dev/zero of=/mnt/testfile bs=1G count=5

5+0 records in

5+0 records out

5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
dd if=/dev/zero of=/mnt/testfile bs=5G count=1
This also gives same result. (bs and count reversed)





And this example have generated a profile which I also attached to this e-mail.



Is there anything that I can try? I am open to all kind of suggestions.



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com <mailto:***@redhat.com> ]
Sent: Tuesday, July 4, 2017 9:39 AM


To: ***@gencgiyen.com <mailto:***@gencgiyen.com>
Cc: gluster-user <gluster-***@gluster.org <mailto:gluster-***@gluster.org> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10 seems to have witnessed all the IO. No other bricks witnessed any write operations. This is unacceptable for a volume that has 8 other replica sets. Why didn't the shards get distributed across all of these sets?



2. For replica set consisting of bricks 5 and 6 of node 09, I see that the brick 5 is spending 99% of its time in FINODELK fop, when the fop that should have dominated its profile should have been in fact WRITE.

Could you throw some more light on your setup from gluster standpoint?
* For instance, are you using two different gluster volumes to gather these numbers - one distributed-replicated-striped and another distributed-replicated-sharded? Or are you merely converting a single volume from one type to another?



* And if there are indeed two volumes, could you share both their `volume info` outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from the mount point of this volume before converting it?

* What is the size the test file grew to?

* These attached profiles are against dd runs? Or the file download test?



-Krutika





On Mon, Jul 3, 2017 at 8:42 PM, <***@gencgiyen.com <mailto:***@gencgiyen.com> > wrote:

Hi Krutika,



Have you be able to look out my profiles? Do you have any clue, idea or suggestion?



Thanks,

-Gencer



From: Krutika Dhananjay [mailto:***@redhat.com <mailto:***@redhat.com> ]
Sent: Friday, June 30, 2017 3:50 PM


To: ***@gencgiyen.com <mailto:***@gencgiyen.com>
Cc: gluster-user <gluster-***@gluster.org <mailto:gluster-***@gluster.org> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Just noticed that the way you have configured your brick order during volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to 512MB? Could you try that?

If it doesn't help, could you share the volume-profile output for both the tests (separate)?

Here's what you do:

1. Start profile before starting your test - it could be dd or it could be file download.

# gluster volume profile <VOL> start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile <VOL> info` and redirect its output to a tmp file.

4. Stop profile

# gluster volume profile <VOL> stop

And attach the volume-profile output file that you saved at a temporary location in step 3.

-Krutika



On Fri, Jun 30, 2017 at 5:33 PM, <***@gencgiyen.com <mailto:***@gencgiyen.com> > wrote:

Hi Krutika,



Sure, here is volume info:



***@sr-09-loc-50-14-18:/# gluster volume info testvol



Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on



-Gencer.



g***@gencgiyen.com
2017-07-06 07:33:35 UTC
Reply
Permalink
Raw Message
Krutika, I’m sorry I forgot to add logs. I attached them now.



Thanks,

Gencer.







From: gluster-users-***@gluster.org [mailto:gluster-users-***@gluster.org] On Behalf Of ***@gencgiyen.com
Sent: Thursday, July 6, 2017 10:27 AM
To: 'Krutika Dhananjay' <***@redhat.com>
Cc: 'gluster-user' <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi Krutika,



After that setting:



$ dd if=/dev/zero of=/mnt/ddfile bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.7351 s, 91.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile2 bs=2G count=1

0+1 records in

0+1 records out

2147479552 bytes (2.1 GB, 2.0 GiB) copied, 23.7351 s, 90.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile3 bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1202 s, 88.6 MB/s



$ dd if=/dev/zero of=/mnt/ddfile4 bs=1G count=2

2+0 records in

2+0 records out

2147483648 bytes (2.1 GB, 2.0 GiB) copied, 24.7695 s, 86.7 MB/s



I see improvements (from 70-75MB to 90-100MB per second) after the eager-lock off setting. Also, I am monitoring the bandwidth between the two nodes; I see up to 102MB/s.



Is there anything more I can do to optimize, or is this the last stop?



Note: I deleted all files again, reformatted, then re-created the volume with shard enabled and mounted it. Tried with 16MB, 32MB and 512MB shard sizes. Results are the same.
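
If it helps to double-check which block size a given file was actually written with, the shard translator stores it as an xattr on the base file on the brick. The path below is only an example, and the exact xattr name is my assumption:

# getfattr -n trusted.glusterfs.shard.block-size -e hex /bricks/brick1/ddfile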



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Thursday, July 6, 2017 3:30 AM
To: ***@gencgiyen.com <mailto:***@gencgiyen.com>
Cc: gluster-user <gluster-***@gluster.org <mailto:gluster-***@gluster.org> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



What if you disabled eager lock and ran your test again on the sharded configuration, along with the profile output?

# gluster volume set <VOL> cluster.eager-lock off
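
To confirm the option has actually taken effect, something like the following should work (assuming `gluster volume get` is available on 3.11):

# gluster volume get <VOL> cluster.eager-lock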

-Krutika



On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <***@redhat.com <mailto:***@redhat.com> > wrote:

Thanks. I think reusing the same volume was the cause of lack of IO distribution.

The latest profile output looks much more realistic and in line with what I would expect.

Let me analyse the numbers a bit and get back.



-Krutika



On Tue, Jul 4, 2017 at 12:55 PM, <***@gencgiyen.com <mailto:***@gencgiyen.com> > wrote:

Hi Krutika,



Thank you so much for your reply. Let me answer all:



1. I have no idea why it did not get distributed over all bricks.
2. Hm.. This is really weird.



And others;



No. I use only one volume. When I tested the sharded and striped volumes, I manually stopped the volume, deleted it, purged the data (the data inside the bricks/disks) and re-created it by using this command:



sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10 force



and of course after that, volume start is executed. If shard is enabled, I enable that feature BEFORE I start the sharded volume, then mount.
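
Spelled out, the sequence is roughly the following (shard block size and mount point are just example values, and the brick list is the same as in the create command above):

gluster volume create testvol replica 2 <brick list as above> force
gluster volume set testvol features.shard on
gluster volume set testvol features.shard-block-size 32MB
gluster volume start testvol
mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt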



I tried converting from one to the other, but then I saw the documentation says a clean volume should be better. So I tried the clean method. Still the same performance.



The test file grows from 1GB to 5GB, and the tests are dd. See this example:



dd if=/dev/zero of=/mnt/testfile bs=1G count=5

5+0 records in

5+0 records out

5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
dd if=/dev/zero of=/mnt/testfile bs=5G count=1
This also gives the same result (bs and count reversed).
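
A variant that might be worth trying (I have not run it here) is forcing the data to be flushed through to gluster rather than stopping at the client page cache, so the number reflects the full write path:

dd if=/dev/zero of=/mnt/testfile bs=1G count=5 conv=fdatasync
dd if=/dev/zero of=/mnt/testfile bs=64M count=80 oflag=direct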





And this example generated a profile, which I have also attached to this e-mail.



Is there anything that I can try? I am open to all kinds of suggestions.



Thanks,

Gencer.



g***@gencgiyen.com
2017-07-06 08:06:01 UTC
Reply
Permalink
Raw Message
Hi Krutika,



I also did one more test. I re-created another volume (a single volume; the old one was destroyed and deleted), then did 2 dd tests, one for 1GB and the other for 2GB. Both use a 32MB shard size and eager-lock off.



Samples:



sr:~# gluster volume profile testvol start

Starting volume profile on testvol has been successful

sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2708 s, 87.5 MB/s

sr:~# gluster volume profile testvol info > /32mb_shard_and_1gb_dd.log

sr:~# gluster volume profile testvol stop

Stopping volume profile on testvol has been successful

sr:~# gluster volume profile testvol start

Starting volume profile on testvol has been successful

sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=2

2+0 records in

2+0 records out

2147483648 bytes (2.1 GB, 2.0 GiB) copied, 23.5457 s, 91.2 MB/s

sr:~# gluster volume profile testvol info > /32mb_shard_and_2gb_dd.log

sr:~# gluster volume profile testvol stop

Stopping volume profile on testvol has been successful



Also here is volume info:



sr:~# gluster volume info testvol



Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 3cc06d95-06e9-41f8-8b26-e997886d7ba1

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-10-loc-50-14-18:/bricks/brick1

Brick3: sr-09-loc-50-14-18:/bricks/brick2

Brick4: sr-10-loc-50-14-18:/bricks/brick2

Brick5: sr-09-loc-50-14-18:/bricks/brick3

Brick6: sr-10-loc-50-14-18:/bricks/brick3

Brick7: sr-09-loc-50-14-18:/bricks/brick4

Brick8: sr-10-loc-50-14-18:/bricks/brick4

Brick9: sr-09-loc-50-14-18:/bricks/brick5

Brick10: sr-10-loc-50-14-18:/bricks/brick5

Brick11: sr-09-loc-50-14-18:/bricks/brick6

Brick12: sr-10-loc-50-14-18:/bricks/brick6

Brick13: sr-09-loc-50-14-18:/bricks/brick7

Brick14: sr-10-loc-50-14-18:/bricks/brick7

Brick15: sr-09-loc-50-14-18:/bricks/brick8

Brick16: sr-10-loc-50-14-18:/bricks/brick8

Brick17: sr-09-loc-50-14-18:/bricks/brick9

Brick18: sr-10-loc-50-14-18:/bricks/brick9

Brick19: sr-09-loc-50-14-18:/bricks/brick10

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

cluster.eager-lock: off

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on



See the attached results, and sorry for the multiple e-mails. I just want to make sure that I provided the correct results for the tests.
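
As a side check on the earlier question about distribution, counting the shard files under the hidden .shard directory of each brick should show whether the pieces are now spread across all replica sets. This is only a sketch using the brick paths above, not output I am including here:

for b in /bricks/brick{1..10}; do echo "$b: $(ls "$b/.shard" 2>/dev/null | wc -l) shards"; done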



Thanks,

Gencer.



g***@gencgiyen.com
2017-07-10 14:29:09 UTC
Reply
Permalink
Raw Message
Hi Krutika,



May I kindly ping you and ask if you have any idea yet, or have figured out what the issue may be?



I am eagerly awaiting your reply :)



Apologies for the ping :)



-Gencer.



Krutika Dhananjay
2017-07-12 07:19:39 UTC
Reply
Permalink
Raw Message
Hi,

Sorry for the late response.
No, the eager-lock experiment was more to see if the implementation had any new bugs. It doesn't look like it does. I think having it on would be the right thing to do; it will reduce the number of fops having to go over the network.

Coming to the performance drop, I compared the volume profile output for the striped and the 32MB-sharded volume again. The only thing that is striking is the number of xattrops and inodelks, which is only 2-4 for the striped volume whereas the number is much bigger in the case of the sharded volume. This is unfortunately likely with sharding, because the optimizations eager-locking and delayed post-op are now only applicable on a per-shard basis. The larger the shard size, the better, to work around this issue.
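
For example, on a freshly created volume (before any data is written, since as far as I know changing the value does not re-shard files that already exist), a larger block size can be set with:

# gluster volume set <VOL> features.shard-block-size 512MB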

Meanwhile, let me think about how we can get this fixed in code.

-Krutika
g***@gencgiyen.com
2017-07-12 10:42:46 UTC
Reply
Permalink
Raw Message
Hi Krutika,



Thank you so much for your help and replies. I really appreciate it.



I will await your reply on this. Just please don’t forget me 😊 😝



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Wednesday, July 12, 2017 10:20 AM
To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi,

Sorry for the late response.

No, so eager-lock experiment was more to see if the implementation had any new bugs.

It doesn't look like it does. I think having it on would be the right thing to do. It will reduce the number of fops having to go over the network.

Coming to the performance drop, I compared the volume profile output for stripe and 32MB shard again.

The only thing that is striking is the number of xattrops and inodelks, which is only 2-4 for striped volume
whereas the number is much bigger in the case of sharded volume. This is unfortunately likely with sharding because

the optimizations eager-locking and delayed post-op will now only be applicable on a per-shard basis.

Larger the shard size, the better, to work around this issue.

Meanwhile, let me think about how we can get this fixed in code.

-Krutika





On Mon, Jul 10, 2017 at 7:59 PM, <***@gencgiyen.com <mailto:***@gencgiyen.com> > wrote:

Hi Krutika,



May I kindly ping to you and ask that If you have any idea yet or figured out whats the issue may?



I am awaiting your reply with four eyes :)



Apologies for the ping :)



-Gencer.



From: gluster-users-***@gluster.org <mailto:gluster-users-***@gluster.org> [mailto:gluster-users-***@gluster.org <mailto:gluster-users-***@gluster.org> ] On Behalf Of ***@gencgiyen.com <mailto:***@gencgiyen.com>
Sent: Thursday, July 6, 2017 11:06 AM


To: 'Krutika Dhananjay' <***@redhat.com <mailto:***@redhat.com> >
Cc: 'gluster-user' <gluster-***@gluster.org <mailto:gluster-***@gluster.org> >
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi Krutika,



I also did one more test. I re-created another volume (single volume. Old one destroyed-deleted) then do 2 dd tests. One for 1GB other for 2GB. Both are 32MB shard and eager-lock off.



Samples:



sr:~# gluster volume profile testvol start

Starting volume profile on testvol has been successful

sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2708 s, 87.5 MB/s

sr:~# gluster volume profile testvol info > /32mb_shard_and_1gb_dd.log

sr:~# gluster volume profile testvol stop

Stopping volume profile on testvol has been successful

sr:~# gluster volume profile testvol start

Starting volume profile on testvol has been successful

sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=2

2+0 records in

2+0 records out

2147483648 bytes (2.1 GB, 2.0 GiB) copied, 23.5457 s, 91.2 MB/s

sr:~# gluster volume profile testvol info > /32mb_shard_and_2gb_dd.log

sr:~# gluster volume profile testvol stop

Stopping volume profile on testvol has been successful



Also here is volume info:



sr:~# gluster volume info testvol



Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 3cc06d95-06e9-41f8-8b26-e997886d7ba1

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-10-loc-50-14-18:/bricks/brick1

Brick3: sr-09-loc-50-14-18:/bricks/brick2

Brick4: sr-10-loc-50-14-18:/bricks/brick2

Brick5: sr-09-loc-50-14-18:/bricks/brick3

Brick6: sr-10-loc-50-14-18:/bricks/brick3

Brick7: sr-09-loc-50-14-18:/bricks/brick4

Brick8: sr-10-loc-50-14-18:/bricks/brick4

Brick9: sr-09-loc-50-14-18:/bricks/brick5

Brick10: sr-10-loc-50-14-18:/bricks/brick5

Brick11: sr-09-loc-50-14-18:/bricks/brick6

Brick12: sr-10-loc-50-14-18:/bricks/brick6

Brick13: sr-09-loc-50-14-18:/bricks/brick7

Brick14: sr-10-loc-50-14-18:/bricks/brick7

Brick15: sr-09-loc-50-14-18:/bricks/brick8

Brick16: sr-10-loc-50-14-18:/bricks/brick8

Brick17: sr-09-loc-50-14-18:/bricks/brick9

Brick18: sr-10-loc-50-14-18:/bricks/brick9

Brick19: sr-09-loc-50-14-18:/bricks/brick10

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

cluster.eager-lock: off

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on



See attached results and sorry for the multiple e-mails. I just want to make sure that I provided correct results for the tests.



Thanks,

Gencer.



From: gluster-users-***@gluster.org [mailto:gluster-users-***@gluster.org] On Behalf Of ***@gencgiyen.com
Sent: Thursday, July 6, 2017 10:34 AM
To: 'Krutika Dhananjay' <***@redhat.com>
Cc: 'gluster-user' <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Krutika, I'm sorry, I forgot to attach the logs. I have attached them now.



Thanks,

Gencer.







From: gluster-users-***@gluster.org [mailto:gluster-users-***@gluster.org] On Behalf Of ***@gencgiyen.com
Sent: Thursday, July 6, 2017 10:27 AM
To: 'Krutika Dhananjay' <***@redhat.com>
Cc: 'gluster-user' <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi Krutika,



After that setting:



$ dd if=/dev/zero of=/mnt/ddfile bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.7351 s, 91.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile2 bs=2G count=1

0+1 records in

0+1 records out

2147479552 bytes (2.1 GB, 2.0 GiB) copied, 23.7351 s, 90.5 MB/s



$ dd if=/dev/zero of=/mnt/ddfile3 bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1202 s, 88.6 MB/s



$ dd if=/dev/zero of=/mnt/ddfile4 bs=1G count=2

2+0 records in

2+0 records out

2147483648 bytes (2.1 GB, 2.0 GiB) copied, 24.7695 s, 86.7 MB/s



I see improvements (from 70-75MB/s to 90-100MB/s) after turning eager-lock off. I am also monitoring the bandwidth between the two nodes, and I see up to 102MB/s.



Is there anything more I can do to optimize, or is this the limit?
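
For example, would options along these lines be worth testing, one at a time? (The option names are standard gluster volume set keys as far as I know, but the values below are only guesses on my part, not settings anyone in this thread has verified.)

# tuning guesses only; apply and test each change separately, then re-run the dd test
gluster volume set testvol performance.write-behind-window-size 4MB
gluster volume set testvol performance.client-io-threads on
gluster volume set testvol client.event-threads 4
gluster volume set testvol server.event-threads 4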



Note: I deleted all files again, reformatted, re-created the volume with sharding and then mounted it. I tried 16MB, 32MB and 512MB shard sizes. The results are the same.



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Thursday, July 6, 2017 3:30 AM
To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



What if you disable eager lock and run your test again on the sharded configuration, and share the profile output as well?

# gluster volume set <VOL> cluster.eager-lock off

-Krutika



On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <***@redhat.com> wrote:

Thanks. I think reusing the same volume was the cause of the lack of IO distribution.

The latest profile output looks much more realistic and in line with what I would expect.

Let me analyse the numbers a bit and get back.



-Krutika



On Tue, Jul 4, 2017 at 12:55 PM, <***@gencgiyen.com> wrote:

Hi Krutika,



Thank you so much for your reply. Let me answer each point:



1. I have no idea why it did not get distributed over all bricks.
2. Hm.. This is really weird.



And for your other questions:



No, I use only one volume. When I tested the sharded and striped volumes, I manually stopped the volume, deleted it, purged the data (the data inside the bricks/disks) and re-created it using this command:



sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10 force



and of course after that, volume start is executed. If sharding is used, I enable that feature BEFORE I start the sharded volume, and then mount it.
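
In shell terms the sequence looks roughly like this (the mount line is only an illustration of my setup, not copied from a real run):

# enable sharding before starting the volume, then mount it on the client
gluster volume set testvol features.shard on
gluster volume set testvol features.shard-block-size 32MB
gluster volume start testvol
mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt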



I tried converting from one type to the other, but then I saw that the documentation says a clean volume should be better. So I tried the clean method. Still the same performance.



The test file grows from 1GB to 5GB, and the tests are dd. See this example:



dd if=/dev/zero of=/mnt/testfile bs=1G count=5

5+0 records in

5+0 records out

5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
dd if=/dev/zero of=/mnt/testfile bs=5G count=1
This also gives the same result (bs and count reversed).
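
If it helps, I can also repeat the test with a flush at the end so the page cache does not inflate the number. Something like this (only a suggestion, I have not run it yet):

# same amount of data, but force a flush before dd reports the rate
dd if=/dev/zero of=/mnt/testfile bs=1M count=5120 conv=fdatasync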





This example also generated a profile, which I have attached to this e-mail.



Is there anything else I can try? I am open to all kinds of suggestions.



Thanks,

Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Tuesday, July 4, 2017 9:39 AM


To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pairs made up of bricks 5 and 6 on nodes 09 and 10 seem to have witnessed all the IO. No other bricks witnessed any write operations. This is unacceptable for a volume that has 8 other replica sets. Why didn't the shards get distributed across all of these sets?



2. For the replica set consisting of bricks 5 and 6 of node 09, I see that brick 5 is spending 99% of its time in the FINODELK fop, when the fop that should have dominated its profile is in fact WRITE.
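
(A quick way to eyeball this from the saved profile output is to pull out the brick headers together with the WRITE and FINODELK rows; the file name below is just a placeholder for wherever you saved the output:)

# rough filter over the saved profile output; substitute your own file name
grep -E '^Brick:|WRITE|FINODELK' /tmp/profile-output.log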

Could you throw some more light on your setup from a gluster standpoint?
* For instance, are you using two different gluster volumes to gather these numbers - one distributed-replicated-striped and another distributed-replicated-sharded? Or are you merely converting a single volume from one type to another?



* And if there are indeed two volumes, could you share both their `volume info` outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from the mount point of this volume before converting it?

* What is the size the test file grew to?

* These attached profiles are against dd runs? Or the file download test?



-Krutika





On Mon, Jul 3, 2017 at 8:42 PM, <***@gencgiyen.com> wrote:

Hi Krutika,



Have you been able to look at my profiles? Do you have any clue, idea or suggestion?



Thanks,

-Gencer



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Friday, June 30, 2017 3:50 PM


To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Just noticed that the way you have configured your brick order during volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to 512MB? Could you try that?
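
(That would be, for example, the following. Note that, as far as I know, the new shard size only applies to files written after the change, so it is best measured on a freshly created file:)

# note: the new shard size should only affect files created after this change
gluster volume set <VOL> features.shard-block-size 512MB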

If it doesn't help, could you share the volume-profile output for both tests (separately)?

Here's what you do:

1. Start profile before starting your test - it could be dd or it could be file download.

# gluster volume profile <VOL> start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile <VOL> info` and redirect its output to a tmp file.

4. Stop profile

# gluster volume profile <VOL> stop

And attach the volume-profile output file that you saved at a temporary location in step 3.
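
Condensed into a single run, the above is roughly the following (substitute your volume name, mount path and the actual test):

# sketch of steps 1-4 above
gluster volume profile <VOL> start
dd if=/dev/zero of=/mnt/testfile bs=1G count=5   # or run the file-download test instead
gluster volume profile <VOL> info > /tmp/profile-sharded.log
gluster volume profile <VOL> stop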

-Krutika



On Fri, Jun 30, 2017 at 5:33 PM, <***@gencgiyen.com> wrote:

Hi Krutika,



Sure, here is volume info:



***@sr-09-loc-50-14-18:/# gluster volume info testvol



Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on



-Gencer.



From: Krutika Dhananjay [mailto:***@redhat.com]
Sent: Friday, June 30, 2017 2:50 PM
To: ***@gencgiyen.com
Cc: gluster-user <gluster-***@gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS



Could you please provide the volume-info output?

-Krutika



On Fri, Jun 30, 2017 at 4:23 PM, <***@gencgiyen.com> wrote:

Hi,



I have an 2 nodes with 20 bricks in total (10+10).



First test:



2 Nodes with Distributed – Striped – Replicated (2 x 2)

10GbE Speed between nodes



“dd” performance: 400mb/s and higher

Downloading a large file from internet and directly to the gluster: 250-300mb/s



Now same test without Stripe but with sharding. This results are same when I set shard size 4MB or 32MB. (Again 2x Replica here)



Dd performance: 70mb/s

Download directly to the gluster performance : 60mb/s



Now, If we do this test twice at the same time (two dd or two doewnload at the same time) it goes below 25/mb each or slower.



I thought sharding is at least equal or a little slower (maybe?) but these results are terribly slow.



I tried tuning (cache, window-size etc..). Nothing helps.



GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and 4TB each.



Is there any tweak/tuning out there to make it fast?



Or is this an expected behavior? If its, It is unacceptable. So slow. I cannot use this on production as it is terribly slow.



The reason behind I use shard instead of stripe is i would like to eleminate files that bigger than brick size.



Thanks,

Gencer.
