Discussion: Slow write times to gluster disk
Pat Haley
2017-04-07 18:37:06 UTC
Hi,

We noticed a dramatic slowness when writing to a gluster disk when
compared to writing to an NFS disk. Specifically when using dd (data
duplicator) to write a 4.3 GB file of zeros:

* on NFS disk (/home): 9.5 Gb/s
* on gluster disk (/gdata): 508 Mb/s

The gluster disk is 2 bricks joined together, with no replication or
anything else. The hardware is (literally) the same:

* one server with 70 hard disks and a hardware RAID card.
* 4 disks in a RAID-6 group (the NFS disk)
* 32 disks in a RAID-6 group (the max allowed by the card, /mnt/brick1)
* 32 disks in another RAID-6 group (/mnt/brick2)
* 2 hot spares

Some additional information and more test results (after changing the
log level):

glusterfs 3.7.11 built on Apr 27 2016 14:09:22
CentOS release 6.8 (Final)
RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108
[Invader] (rev 02)



*Create a file on /gdata (gluster)*
[***@mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*

*Create a file on /home (ext4)*
[***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1 bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s* - 3 times as fast


*Copy from /gdata to /gdata (gluster to gluster)*
[***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s* - realllyyy
slooowww


*Copy from /gdata to /gdata, 2nd time (gluster to gluster)*
[***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s* - realllyyy
slooowww again



*Copy from /home to /home (ext4 to ext4)*
[***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s* - 30 times as fast


*Copy from /home to /home (ext4 to ext4)*
[***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* - 30 times as fast


As a test, can we copy data directly to the xfs mountpoint (/mnt/brick1)
and bypass gluster?
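
For reference, the kind of direct-to-brick test we have in mind would look
roughly like the following (a sketch only, using throw-away scratch files
that we would delete afterwards; the file names are just examples):

[***@mseas-data2 gdata]# dd if=/dev/zero of=/mnt/brick1/zero_direct bs=1M count=1000
[***@mseas-data2 gdata]# dd if=/mnt/brick1/zero_direct of=/mnt/brick1/zero_direct2
[***@mseas-data2 gdata]# rm -f /mnt/brick1/zero_direct /mnt/brick1/zero_direct2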


Any help you could give us would be appreciated.

Thanks

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ravishankar N
2017-04-08 04:58:49 UTC
Hi Pat,

I'm assuming you are using the gluster native (fuse) mount. If it helps,
you could try mounting it via gluster NFS (gnfs) and then see if there is
an improvement in speed. Fuse mounts are slower than gnfs mounts, but you
get the benefit of avoiding a single point of failure (unlike fuse mounts,
if the gluster node containing the gnfs server goes down, all mounts done
using that node will fail). For fuse mounts, you could try tweaking the
write-behind xlator settings to see if that helps. See the
performance.write-behind and performance.write-behind-window-size
options in `gluster volume set help`. Of course, even for gnfs mounts,
you can achieve fail-over by using CTDB.
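
For example, a minimal sketch (the volume name "gvol" and the 4MB window
size are only placeholders; adjust them for your setup):

gluster volume set gvol performance.write-behind on
gluster volume set gvol performance.write-behind-window-size 4MB
gluster volume info gvol    # changed options show up under "Options Reconfigured"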

Thanks,
Ravi

Pat Haley
2017-04-10 19:12:45 UTC
Hi Ravi,

Thanks for the reply. And yes, we are using the gluster native (fuse)
mount. Since this is not my area of expertise, I have a few questions
(mostly clarifications).

Is a factor of 20 slow-down typical when comparing a fuse-mounted
filesystem versus an NFS-mounted filesystem, or should we also be looking
for additional issues? (Note: the first dd test described below was run
on the server that hosts the file systems, so no network communication
was involved.)

You also mention tweaking the write-behind xlator settings. Would you
expect better speed improvements from switching the mount from fuse
to gnfs or from tweaking the settings? Also, are these mutually
exclusive, or would there be additional benefits from both switching to
gnfs and tweaking?

My next question is to make sure I'm clear on the comment "if the
gluster node containing the gnfs server goes down, all mounts done using
that node will fail". If you have 2 servers, each holding 1 brick of the
overall gluster filesystem, and one server fails, then for gnfs nothing on
either server is visible to other nodes, while under fuse only the files
on the dead server are not visible. Is this what you meant?

Finally, you mention "even for gnfs mounts, you can achieve fail-over by
using CTDB". Do you know if CTDB would have any performance impact
(i.e. in a worst case scenario, could adding CTDB to gnfs erase the speed
benefits of going to gnfs in the first place)?

Thanks

Pat



--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ravishankar N
2017-04-11 04:21:21 UTC
On 04/11/2017 12:42 AM, Pat Haley wrote:
>
> Hi Ravi,
>
> Thanks for the reply. And yes, we are using the gluster native (fuse)
> mount. Since this is not my area of expertise I have a few questions
> (mostly clarifications)
>
> Is a factor of 20 slow-down typical when comparing a fuse-mounted
> filesystem versus an NFS-mounted filesystem or should we also be
> looking for additional issues? (Note the first dd test described
> below was run on the server that hosts the file-systems so no network
> communication was involved).

Though both the gluster bricks and the mounts are on the same physical
machine in your setup, the I/O still passes through the different layers
of the kernel/user-space fuse stack, although I don't know whether a 20x
slow-down on gluster vs. an NFS share is normal. Why don't you try doing
a gluster NFS mount on the machine, run the dd test, and compare it with
the gluster fuse mount results?
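
Something along these lines, for example (a sketch; substitute your actual
volume name, and note that gluster NFS serves NFSv3, hence vers=3):

mkdir -p /mnt/gnfs-test
mount -t nfs -o vers=3 mseas-data2:/<VOLNAME> /mnt/gnfs-test
dd if=/dev/zero of=/mnt/gnfs-test/zero_nfs bs=1M count=1000

and then compare that with the same dd run on the fuse mount (/gdata).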

>
> You also mention tweaking the write-behind xlator settings. Would you
> expect better speed improvements from switching the mount from fuse
> to gnfs or from tweaking the settings? Also are these mutually
> exclusive or would there be additional benefits from both switching to
> gnfs and tweaking?
You should test these out and find the answers yourself. :-)

>
> My next question is to make sure I'm clear on the comment "if the
> gluster node containing the gnfs server goes down, all mounts done
> using that node will fail". If you have 2 servers, each holding 1 brick
> of the overall gluster FS, and one server fails, then for gnfs nothing
> on either server is visible to other nodes while under fuse only the
> files on the dead server are not visible. Is this what you meant?
Yes, for gnfs mounts, all I/O from the various mounts goes to the gnfs
server process (on the machine whose IP was used at the time of mounting),
which then sends the I/O to the brick processes. For fuse, the gluster
fuse mount itself talks directly to the bricks.
>
> Finally, you mention "even for gnfs mounts, you can achieve fail-over
> by using CTDB". Do you know if CTDB would have any performance impact
> (i.e. in a worst case scenario could adding CTDB to gnfs erase the
> speed benefits of going to gnfs in the first place)?
I don't think it would. You can even achieve load balancing via CTDB to
use different gnfs servers for different clients. But I don't know if
this is needed/helpful in your current setup, where everything (bricks
and clients) seems to be on just one server.

-Ravi
Pat Haley
2017-04-13 22:18:30 UTC
Hi Ravi (and list),

We are planning on testing the NFS route to see what kind of speed-up we
get. A little research led us to the following:

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/

Is this the correct path to take to mount our 2 xfs volumes as a single
gluster file-system volume? If not, what would be a better path?


Pat




--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ravishankar N
2017-04-14 04:57:22 UTC
I'm not sure if the version you are running (glusterfs 3.7.11) works
with NFS-Ganesha, as the link seems to suggest version >=3.8 as a
pre-requisite. Adding Soumya for help. If it is not supported, then you
might have to go the plain gluster NFS way.
Regards,
Ravi

Soumya Koduri
2017-04-17 07:18:41 UTC
On 04/14/2017 10:27 AM, Ravishankar N wrote:
> I'm not sure if the version you are running (glusterfs 3.7.11) works
> with NFS-Ganesha, as the link seems to suggest version >=3.8 as a
> pre-requisite. Adding Soumya for help. If it is not supported, then you
> might have to go the plain gluster NFS way.

Even gluster 3.7.x should work with NFS-Ganesha, but the configuration
steps changed from 3.8 onwards, which is why the pre-requisite was added
in the doc. IIUC, from your mail below, you would like to try NFS
(preferably gNFS rather than NFS-Ganesha), which may perform better
compared to a fuse mount. In that case, the gNFS server comes up by
default (up to release 3.7.x) and there are additional steps needed to
export the volume via gNFS. Let me know if you have any issues accessing
volumes via gNFS.
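
For reference, a sketch of how exporting and mounting via gNFS typically
looks (the volume name is a placeholder):

gluster volume set <VOLNAME> nfs.disable off   # make sure the gNFS export is enabled
gluster volume status <VOLNAME>                # the "NFS Server" entry should show Online
showmount -e mseas-data2                       # the volume should appear in the export list
mount -t nfs -o vers=3 mseas-data2:/<VOLNAME> /mnt/gnfs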

Regards,
Soumya

Pranith Kumar Karampuri
2017-04-14 06:50:54 UTC
On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N <***@redhat.com>
wrote:

> Hi Pat,
>
> I'm assuming you are using gluster native (fuse mount). If it helps, you
> could try mounting it via gluster NFS (gnfs) and then see if there is an
> improvement in speed. Fuse mounts are slower than gnfs mounts but you get
> the benefit of avoiding a single point of failure (unlike fuse mounts, if
> the gluster node containing the gnfs server goes down, all mounts done
> using that node will fail). For fuse mounts, you could try tweaking the
> write-behind xlator settings to see if it helps. See the
> performance.write-behind and performance.write-behind-window-size options
> in `gluster volume set help`. Of course, even for gnfs mounts, you can
> achieve fail-over by using CTDB.
>

Ravi,
Do you have any data that suggests fuse mounts are slower than gNFS
servers?

Pat,
I see that I am late to the thread, but do you happen to have
"profile info" of the workload?

You can follow
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
to get the information.
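
For example (a sketch, with the volume name as a placeholder), the relevant
commands are along these lines:

gluster volume profile <VOLNAME> start
# ... run the dd tests on the mount ...
gluster volume profile <VOLNAME> info
gluster volume profile <VOLNAME> stop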





--
Pranith
Ravishankar N
2017-04-14 07:01:41 UTC
On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>
> Ravi,
> Do you have any data that suggests fuse mounts are slower than
> gNFS servers?
I have heard anecdotal evidence time and again on the ML and IRC, which
is why I wanted to compare it with NFS numbers on his setup.
>
> Pat,
> I see that I am late to the thread, but do you happen to have
> "profile info" of the workload?
>
> You can follow
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
> to get the information.
Yeah, Let's see if profile info shows up anything interesting.
-Ravi
Pat Haley
2017-05-05 14:44:30 UTC
Hi Pranith & Ravi,

A couple of quick questions:

We have profiling turned on. Are there specific queries we should make
that would help debug our configuration? (The default profile info was
previously sent in
http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html
but I'm not sure if that is what you were looking for.)

We have also started testing serving gluster over NFS. We rediscovered
an issue we previously reported
(http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html):
the NFS-mounted version was ignoring group write permissions. What
specific information would be useful in debugging this?

Thanks

Pat


--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-05 14:58:21 UTC
Reply
Permalink
Raw Message
Hi Pat,
      Let us concentrate on the performance numbers part for now; we can
look at the permissions issue after this.

As per the profile info, only 2.6% of the workload is writes. There are
too many Lookups.

Would it be possible to get the data for just the dd test you were doing
earlier?


On Fri, May 5, 2017 at 8:14 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith & Ravi,
>
> A couple of quick questions
>
> We have profile turned on. Are there specific queries we should make that
> would help debug our configuration? (The default profile info was
> previously sent in http://lists.gluster.org/pipermail/gluster-users/2017-
> May/030840.html but I'm not sure if that is what you were looking for.)
>
> We also started to do a test on serving gluster over NFS. We rediscovered
> an issue we previously reported ( http://lists.gluster.org/
> pipermail/gluster-users/2016-September/028289.html ) in that the NFS
> mounted version was ignoring the group write permissions. What specific
> information would be useful in debugging this?
>
> Thanks
>
> Pat
>
>
>
> On 04/14/2017 03:01 AM, Ravishankar N wrote:
>
> On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N <***@redhat.com>
> wrote:
>
>> Hi Pat,
>>
>> I'm assuming you are using gluster native (fuse mount). If it helps, you
>> could try mounting it via gluster NFS (gnfs) and then see if there is an
>> improvement in speed. Fuse mounts are slower than gnfs mounts but you get
>> the benefit of avoiding a single point of failure. Unlike fuse mounts, if
>> the gluster node containing the gnfs server goes down, all mounts done
>> using that node will fail). For fuse mounts, you could try tweaking the
>> write-behind xlator settings to see if it helps. See the
>> performance.write-behind and performance.write-behind-window-size
>> options in `gluster volume set help`. Of course, even for gnfs mounts, you
>> can achieve fail-over by using CTDB.
>>
>
> Ravi,
> Do you have any data that suggests fuse mounts are slower than gNFS
> servers?
>
> I have heard anecdotal evidence time and again on the ML and IRC, which is
> why I wanted to compare it with NFS numbers on his setup.
>
>
> Pat,
> I see that I am late to the thread, but do you happen to have
> "profile info" of the workload?
>
> You can follow https://gluster.readthedocs.io/en/latest/Administrator%
> 20Guide/Monitoring%20Workload/ to get the information.
>
> Yeah, Let's see if profile info shows up anything interesting.
> -Ravi
>
>
>
>>
>> Thanks,
>> Ravi
>>
>>
>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>
>>
>> Hi,
>>
>> We noticed a dramatic slowness when writing to a gluster disk when
>> compared to writing to an NFS disk. Specifically when using dd (data
>> duplicator) to write a 4.3 GB file of zeros:
>>
>> - on NFS disk (/home): 9.5 Gb/s
>> - on gluster disk (/gdata): 508 Mb/s
>>
>> The gluser disk is 2 bricks joined together, no replication or anything
>> else. The hardware is (literally) the same:
>>
>> - one server with 70 hard disks and a hardware RAID card.
>> - 4 disks in a RAID-6 group (the NFS disk)
>> - 32 disks in a RAID-6 group (the max allowed by the card,
>> /mnt/brick1)
>> - 32 disks in another RAID-6 group (/mnt/brick2)
>> - 2 hot spare
>>
>> Some additional information and more tests results (after changing the
>> log level):
>>
>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>> CentOS release 6.8 (Final)
>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108
>> [Invader] (rev 02)
>>
>>
>>
>> *Create the file to /gdata (gluster)*
>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1 bs=1M
>> count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>
>> *Create the file to /home (ext4)*
>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1 bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s - *3 times as
>> fast
>>
>>
>>
>> * Copy from /gdata to /gdata (gluster to gluster) *[***@mseas-data2
>> gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s* - realllyyy
>> slooowww
>>
>>
>> *Copy from /gdata to /gdata* *2nd time (gluster to gluster)*
>> [***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s* - realllyyy
>> slooowww again
>>
>>
>>
>> *Copy from /home to /home (ext4 to ext4)*
>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s *30 times as fast
>>
>>
>> *Copy from /home to /home (ext4 to ext4)*
>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* - 30 times as fast
>>
>>
>> As a test, can we copy data directly to the xfs mountpoint (/mnt/brick1)
>> and bypass gluster?
>>
>>
>> Any help you could give us would be appreciated.
>>
>> Thanks
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-***@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-***@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-05 15:12:31 UTC
Reply
Permalink
Raw Message
Hi Pranith,

I presume you are asking for some version of the profile data that just
shows the dd test (or a repeat of the dd test). If yes, how do I
extract just that data?

Thanks

Pat



On 05/05/2017 10:58 AM, Pranith Kumar Karampuri wrote:
> hi Pat,
> Let us concentrate on the performance numbers part for now. We
> will look at the permissions one after this?
>
> As per the profile info, only 2.6% of the work-load is writes. There
> are too many Lookups.
>
> Would it be possible to get the data for just the dd test you were
> doing earlier?
>
>
> On Fri, May 5, 2017 at 8:14 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith & Ravi,
>
> A couple of quick questions
>
> We have profile turned on. Are there specific queries we should
> make that would help debug our configuration? (The default
> profile info was previously sent in
> http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html
> <http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html>
> but I'm not sure if that is what you were looking for.)
>
> We also started to do a test on serving gluster over NFS. We
> rediscovered an issue we previously reported (
> http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html
> <http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html>
> ) in that the NFS mounted version was ignoring the group write
> permissions. What specific information would be useful in
> debugging this?
>
> Thanks
>
> Pat
>
>
>
> On 04/14/2017 03:01 AM, Ravishankar N wrote:
>> On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N
>>> <***@redhat.com <mailto:***@redhat.com>> wrote:
>>>
>>> Hi Pat,
>>>
>>> I'm assuming you are using gluster native (fuse mount). If
>>> it helps, you could try mounting it via gluster NFS (gnfs)
>>> and then see if there is an improvement in speed. Fuse
>>> mounts are slower than gnfs mounts but you get the benefit
>>> of avoiding a single point of failure. Unlike fuse mounts,
>>> if the gluster node containing the gnfs server goes down,
>>> all mounts done using that node will fail). For fuse mounts,
>>> you could try tweaking the write-behind xlator settings to
>>> see if it helps. See the performance.write-behind and
>>> performance.write-behind-window-size options in `gluster
>>> volume set help`. Of course, even for gnfs mounts, you can
>>> achieve fail-over by using CTDB.
>>>
>>>
>>> Ravi,
>>> Do you have any data that suggests fuse mounts are slower
>>> than gNFS servers?
>> I have heard anecdotal evidence time and again on the ML and IRC,
>> which is why I wanted to compare it with NFS numbers on his setup.
>>>
>>> Pat,
>>> I see that I am late to the thread, but do you happen to
>>> have "profile info" of the workload?
>>>
>>> You can follow
>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>> <https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/>
>>> to get the information.
>> Yeah, Let's see if profile info shows up anything interesting.
>> -Ravi
>>>
>>>
>>> Thanks,
>>> Ravi
>>>
>>>
>>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>>>
>>>> Hi,
>>>>
>>>> We noticed a dramatic slowness when writing to a gluster
>>>> disk when compared to writing to an NFS disk. Specifically
>>>> when using dd (data duplicator) to write a 4.3 GB file of
>>>> zeros:
>>>>
>>>> * on NFS disk (/home): 9.5 Gb/s
>>>> * on gluster disk (/gdata): 508 Mb/s
>>>>
>>>> The gluser disk is 2 bricks joined together, no replication
>>>> or anything else. The hardware is (literally) the same:
>>>>
>>>> * one server with 70 hard disks and a hardware RAID card.
>>>> * 4 disks in a RAID-6 group (the NFS disk)
>>>> * 32 disks in a RAID-6 group (the max allowed by the
>>>> card, /mnt/brick1)
>>>> * 32 disks in another RAID-6 group (/mnt/brick2)
>>>> * 2 hot spare
>>>>
>>>> Some additional information and more tests results (after
>>>> changing the log level):
>>>>
>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>> CentOS release 6.8 (Final)
>>>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID
>>>> SAS-3 3108 [Invader] (rev 02)
>>>>
>>>>
>>>>
>>>> *Create the file to /gdata (gluster)*
>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1
>>>> bs=1M count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>>>
>>>> *Create the file to /home (ext4)*
>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1
>>>> bs=1M count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s -
>>>> *3 times as fast*
>>>>
>>>>
>>>> Copy from /gdata to /gdata (gluster to gluster)
>>>> *[***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>> 2048000+0 records in
>>>> 2048000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s* -
>>>> realllyyy slooowww
>>>>
>>>>
>>>> *Copy from /gdata to /gdata* *2nd time *(gluster to gluster)**
>>>> [***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>> 2048000+0 records in
>>>> 2048000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s* -
>>>> realllyyy slooowww again
>>>>
>>>>
>>>>
>>>> *Copy from /home to /home (ext4 to ext4)*
>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>>>> 2048000+0 records in
>>>> 2048000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s *30
>>>> times as fast
>>>>
>>>>
>>>> *Copy from /home to /home (ext4 to ext4)*
>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>>>> 2048000+0 records in
>>>> 2048000+0 records out
>>>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* - 30
>>>> times as fast
>>>>
>>>>
>>>> As a test, can we copy data directly to the xfs mountpoint
>>>> (/mnt/brick1) and bypass gluster?
>>>>
>>>>
>>>> Any help you could give us would be appreciated.
>>>>
>>>> Thanks
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-***@gluster.org <mailto:Gluster-***@gluster.org>
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list Gluster-***@gluster.org
>>> <mailto:Gluster-***@gluster.org>
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>
>>> --
>>> Pranith
>>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ravishankar N
2017-05-05 16:47:23 UTC
Reply
Permalink
Raw Message
On 05/05/2017 08:42 PM, Pat Haley wrote:
>
> Hi Pranith,
>
> I presume you are asking for some version of the profile data that
> just shows the dd test (or a repeat of the dd test). If yes, how do I
> extract just that data?
Yes, that is what he is asking for. Just clear the existing profile info
using `gluster volume profile volname clear` and run the dd test once.
Then when you run profile info again, it should just give you the stats
for the dd test.
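
For example, the full sequence would look something like this (a sketch,
assuming the volume is named data-volume and reusing the dd invocation
from the original mail; adjust the volume name and paths to match yours):

  # clear the profile counters accumulated so far (profiling stays on)
  gluster volume profile data-volume clear

  # run only the dd test on the fuse mount
  dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000

  # the stats printed now should cover just the dd run above
  gluster volume profile data-volume info
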
>
> Thanks
>
> Pat
>
>
>
> On 05/05/2017 10:58 AM, Pranith Kumar Karampuri wrote:
>> hi Pat,
>> Let us concentrate on the performance numbers part for now. We
>> will look at the permissions one after this?
>>
>> As per the profile info, only 2.6% of the work-load is writes. There
>> are too many Lookups.
>>
>> Would it be possible to get the data for just the dd test you were
>> doing earlier?
>>
>>
>> On Fri, May 5, 2017 at 8:14 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith & Ravi,
>>
>> A couple of quick questions
>>
>> We have profile turned on. Are there specific queries we should
>> make that would help debug our configuration? (The default
>> profile info was previously sent in
>> http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html
>> <http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html>
>> but I'm not sure if that is what you were looking for.)
>>
>> We also started to do a test on serving gluster over NFS. We
>> rediscovered an issue we previously reported (
>> http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html
>> <http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html>
>> ) in that the NFS mounted version was ignoring the group write
>> permissions. What specific information would be useful in
>> debugging this?
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 04/14/2017 03:01 AM, Ravishankar N wrote:
>>> On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N
>>>> <***@redhat.com <mailto:***@redhat.com>> wrote:
>>>>
>>>> Hi Pat,
>>>>
>>>> I'm assuming you are using gluster native (fuse mount). If
>>>> it helps, you could try mounting it via gluster NFS (gnfs)
>>>> and then see if there is an improvement in speed. Fuse
>>>> mounts are slower than gnfs mounts but you get the benefit
>>>> of avoiding a single point of failure. Unlike fuse mounts,
>>>> if the gluster node containing the gnfs server goes down,
>>>> all mounts done using that node will fail). For fuse
>>>> mounts, you could try tweaking the write-behind xlator
>>>> settings to see if it helps. See the
>>>> performance.write-behind and
>>>> performance.write-behind-window-size options in `gluster
>>>> volume set help`. Of course, even for gnfs mounts, you can
>>>> achieve fail-over by using CTDB.
>>>>
>>>>
>>>> Ravi,
>>>> Do you have any data that suggests fuse mounts are slower
>>>> than gNFS servers?
>>> I have heard anecdotal evidence time and again on the ML and
>>> IRC, which is why I wanted to compare it with NFS numbers on his
>>> setup.
>>>>
>>>> Pat,
>>>> I see that I am late to the thread, but do you happen to
>>>> have "profile info" of the workload?
>>>>
>>>> You can follow
>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>>> <https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/>
>>>> to get the information.
>>> Yeah, Let's see if profile info shows up anything interesting.
>>> -Ravi
>>>>
>>>>
>>>> Thanks,
>>>> Ravi
>>>>
>>>>
>>>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We noticed a dramatic slowness when writing to a gluster
>>>>> disk when compared to writing to an NFS disk. Specifically
>>>>> when using dd (data duplicator) to write a 4.3 GB file of
>>>>> zeros:
>>>>>
>>>>> * on NFS disk (/home): 9.5 Gb/s
>>>>> * on gluster disk (/gdata): 508 Mb/s
>>>>>
>>>>> The gluser disk is 2 bricks joined together, no
>>>>> replication or anything else. The hardware is (literally)
>>>>> the same:
>>>>>
>>>>> * one server with 70 hard disks and a hardware RAID card.
>>>>> * 4 disks in a RAID-6 group (the NFS disk)
>>>>> * 32 disks in a RAID-6 group (the max allowed by the
>>>>> card, /mnt/brick1)
>>>>> * 32 disks in another RAID-6 group (/mnt/brick2)
>>>>> * 2 hot spare
>>>>>
>>>>> Some additional information and more tests results (after
>>>>> changing the log level):
>>>>>
>>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>>> CentOS release 6.8 (Final)
>>>>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID
>>>>> SAS-3 3108 [Invader] (rev 02)
>>>>>
>>>>>
>>>>>
>>>>> *Create the file to /gdata (gluster)*
>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1
>>>>> bs=1M count=1000
>>>>> 1000+0 records in
>>>>> 1000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>>>>
>>>>> *Create the file to /home (ext4)*
>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1
>>>>> bs=1M count=1000
>>>>> 1000+0 records in
>>>>> 1000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s -
>>>>> *3 times as fast*
>>>>>
>>>>>
>>>>> Copy from /gdata to /gdata (gluster to gluster)
>>>>> *[***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>>> 2048000+0 records in
>>>>> 2048000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s* -
>>>>> realllyyy slooowww
>>>>>
>>>>>
>>>>> *Copy from /gdata to /gdata* *2nd time *(gluster to gluster)**
>>>>> [***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>>> 2048000+0 records in
>>>>> 2048000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s* -
>>>>> realllyyy slooowww again
>>>>>
>>>>>
>>>>>
>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>>>>> 2048000+0 records in
>>>>> 2048000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s *30
>>>>> times as fast
>>>>>
>>>>>
>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>>>>> 2048000+0 records in
>>>>> 2048000+0 records out
>>>>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* -
>>>>> 30 times as fast
>>>>>
>>>>>
>>>>> As a test, can we copy data directly to the xfs mountpoint
>>>>> (/mnt/brick1) and bypass gluster?
>>>>>
>>>>>
>>>>> Any help you could give us would be appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-***@gluster.org <mailto:Gluster-***@gluster.org>
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list Gluster-***@gluster.org
>>>> <mailto:Gluster-***@gluster.org>
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>
>>>> --
>>>> Pranith
>>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
Pat Haley
2017-05-06 00:11:26 UTC
Reply
Permalink
Raw Message
Hi,

We redid the dd tests (this time using conv=sync oflag=sync to rule out
caching effects). The profile results are in

http://mseas.mit.edu/download/phaley/GlusterUsers/profile_gluster_fuse_test
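
For reference, each run was of this general form (the output file and
count shown here are illustrative, not the exact values used):

  dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000 conv=sync oflag=sync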


On 05/05/2017 12:47 PM, Ravishankar N wrote:
> On 05/05/2017 08:42 PM, Pat Haley wrote:
>>
>> Hi Pranith,
>>
>> I presume you are asking for some version of the profile data that
>> just shows the dd test (or a repeat of the dd test). If yes, how do
>> I extract just that data?
> Yes, that is what he is asking for. Just clear the existing profile
> info using `gluster volume profile volname clear` and run the dd test
> once. Then when you run profile info again, it should just give you
> the stats for the dd test.
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 05/05/2017 10:58 AM, Pranith Kumar Karampuri wrote:
>>> hi Pat,
>>> Let us concentrate on the performance numbers part for now. We
>>> will look at the permissions one after this?
>>>
>>> As per the profile info, only 2.6% of the work-load is writes. There
>>> are too many Lookups.
>>>
>>> Would it be possible to get the data for just the dd test you were
>>> doing earlier?
>>>
>>>
>>> On Fri, May 5, 2017 at 8:14 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith & Ravi,
>>>
>>> A couple of quick questions
>>>
>>> We have profile turned on. Are there specific queries we should
>>> make that would help debug our configuration? (The default
>>> profile info was previously sent in
>>> http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html
>>> <http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html>
>>> but I'm not sure if that is what you were looking for.)
>>>
>>> We also started to do a test on serving gluster over NFS. We
>>> rediscovered an issue we previously reported (
>>> http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html
>>> <http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html>
>>> ) in that the NFS mounted version was ignoring the group write
>>> permissions. What specific information would be useful in
>>> debugging this?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 04/14/2017 03:01 AM, Ravishankar N wrote:
>>>> On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N
>>>>> <***@redhat.com <mailto:***@redhat.com>> wrote:
>>>>>
>>>>> Hi Pat,
>>>>>
>>>>> I'm assuming you are using gluster native (fuse mount). If
>>>>> it helps, you could try mounting it via gluster NFS (gnfs)
>>>>> and then see if there is an improvement in speed. Fuse
>>>>> mounts are slower than gnfs mounts but you get the benefit
>>>>> of avoiding a single point of failure. Unlike fuse mounts,
>>>>> if the gluster node containing the gnfs server goes down,
>>>>> all mounts done using that node will fail). For fuse
>>>>> mounts, you could try tweaking the write-behind xlator
>>>>> settings to see if it helps. See the
>>>>> performance.write-behind and
>>>>> performance.write-behind-window-size options in `gluster
>>>>> volume set help`. Of course, even for gnfs mounts, you can
>>>>> achieve fail-over by using CTDB.
>>>>>
>>>>>
>>>>> Ravi,
>>>>> Do you have any data that suggests fuse mounts are
>>>>> slower than gNFS servers?
>>>> I have heard anecdotal evidence time and again on the ML and
>>>> IRC, which is why I wanted to compare it with NFS numbers on
>>>> his setup.
>>>>>
>>>>> Pat,
>>>>> I see that I am late to the thread, but do you happen to
>>>>> have "profile info" of the workload?
>>>>>
>>>>> You can follow
>>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>>>> <https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/>
>>>>> to get the information.
>>>> Yeah, Let's see if profile info shows up anything interesting.
>>>> -Ravi
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ravi
>>>>>
>>>>>
>>>>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We noticed a dramatic slowness when writing to a gluster
>>>>>> disk when compared to writing to an NFS disk.
>>>>>> Specifically when using dd (data duplicator) to write a
>>>>>> 4.3 GB file of zeros:
>>>>>>
>>>>>> * on NFS disk (/home): 9.5 Gb/s
>>>>>> * on gluster disk (/gdata): 508 Mb/s
>>>>>>
>>>>>> The gluser disk is 2 bricks joined together, no
>>>>>> replication or anything else. The hardware is (literally)
>>>>>> the same:
>>>>>>
>>>>>> * one server with 70 hard disks and a hardware RAID card.
>>>>>> * 4 disks in a RAID-6 group (the NFS disk)
>>>>>> * 32 disks in a RAID-6 group (the max allowed by the
>>>>>> card, /mnt/brick1)
>>>>>> * 32 disks in another RAID-6 group (/mnt/brick2)
>>>>>> * 2 hot spare
>>>>>>
>>>>>> Some additional information and more tests results (after
>>>>>> changing the log level):
>>>>>>
>>>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>>>> CentOS release 6.8 (Final)
>>>>>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID
>>>>>> SAS-3 3108 [Invader] (rev 02)
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Create the file to /gdata (gluster)*
>>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1
>>>>>> bs=1M count=1000
>>>>>> 1000+0 records in
>>>>>> 1000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>>>>>
>>>>>> *Create the file to /home (ext4)*
>>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1
>>>>>> bs=1M count=1000
>>>>>> 1000+0 records in
>>>>>> 1000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s -
>>>>>> *3 times as fast*
>>>>>>
>>>>>>
>>>>>> Copy from /gdata to /gdata (gluster to gluster)
>>>>>> *[***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>>>> 2048000+0 records in
>>>>>> 2048000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s*
>>>>>> - realllyyy slooowww
>>>>>>
>>>>>>
>>>>>> *Copy from /gdata to /gdata* *2nd time *(gluster to
>>>>>> gluster)**
>>>>>> [***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>>>> 2048000+0 records in
>>>>>> 2048000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s*
>>>>>> - realllyyy slooowww again
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>>>>>> 2048000+0 records in
>>>>>> 2048000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s
>>>>>> *30 times as fast
>>>>>>
>>>>>>
>>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>>>>>> 2048000+0 records in
>>>>>> 2048000+0 records out
>>>>>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* -
>>>>>> 30 times as fast
>>>>>>
>>>>>>
>>>>>> As a test, can we copy data directly to the xfs
>>>>>> mountpoint (/mnt/brick1) and bypass gluster?
>>>>>>
>>>>>>
>>>>>> Any help you could give us would be appreciated.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-***@gluster.org <mailto:Gluster-***@gluster.org>
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list Gluster-***@gluster.org
>>>>> <mailto:Gluster-***@gluster.org>
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pat Haley
2017-05-10 14:32:48 UTC
Reply
Permalink
Raw Message
Hi,

We finally managed to do the dd tests for an NFS-mounted gluster file
system. The profile results during that test are in

http://mseas.mit.edu/download/phaley/GlusterUsers/profile_gluster_nfs_test

The summary of the dd tests is:

* writing to gluster disk mounted with fuse: 5 Mb/s
* writing to gluster disk mounted with nfs: 200 Mb/s

Pat


On 05/05/2017 08:11 PM, Pat Haley wrote:
>
> Hi,
>
> We redid the dd tests (this time using conv=sync oflag=sync to avoid
> caching questions). The profile results are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/profile_gluster_fuse_test
>
>
> On 05/05/2017 12:47 PM, Ravishankar N wrote:
>> On 05/05/2017 08:42 PM, Pat Haley wrote:
>>>
>>> Hi Pranith,
>>>
>>> I presume you are asking for some version of the profile data that
>>> just shows the dd test (or a repeat of the dd test). If yes, how do
>>> I extract just that data?
>> Yes, that is what he is asking for. Just clear the existing profile
>> info using `gluster volume profile volname clear` and run the dd test
>> once. Then when you run profile info again, it should just give you
>> the stats for the dd test.
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/05/2017 10:58 AM, Pranith Kumar Karampuri wrote:
>>>> hi Pat,
>>>> Let us concentrate on the performance numbers part for now.
>>>> We will look at the permissions one after this?
>>>>
>>>> As per the profile info, only 2.6% of the work-load is writes.
>>>> There are too many Lookups.
>>>>
>>>> Would it be possible to get the data for just the dd test you were
>>>> doing earlier?
>>>>
>>>>
>>>> On Fri, May 5, 2017 at 8:14 PM, Pat Haley <***@mit.edu
>>>> <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith & Ravi,
>>>>
>>>> A couple of quick questions
>>>>
>>>> We have profile turned on. Are there specific queries we should
>>>> make that would help debug our configuration? (The default
>>>> profile info was previously sent in
>>>> http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html
>>>> <http://lists.gluster.org/pipermail/gluster-users/2017-May/030840.html>
>>>> but I'm not sure if that is what you were looking for.)
>>>>
>>>> We also started to do a test on serving gluster over NFS. We
>>>> rediscovered an issue we previously reported (
>>>> http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html
>>>> <http://lists.gluster.org/pipermail/gluster-users/2016-September/028289.html>
>>>> ) in that the NFS mounted version was ignoring the group write
>>>> permissions. What specific information would be useful in
>>>> debugging this?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 04/14/2017 03:01 AM, Ravishankar N wrote:
>>>>> On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N
>>>>>> <***@redhat.com <mailto:***@redhat.com>> wrote:
>>>>>>
>>>>>> Hi Pat,
>>>>>>
>>>>>> I'm assuming you are using gluster native (fuse mount).
>>>>>> If it helps, you could try mounting it via gluster NFS
>>>>>> (gnfs) and then see if there is an improvement in speed.
>>>>>> Fuse mounts are slower than gnfs mounts but you get the
>>>>>> benefit of avoiding a single point of failure. Unlike
>>>>>> fuse mounts, if the gluster node containing the gnfs
>>>>>> server goes down, all mounts done using that node will
>>>>>> fail). For fuse mounts, you could try tweaking the
>>>>>> write-behind xlator settings to see if it helps. See the
>>>>>> performance.write-behind and
>>>>>> performance.write-behind-window-size options in `gluster
>>>>>> volume set help`. Of course, even for gnfs mounts, you
>>>>>> can achieve fail-over by using CTDB.
>>>>>>
>>>>>>
>>>>>> Ravi,
>>>>>> Do you have any data that suggests fuse mounts are
>>>>>> slower than gNFS servers?
>>>>> I have heard anecdotal evidence time and again on the ML and
>>>>> IRC, which is why I wanted to compare it with NFS numbers on
>>>>> his setup.
>>>>>>
>>>>>> Pat,
>>>>>> I see that I am late to the thread, but do you happen
>>>>>> to have "profile info" of the workload?
>>>>>>
>>>>>> You can follow
>>>>>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>>>>> <https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/>
>>>>>> to get the information.
>>>>> Yeah, Let's see if profile info shows up anything interesting.
>>>>> -Ravi
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Ravi
>>>>>>
>>>>>>
>>>>>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We noticed a dramatic slowness when writing to a gluster
>>>>>>> disk when compared to writing to an NFS disk.
>>>>>>> Specifically when using dd (data duplicator) to write a
>>>>>>> 4.3 GB file of zeros:
>>>>>>>
>>>>>>> * on NFS disk (/home): 9.5 Gb/s
>>>>>>> * on gluster disk (/gdata): 508 Mb/s
>>>>>>>
>>>>>>> The gluser disk is 2 bricks joined together, no
>>>>>>> replication or anything else. The hardware is
>>>>>>> (literally) the same:
>>>>>>>
>>>>>>> * one server with 70 hard disks and a hardware RAID card.
>>>>>>> * 4 disks in a RAID-6 group (the NFS disk)
>>>>>>> * 32 disks in a RAID-6 group (the max allowed by the
>>>>>>> card, /mnt/brick1)
>>>>>>> * 32 disks in another RAID-6 group (/mnt/brick2)
>>>>>>> * 2 hot spare
>>>>>>>
>>>>>>> Some additional information and more tests results
>>>>>>> (after changing the log level):
>>>>>>>
>>>>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>>>>> CentOS release 6.8 (Final)
>>>>>>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID
>>>>>>> SAS-3 3108 [Invader] (rev 02)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Create the file to /gdata (gluster)*
>>>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero
>>>>>>> of=/gdata/zero1 bs=1M count=1000
>>>>>>> 1000+0 records in
>>>>>>> 1000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>>>>>>
>>>>>>> *Create the file to /home (ext4)*
>>>>>>> [***@mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1
>>>>>>> bs=1M count=1000
>>>>>>> 1000+0 records in
>>>>>>> 1000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s
>>>>>>> - *3 times as fast*
>>>>>>>
>>>>>>>
>>>>>>> Copy from /gdata to /gdata (gluster to gluster)
>>>>>>> *[***@mseas-data2 gdata]# dd if=/gdata/zero1
>>>>>>> of=/gdata/zero2
>>>>>>> 2048000+0 records in
>>>>>>> 2048000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s*
>>>>>>> - realllyyy slooowww
>>>>>>>
>>>>>>>
>>>>>>> *Copy from /gdata to /gdata* *2nd time *(gluster to
>>>>>>> gluster)**
>>>>>>> [***@mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>>>>>> 2048000+0 records in
>>>>>>> 2048000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s*
>>>>>>> - realllyyy slooowww again
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>>>>>>> 2048000+0 records in
>>>>>>> 2048000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s
>>>>>>> *30 times as fast
>>>>>>>
>>>>>>>
>>>>>>> *Copy from /home to /home (ext4 to ext4)*
>>>>>>> [***@mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>>>>>>> 2048000+0 records in
>>>>>>> 2048000+0 records out
>>>>>>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* -
>>>>>>> 30 times as fast
>>>>>>>
>>>>>>>
>>>>>>> As a test, can we copy data directly to the xfs
>>>>>>> mountpoint (/mnt/brick1) and bypass gluster?
>>>>>>>
>>>>>>>
>>>>>>> Any help you could give us would be appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-***@gluster.org <mailto:Gluster-***@gluster.org>
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list Gluster-***@gluster.org
>>>>>> <mailto:Gluster-***@gluster.org>
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-10 15:44:04 UTC
Reply
Permalink
Raw Message
Is this the volume info you have?

> [root at mseas-data2 ~]# gluster volume info
>
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off

I copied this from an old thread from 2016. This is a distribute volume.
Did you change any of the options in between?
Pat Haley
2017-05-10 15:47:17 UTC
Reply
Permalink
Raw Message
Here is what I see now:

[***@mseas-data2 ~]# gluster volume info

Volume Name: data-volume
Type: Distribute
Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: mseas-data2:/mnt/brick1
Brick2: mseas-data2:/mnt/brick2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.exports-auth-enable: on
diagnostics.brick-sys-log-level: WARNING
performance.readdir-ahead: on
nfs.disable: on
nfs.export-volumes: off



On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> Is this the volume info you have?
>
> > [root at mseas-data2 ~]# gluster volume info
> >
> > Volume Name: data-volume
> > Type: Distribute
> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> > Status: Started
> > Number of Bricks: 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: mseas-data2:/mnt/brick1
> > Brick2: mseas-data2:/mnt/brick2
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: off
> ​I copied this from old thread from 2016. This is distribute volume.
> Did you change any of the options in between?

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-10 15:53:55 UTC
Reply
Permalink
Raw Message
Could you let me know the speed without oflag=sync on both the mounts? No
need to collect profiles.

On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:

>
> Here is what I see now:
>
> [***@mseas-data2 ~]# gluster volume info
>
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.exports-auth-enable: on
> diagnostics.brick-sys-log-level: WARNING
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off
>
>
>
> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>
> Is this the volume info you have?
>
> > [root at mseas-data2 ~]# gluster volume info
> >
> > Volume Name: data-volume
> > Type: Distribute
> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> > Status: Started
> > Number of Bricks: 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: mseas-data2:/mnt/brick1
> > Brick2: mseas-data2:/mnt/brick2
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: off
>
> ​I copied this from old thread from 2016. This is distribute volume. Did
> you change any of the options in between?
>
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-10 16:05:25 UTC
Reply
Permalink
Raw Message
Without the oflag=sync and only a single test of each, the FUSE is going
faster than NFS:

FUSE:
mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
of=zeros.txt conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s


NFS
mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s


On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> Could you let me know the speed without oflag=sync on both the mounts?
> No need to collect profiles.
>
> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Here is what I see now:
>
> [***@mseas-data2 ~]# gluster volume info
>
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.exports-auth-enable: on
> diagnostics.brick-sys-log-level: WARNING
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off
>
>
>
> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>> Is this the volume info you have?
>>
>> > [root at mseas-data2 ~]# gluster volume info
>> >
>> > Volume Name: data-volume
>> > Type: Distribute
>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> > Status: Started
>> > Number of Bricks: 2
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: mseas-data2:/mnt/brick1
>> > Brick2: mseas-data2:/mnt/brick2
>> > Options Reconfigured:
>> > performance.readdir-ahead: on
>> > nfs.disable: on
>> > nfs.export-volumes: off
>> ​I copied this from old thread from 2016. This is distribute
>> volume. Did you change any of the options in between?
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-10 16:15:44 UTC
Reply
Permalink
Raw Message
Okay, good. At least this validates my doubts. Handling O_SYNC in gluster
NFS and fuse is a bit different.
When an application opens a file with O_SYNC on a fuse mount, each write
syscall has to be written to disk as part of that syscall, whereas in the
case of NFS there is no concept of open. NFS performs the write through a
handle, flagging that it needs to be a synchronous write, so the write()
is performed first and then an fsync(); in effect, a write on an fd with
O_SYNC becomes write+fsync. My guess is that when multiple threads do this
write+fsync() operation on the same file, the writes get batched together
before being written to disk, which is why the throughput on the disk
increases.
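
A rough way to see the effect from the client side (only an analogy, not
the gluster NFS code path) is to compare dd with oflag=sync, which opens
the file with O_SYNC, against dd with conv=fsync, which buffers the
writes and issues a single fsync() at the end:

  # every 1M write is synced to disk as part of the write itself (O_SYNC)
  dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000 oflag=sync

  # writes are buffered and flushed with one fsync() at the end
  dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000 conv=fsync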

Does it answer your doubts?

On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:

>
> Without the oflag=sync and only a single test of each, the FUSE is going
> faster than NFS:
>
> FUSE:
> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
> of=zeros.txt conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>
>
> NFS
> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
> conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>
>
>
> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>
> Could you let me know the speed without oflag=sync on both the mounts? No
> need to collect profiles.
>
> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Here is what I see now:
>>
>> [***@mseas-data2 ~]# gluster volume info
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> diagnostics.count-fop-hits: on
>> diagnostics.latency-measurement: on
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>>
>>
>>
>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>
>> Is this the volume info you have?
>>
>> > [root at mseas-data2 ~]# gluster volume info
>> >
>> > Volume Name: data-volume
>> > Type: Distribute
>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> > Status: Started
>> > Number of Bricks: 2
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: mseas-data2:/mnt/brick1
>> > Brick2: mseas-data2:/mnt/brick2
>> > Options Reconfigured:
>> > performance.readdir-ahead: on
>> > nfs.disable: on
>> > nfs.export-volumes: off
>>
>> ​I copied this from old thread from 2016. This is distribute volume. Did
>> you change any of the options in between?
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-10 16:45:04 UTC
Reply
Permalink
Raw Message
Hi Pranith,

Not entirely sure (this isn't my area of expertise). I'll run your
answer by some other people who are more familiar with this.

I am also uncertain about how to interpret the results when we also add
the dd tests writing to the /home area (no gluster, still on the same
machine):

* dd test without oflag=sync (rough average of multiple tests)
o gluster w/ fuse mount : 570 Mb/s
o gluster w/ nfs mount: 390 Mb/s
o nfs (no gluster): 1.2 Gb/s
* dd test with oflag=sync (rough average of multiple tests)
o gluster w/ fuse mount: 5 Mb/s
o gluster w/ nfs mount: 200 Mb/s
o nfs (no gluster): 20 Mb/s

Given that the non-gluster area is a RAID-6 of 4 disks while each brick
of the gluster area is a RAID-6 of 32 disks, I would naively expect the
writes to the gluster area to be roughly 8x faster than to the non-gluster.
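
One test we could still run to separate the RAID behaviour from gluster
is the direct-to-brick dd we asked about at the start of this thread,
for example (scratch file only, removed afterwards, since writing into a
brick by hand is generally discouraged on a live volume):

  dd if=/dev/zero of=/mnt/brick1/ddtest.tmp bs=1M count=1000 oflag=sync
  rm /mnt/brick1/ddtest.tmp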

I still think we have a speed issue; I can't tell whether fuse vs nfs is
part of the problem. Was there anything useful in the profiles?

Pat


On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
> Okay good. At least this validates my doubts. Handling O_SYNC in
> gluster NFS and fuse is a bit different.
> When application opens a file with O_SYNC on fuse mount then each
> write syscall has to be written to disk as part of the syscall where
> as in case of NFS, there is no concept of open. NFS performs write
> though a handle saying it needs to be a synchronous write, so write()
> syscall is performed first then it performs fsync(). so an write on an
> fd with O_SYNC becomes write+fsync. I am suspecting that when multiple
> threads do this write+fsync() operation on the same file, multiple
> writes are batched together to be written do disk so the throughput on
> the disk is increasing is my guess.
>
> Does it answer your doubts?
>
> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Without the oflag=sync and only a single test of each, the FUSE is
> going faster than NFS:
>
> FUSE:
> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
> of=zeros.txt conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>
>
> NFS
> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
> of=zeros.txt conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>
>
>
> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>> Could you let me know the speed without oflag=sync on both the
>> mounts? No need to collect profiles.
>>
>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Here is what I see now:
>>
>> [***@mseas-data2 ~]# gluster volume info
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> diagnostics.count-fop-hits: on
>> diagnostics.latency-measurement: on
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>>
>>
>>
>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>> Is this the volume info you have?
>>>
>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>> >
>>> > Volume Name: data-volume
>>> > Type: Distribute
>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> > Status: Started
>>> > Number of Bricks: 2
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: mseas-data2:/mnt/brick1
>>> > Brick2: mseas-data2:/mnt/brick2
>>> > Options Reconfigured:
>>> > performance.readdir-ahead: on
>>> > nfs.disable: on
>>> > nfs.export-volumes: off
>>>
>>> I copied this from old thread from 2016. This is distribute
>>> volume. Did you change any of the options in between?
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-10 17:27:46 UTC
On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> Not entirely sure (this isn't my area of expertise). I'll run your answer
> by some other people who are more familiar with this.
>
> I am also uncertain about how to interpret the results when we also add
> the dd tests writing to the /home area (no gluster, still on the same
> machine)
>
> - dd test without oflag=sync (rough average of multiple tests)
> - gluster w/ fuse mount : 570 Mb/s
> - gluster w/ nfs mount: 390 Mb/s
> - nfs (no gluster): 1.2 Gb/s
> - dd test with oflag=sync (rough average of multiple tests)
> - gluster w/ fuse mount: 5 Mb/s
> - gluster w/ nfs mount: 200 Mb/s
> - nfs (no gluster): 20 Mb/s
>
> Given that the non-gluster area is a RAID-6 of 4 disks while each brick of
> the gluster area is a RAID-6 of 32 disks, I would naively expect the writes
> to the gluster area to be roughly 8x faster than to the non-gluster.
>

I think a better test is to try writing to a file using nfs without any
gluster, to a location that is not inside the brick but some other location
that is on the same disk(s). If you are mounting the partition as the
brick, then we can write to a file inside the .glusterfs directory,
something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
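
For example, something like this run directly on the server (a rough
sketch; "ddtest.tmp" is just a placeholder file name, to be removed after
the test):

dd if=/dev/zero count=4096 bs=1048576 of=/mnt/brick1/.glusterfs/ddtest.tmp
dd if=/dev/zero count=4096 bs=1048576 of=/mnt/brick1/.glusterfs/ddtest.tmp oflag=sync
rm -f /mnt/brick1/.glusterfs/ddtest.tmp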


> I still think we have a speed issue, I can't tell if fuse vs nfs is part
> of the problem.
>

I got interested in the post because I read that fuse speed was lower than
nfs speed, which is counter-intuitive to my understanding, so I wanted
clarification. Now that I have it (fuse outperformed nfs without sync), we
can resume testing as described above and try to find what the problem is.
Based on your email-id I am guessing you are in Boston while I am in
Bangalore, so if you are okay with doing this debugging over multiple days
because of the timezones, I will be happy to help. Please be a bit patient
with me; I am under a release crunch, but I am very curious about the
problem you posted.

> Was there anything useful in the profiles?
>

Unfortunately the profiles didn't help me much. I think we are collecting
the profiles from an active volume, so they contain a lot of information
that does not pertain to dd, which makes it difficult to isolate the
contribution of dd. So I went through your post again and found something
I hadn't paid much attention to earlier, i.e. oflag=sync, did my own tests
on my setup with FUSE, and sent that reply.
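
If we want to try the profiles again later, one way to reduce the noise (a
sketch, assuming the volume can be kept mostly idle during the test) is to
reset the interval counters just before the dd run and read them right
after, since the "Interval" section of gluster volume profile info only
covers activity since the previous info call:

gluster volume profile data-volume start   # safe even if profiling is already on
gluster volume profile data-volume info > /tmp/profile-before.txt
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt   # run from the fuse mount
gluster volume profile data-volume info > /tmp/profile-after.txt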


>
> Pat
>
>
>
> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>
> Okay good. At least this validates my doubts. Handling O_SYNC in gluster
> NFS and fuse is a bit different.
> When application opens a file with O_SYNC on fuse mount then each write
> syscall has to be written to disk as part of the syscall where as in case
> of NFS, there is no concept of open. NFS performs write though a handle
> saying it needs to be a synchronous write, so write() syscall is performed
> first then it performs fsync(). so an write on an fd with O_SYNC becomes
> write+fsync. I am suspecting that when multiple threads do this
> write+fsync() operation on the same file, multiple writes are batched
> together to be written do disk so the throughput on the disk is increasing
> is my guess.
>
> Does it answer your doubts?
>
> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Without the oflag=sync and only a single test of each, the FUSE is going
>> faster than NFS:
>>
>> FUSE:
>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>> of=zeros.txt conv=sync
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>
>>
>> NFS
>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
>> conv=sync
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>
>>
>>
>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>
>> Could you let me know the speed without oflag=sync on both the mounts? No
>> need to collect profiles.
>>
>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Here is what I see now:
>>>
>>> [***@mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> diagnostics.count-fop-hits: on
>>> diagnostics.latency-measurement: on
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>>
>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>
>>> Is this the volume info you have?
>>>
>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>> >
>>> > Volume Name: data-volume
>>> > Type: Distribute
>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> > Status: Started
>>> > Number of Bricks: 2
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: mseas-data2:/mnt/brick1
>>> > Brick2: mseas-data2:/mnt/brick2
>>> > Options Reconfigured:
>>> > performance.readdir-ahead: on
>>> > nfs.disable: on
>>> > nfs.export-volumes: off
>>>
>>> I copied this from old thread from 2016. This is distribute volume. Did
>>> you change any of the options in between?
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-10 21:18:26 UTC
Hi Pranith,

Since we are mounting the partitions as the bricks, I tried the dd test
writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
fewer disks).

Pat


On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>
>
> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Not entirely sure (this isn't my area of expertise). I'll run your
> answer by some other people who are more familiar with this.
>
> I am also uncertain about how to interpret the results when we
> also add the dd tests writing to the /home area (no gluster, still
> on the same machine)
>
> * dd test without oflag=sync (rough average of multiple tests)
> o gluster w/ fuse mount : 570 Mb/s
> o gluster w/ nfs mount: 390 Mb/s
> o nfs (no gluster): 1.2 Gb/s
> * dd test with oflag=sync (rough average of multiple tests)
> o gluster w/ fuse mount: 5 Mb/s
> o gluster w/ nfs mount: 200 Mb/s
> o nfs (no gluster): 20 Mb/s
>
> Given that the non-gluster area is a RAID-6 of 4 disks while each
> brick of the gluster area is a RAID-6 of 32 disks, I would naively
> expect the writes to the gluster area to be roughly 8x faster than
> to the non-gluster.
>
>
> I think a better test is to try and write to a file using nfs without
> any gluster to a location that is not inside the brick but someother
> location that is on same disk(s). If you are mounting the partition as
> the brick, then we can write to a file inside .glusterfs directory,
> something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>
>
> I still think we have a speed issue, I can't tell if fuse vs nfs
> is part of the problem.
>
>
> I got interested in the post because I read that fuse speed is lesser
> than nfs speed which is counter-intuitive to my understanding. So
> wanted clarifications. Now that I got my clarifications where fuse
> outperformed nfs without sync, we can resume testing as described
> above and try to find what it is. Based on your email-id I am guessing
> you are from Boston and I am from Bangalore so if you are okay with
> doing this debugging for multiple days because of timezones, I will be
> happy to help. Please be a bit patient with me, I am under a release
> crunch but I am very curious with the problem you posted.
>
> Was there anything useful in the profiles?
>
>
> Unfortunately profiles didn't help me much, I think we are collecting
> the profiles from an active volume, so it has a lot of information
> that is not pertaining to dd so it is difficult to find the
> contributions of dd. So I went through your post again and found
> something I didn't pay much attention to earlier i.e. oflag=sync, so
> did my own tests on my setup with FUSE so sent that reply.
>
>
> Pat
>
>
>
> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>> Okay good. At least this validates my doubts. Handling O_SYNC in
>> gluster NFS and fuse is a bit different.
>> When application opens a file with O_SYNC on fuse mount then each
>> write syscall has to be written to disk as part of the syscall
>> where as in case of NFS, there is no concept of open. NFS
>> performs write though a handle saying it needs to be a
>> synchronous write, so write() syscall is performed first then it
>> performs fsync(). so an write on an fd with O_SYNC becomes
>> write+fsync. I am suspecting that when multiple threads do this
>> write+fsync() operation on the same file, multiple writes are
>> batched together to be written do disk so the throughput on the
>> disk is increasing is my guess.
>>
>> Does it answer your doubts?
>>
>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Without the oflag=sync and only a single test of each, the
>> FUSE is going faster than NFS:
>>
>> FUSE:
>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096
>> bs=1048576 of=zeros.txt conv=sync
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>
>>
>> NFS
>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>> of=zeros.txt conv=sync
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>
>>
>>
>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>> Could you let me know the speed without oflag=sync on both
>>> the mounts? No need to collect profiles.
>>>
>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Here is what I see now:
>>>
>>> [***@mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> diagnostics.count-fop-hits: on
>>> diagnostics.latency-measurement: on
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>>
>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>> Is this the volume info you have?
>>>>
>>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>>> >
>>>> > Volume Name: data-volume
>>>> > Type: Distribute
>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>> > Status: Started
>>>> > Number of Bricks: 2
>>>> > Transport-type: tcp
>>>> > Bricks:
>>>> > Brick1: mseas-data2:/mnt/brick1
>>>> > Brick2: mseas-data2:/mnt/brick2
>>>> > Options Reconfigured:
>>>> > performance.readdir-ahead: on
>>>> > nfs.disable: on
>>>> > nfs.export-volumes: off
>>>>
>>>> I copied this from old thread from 2016. This is
>>>> distribute volume. Did you change any of the options in
>>>> between?
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-11 11:05:41 UTC
On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> Since we are mounting the partitions as the bricks, I tried the dd test
> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
> results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
> fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer
> disks).
>

Okay, then 1.6Gb/s is what we need to target, considering your volume is
just distribute. Is there any way you can do tests on similar hardware but
at a smaller scale, just so we can run the workload and learn more about
the bottlenecks in the system? We can probably try to get the speed to
1.2Gb/s on the /home partition you were telling me about yesterday. Let me
know if that is something you are okay to do.


>
> Pat
>
>
>
> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> Not entirely sure (this isn't my area of expertise). I'll run your
>> answer by some other people who are more familiar with this.
>>
>> I am also uncertain about how to interpret the results when we also add
>> the dd tests writing to the /home area (no gluster, still on the same
>> machine)
>>
>> - dd test without oflag=sync (rough average of multiple tests)
>> - gluster w/ fuse mount : 570 Mb/s
>> - gluster w/ nfs mount: 390 Mb/s
>> - nfs (no gluster): 1.2 Gb/s
>> - dd test with oflag=sync (rough average of multiple tests)
>> - gluster w/ fuse mount: 5 Mb/s
>> - gluster w/ nfs mount: 200 Mb/s
>> - nfs (no gluster): 20 Mb/s
>>
>> Given that the non-gluster area is a RAID-6 of 4 disks while each brick
>> of the gluster area is a RAID-6 of 32 disks, I would naively expect the
>> writes to the gluster area to be roughly 8x faster than to the non-gluster.
>>
>
> I think a better test is to try and write to a file using nfs without any
> gluster to a location that is not inside the brick but someother location
> that is on same disk(s). If you are mounting the partition as the brick,
> then we can write to a file inside .glusterfs directory, something like
> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>
>
>> I still think we have a speed issue, I can't tell if fuse vs nfs is part
>> of the problem.
>>
>
> I got interested in the post because I read that fuse speed is lesser than
> nfs speed which is counter-intuitive to my understanding. So wanted
> clarifications. Now that I got my clarifications where fuse outperformed
> nfs without sync, we can resume testing as described above and try to find
> what it is. Based on your email-id I am guessing you are from Boston and I
> am from Bangalore so if you are okay with doing this debugging for multiple
> days because of timezones, I will be happy to help. Please be a bit patient
> with me, I am under a release crunch but I am very curious with the problem
> you posted.
>
> Was there anything useful in the profiles?
>>
>
> Unfortunately profiles didn't help me much, I think we are collecting the
> profiles from an active volume, so it has a lot of information that is not
> pertaining to dd so it is difficult to find the contributions of dd. So I
> went through your post again and found something I didn't pay much
> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
> FUSE so sent that reply.
>
>
>>
>> Pat
>>
>>
>>
>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>
>> Okay good. At least this validates my doubts. Handling O_SYNC in gluster
>> NFS and fuse is a bit different.
>> When application opens a file with O_SYNC on fuse mount then each write
>> syscall has to be written to disk as part of the syscall where as in case
>> of NFS, there is no concept of open. NFS performs write though a handle
>> saying it needs to be a synchronous write, so write() syscall is performed
>> first then it performs fsync(). so an write on an fd with O_SYNC becomes
>> write+fsync. I am suspecting that when multiple threads do this
>> write+fsync() operation on the same file, multiple writes are batched
>> together to be written do disk so the throughput on the disk is increasing
>> is my guess.
>>
>> Does it answer your doubts?
>>
>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Without the oflag=sync and only a single test of each, the FUSE is going
>>> faster than NFS:
>>>
>>> FUSE:
>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>> of=zeros.txt conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>
>>>
>>> NFS
>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
>>> conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>
>>>
>>>
>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>
>>> Could you let me know the speed without oflag=sync on both the mounts?
>>> No need to collect profiles.
>>>
>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Here is what I see now:
>>>>
>>>> [***@mseas-data2 ~]# gluster volume info
>>>>
>>>> Volume Name: data-volume
>>>> Type: Distribute
>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>> Status: Started
>>>> Number of Bricks: 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: mseas-data2:/mnt/brick1
>>>> Brick2: mseas-data2:/mnt/brick2
>>>> Options Reconfigured:
>>>> diagnostics.count-fop-hits: on
>>>> diagnostics.latency-measurement: on
>>>> nfs.exports-auth-enable: on
>>>> diagnostics.brick-sys-log-level: WARNING
>>>> performance.readdir-ahead: on
>>>> nfs.disable: on
>>>> nfs.export-volumes: off
>>>>
>>>>
>>>>
>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>> Is this the volume info you have?
>>>>
>>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>>> >
>>>> > Volume Name: data-volume
>>>> > Type: Distribute
>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>> > Status: Started
>>>> > Number of Bricks: 2
>>>> > Transport-type: tcp
>>>> > Bricks:
>>>> > Brick1: mseas-data2:/mnt/brick1
>>>> > Brick2: mseas-data2:/mnt/brick2
>>>> > Options Reconfigured:
>>>> > performance.readdir-ahead: on
>>>> > nfs.disable: on
>>>> > nfs.export-volumes: off
>>>>
>>>> I copied this from old thread from 2016. This is distribute volume.
>>>> Did you change any of the options in between?
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-11 15:27:44 UTC
Hi Pranith,

Unfortunately, we don't have similar hardware for a small scale test.
All we have is our production hardware.

Pat



On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Since we are mounting the partitions as the bricks, I tried the dd
> test writing to
> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
> results without oflag=sync were 1.6 Gb/s (faster than gluster but
> not as fast as I was expecting given the 1.2 Gb/s to the
> no-gluster area w/ fewer disks).
>
>
> Okay, then 1.6Gb/s is what we need to target for, considering your
> volume is just distribute. Is there any way you can do tests on
> similar hardware but at a small scale? Just so we can run the workload
> to learn more about the bottlenecks in the system? We can probably try
> to get the speed to 1.2Gb/s on your /home partition you were telling
> me yesterday. Let me know if that is something you are okay to do.
>
>
> Pat
>
>
>
> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Not entirely sure (this isn't my area of expertise). I'll
>> run your answer by some other people who are more familiar
>> with this.
>>
>> I am also uncertain about how to interpret the results when
>> we also add the dd tests writing to the /home area (no
>> gluster, still on the same machine)
>>
>> * dd test without oflag=sync (rough average of multiple tests)
>> o gluster w/ fuse mount : 570 Mb/s
>> o gluster w/ nfs mount: 390 Mb/s
>> o nfs (no gluster): 1.2 Gb/s
>> * dd test with oflag=sync (rough average of multiple tests)
>> o gluster w/ fuse mount: 5 Mb/s
>> o gluster w/ nfs mount: 200 Mb/s
>> o nfs (no gluster): 20 Mb/s
>>
>> Given that the non-gluster area is a RAID-6 of 4 disks while
>> each brick of the gluster area is a RAID-6 of 32 disks, I
>> would naively expect the writes to the gluster area to be
>> roughly 8x faster than to the non-gluster.
>>
>>
>> I think a better test is to try and write to a file using nfs
>> without any gluster to a location that is not inside the brick
>> but someother location that is on same disk(s). If you are
>> mounting the partition as the brick, then we can write to a file
>> inside .glusterfs directory, something like
>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>
>>
>> I still think we have a speed issue, I can't tell if fuse vs
>> nfs is part of the problem.
>>
>>
>> I got interested in the post because I read that fuse speed is
>> lesser than nfs speed which is counter-intuitive to my
>> understanding. So wanted clarifications. Now that I got my
>> clarifications where fuse outperformed nfs without sync, we can
>> resume testing as described above and try to find what it is.
>> Based on your email-id I am guessing you are from Boston and I am
>> from Bangalore so if you are okay with doing this debugging for
>> multiple days because of timezones, I will be happy to help.
>> Please be a bit patient with me, I am under a release crunch but
>> I am very curious with the problem you posted.
>>
>> Was there anything useful in the profiles?
>>
>>
>> Unfortunately profiles didn't help me much, I think we are
>> collecting the profiles from an active volume, so it has a lot of
>> information that is not pertaining to dd so it is difficult to
>> find the contributions of dd. So I went through your post again
>> and found something I didn't pay much attention to earlier i.e.
>> oflag=sync, so did my own tests on my setup with FUSE so sent
>> that reply.
>>
>>
>> Pat
>>
>>
>>
>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>> Okay good. At least this validates my doubts. Handling
>>> O_SYNC in gluster NFS and fuse is a bit different.
>>> When application opens a file with O_SYNC on fuse mount then
>>> each write syscall has to be written to disk as part of the
>>> syscall where as in case of NFS, there is no concept of
>>> open. NFS performs write though a handle saying it needs to
>>> be a synchronous write, so write() syscall is performed
>>> first then it performs fsync(). so an write on an fd with
>>> O_SYNC becomes write+fsync. I am suspecting that when
>>> multiple threads do this write+fsync() operation on the same
>>> file, multiple writes are batched together to be written do
>>> disk so the throughput on the disk is increasing is my guess.
>>>
>>> Does it answer your doubts?
>>>
>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Without the oflag=sync and only a single test of each,
>>> the FUSE is going faster than NFS:
>>>
>>> FUSE:
>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096
>>> bs=1048576 of=zeros.txt conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>
>>>
>>> NFS
>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096
>>> bs=1048576 of=zeros.txt conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>
>>>
>>>
>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>> Could you let me know the speed without oflag=sync on
>>>> both the mounts? No need to collect profiles.
>>>>
>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Here is what I see now:
>>>>
>>>> [***@mseas-data2 ~]# gluster volume info
>>>>
>>>> Volume Name: data-volume
>>>> Type: Distribute
>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>> Status: Started
>>>> Number of Bricks: 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: mseas-data2:/mnt/brick1
>>>> Brick2: mseas-data2:/mnt/brick2
>>>> Options Reconfigured:
>>>> diagnostics.count-fop-hits: on
>>>> diagnostics.latency-measurement: on
>>>> nfs.exports-auth-enable: on
>>>> diagnostics.brick-sys-log-level: WARNING
>>>> performance.readdir-ahead: on
>>>> nfs.disable: on
>>>> nfs.export-volumes: off
>>>>
>>>>
>>>>
>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>> Is this the volume info you have?
>>>>>
>>>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>>>> >
>>>>> > Volume Name: data-volume
>>>>> > Type: Distribute
>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> > Status: Started
>>>>> > Number of Bricks: 2
>>>>> > Transport-type: tcp
>>>>> > Bricks:
>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>> > Options Reconfigured:
>>>>> > performance.readdir-ahead: on
>>>>> > nfs.disable: on
>>>>> > nfs.export-volumes: off
>>>>>
>>>>> I copied this from old thread from 2016. This is
>>>>> distribute volume. Did you change any of the
>>>>> options in between?
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-11 15:32:16 UTC
On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> Unfortunately, we don't have similar hardware for a small scale test. All
> we have is our production hardware.
>

You said something about the /home partition, which has fewer disks; we can
create a plain distribute volume inside one of its directories. After we
are done, we can remove the setup. What do you say?
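
Something along these lines, for example (a rough sketch; the directory,
volume, and mount-point names below are just placeholders):

mkdir -p /home/glustertest/brick1
gluster volume create home-test mseas-data2:/home/glustertest/brick1
gluster volume start home-test
mkdir -p /mnt/home-test
mount -t glusterfs mseas-data2:/home-test /mnt/home-test
# run the dd tests against /mnt/home-test, then tear everything down:
umount /mnt/home-test
gluster volume stop home-test
gluster volume delete home-test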


>
> Pat
>
>
>
>
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>
>
>
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> Since we are mounting the partitions as the bricks, I tried the dd test
>> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
>> results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
>> fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer
>> disks).
>>
>
> Okay, then 1.6Gb/s is what we need to target for, considering your volume
> is just distribute. Is there any way you can do tests on similar hardware
> but at a small scale? Just so we can run the workload to learn more about
> the bottlenecks in the system? We can probably try to get the speed to
> 1.2Gb/s on your /home partition you were telling me yesterday. Let me know
> if that is something you are okay to do.
>
>
>>
>> Pat
>>
>>
>>
>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>> answer by some other people who are more familiar with this.
>>>
>>> I am also uncertain about how to interpret the results when we also add
>>> the dd tests writing to the /home area (no gluster, still on the same
>>> machine)
>>>
>>> - dd test without oflag=sync (rough average of multiple tests)
>>> - gluster w/ fuse mount : 570 Mb/s
>>> - gluster w/ nfs mount: 390 Mb/s
>>> - nfs (no gluster): 1.2 Gb/s
>>> - dd test with oflag=sync (rough average of multiple tests)
>>> - gluster w/ fuse mount: 5 Mb/s
>>> - gluster w/ nfs mount: 200 Mb/s
>>> - nfs (no gluster): 20 Mb/s
>>>
>>> Given that the non-gluster area is a RAID-6 of 4 disks while each brick
>>> of the gluster area is a RAID-6 of 32 disks, I would naively expect the
>>> writes to the gluster area to be roughly 8x faster than to the non-gluster.
>>>
>>
>> I think a better test is to try and write to a file using nfs without any
>> gluster to a location that is not inside the brick but someother location
>> that is on same disk(s). If you are mounting the partition as the brick,
>> then we can write to a file inside .glusterfs directory, something like
>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>
>>
>>> I still think we have a speed issue, I can't tell if fuse vs nfs is part
>>> of the problem.
>>>
>>
>> I got interested in the post because I read that fuse speed is lesser
>> than nfs speed which is counter-intuitive to my understanding. So wanted
>> clarifications. Now that I got my clarifications where fuse outperformed
>> nfs without sync, we can resume testing as described above and try to find
>> what it is. Based on your email-id I am guessing you are from Boston and I
>> am from Bangalore so if you are okay with doing this debugging for multiple
>> days because of timezones, I will be happy to help. Please be a bit patient
>> with me, I am under a release crunch but I am very curious with the problem
>> you posted.
>>
>> Was there anything useful in the profiles?
>>>
>>
>> Unfortunately profiles didn't help me much, I think we are collecting the
>> profiles from an active volume, so it has a lot of information that is not
>> pertaining to dd so it is difficult to find the contributions of dd. So I
>> went through your post again and found something I didn't pay much
>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>> FUSE so sent that reply.
>>
>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>
>>> Okay good. At least this validates my doubts. Handling O_SYNC in gluster
>>> NFS and fuse is a bit different.
>>> When application opens a file with O_SYNC on fuse mount then each write
>>> syscall has to be written to disk as part of the syscall where as in case
>>> of NFS, there is no concept of open. NFS performs write though a handle
>>> saying it needs to be a synchronous write, so write() syscall is performed
>>> first then it performs fsync(). so an write on an fd with O_SYNC becomes
>>> write+fsync. I am suspecting that when multiple threads do this
>>> write+fsync() operation on the same file, multiple writes are batched
>>> together to be written do disk so the throughput on the disk is increasing
>>> is my guess.
>>>
>>> Does it answer your doubts?
>>>
>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>> going faster than NFS:
>>>>
>>>> FUSE:
>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>> of=zeros.txt conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>
>>>>
>>>> NFS
>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
>>>> conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>
>>>>
>>>>
>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>> Could you let me know the speed without oflag=sync on both the mounts?
>>>> No need to collect profiles.
>>>>
>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Here is what I see now:
>>>>>
>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>
>>>>> Volume Name: data-volume
>>>>> Type: Distribute
>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> Status: Started
>>>>> Number of Bricks: 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>> Options Reconfigured:
>>>>> diagnostics.count-fop-hits: on
>>>>> diagnostics.latency-measurement: on
>>>>> nfs.exports-auth-enable: on
>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>> performance.readdir-ahead: on
>>>>> nfs.disable: on
>>>>> nfs.export-volumes: off
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> Is this the volume info you have?
>>>>>
>>>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>>>> >
>>>>> > Volume Name: data-volume
>>>>> > Type: Distribute
>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> > Status: Started
>>>>> > Number of Bricks: 2
>>>>> > Transport-type: tcp
>>>>> > Bricks:
>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>> > Options Reconfigured:
>>>>> > performance.readdir-ahead: on
>>>>> > nfs.disable: on
>>>>> > nfs.export-volumes: off
>>>>>
>>>>> I copied this from old thread from 2016. This is distribute volume.
>>>>> Did you change any of the options in between?
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-11 16:02:38 UTC
Hi Pranith,

The /home partition is mounted as ext4
/home ext4 defaults,usrquota,grpquota 1 2

The brick partitions are mounted as xfs
/mnt/brick1 xfs defaults 0 0
/mnt/brick2 xfs defaults 0 0

Will this cause a problem with creating a volume under /home?

Pat


On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Unfortunately, we don't have similar hardware for a small scale
> test. All we have is our production hardware.
>
>
> You said something about /home partition which has lesser disks, we
> can create plain distribute volume inside one of those directories.
> After we are done, we can remove the setup. What do you say?
>
>
> Pat
>
>
>
>
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Since we are mounting the partitions as the bricks, I tried
>> the dd test writing to
>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
>> results without oflag=sync were 1.6 Gb/s (faster than gluster
>> but not as fast as I was expecting given the 1.2 Gb/s to the
>> no-gluster area w/ fewer disks).
>>
>>
>> Okay, then 1.6Gb/s is what we need to target for, considering
>> your volume is just distribute. Is there any way you can do tests
>> on similar hardware but at a small scale? Just so we can run the
>> workload to learn more about the bottlenecks in the system? We
>> can probably try to get the speed to 1.2Gb/s on your /home
>> partition you were telling me yesterday. Let me know if that is
>> something you are okay to do.
>>
>>
>> Pat
>>
>>
>>
>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Not entirely sure (this isn't my area of expertise).
>>> I'll run your answer by some other people who are more
>>> familiar with this.
>>>
>>> I am also uncertain about how to interpret the results
>>> when we also add the dd tests writing to the /home area
>>> (no gluster, still on the same machine)
>>>
>>> * dd test without oflag=sync (rough average of
>>> multiple tests)
>>> o gluster w/ fuse mount : 570 Mb/s
>>> o gluster w/ nfs mount: 390 Mb/s
>>> o nfs (no gluster): 1.2 Gb/s
>>> * dd test with oflag=sync (rough average of multiple
>>> tests)
>>> o gluster w/ fuse mount: 5 Mb/s
>>> o gluster w/ nfs mount: 200 Mb/s
>>> o nfs (no gluster): 20 Mb/s
>>>
>>> Given that the non-gluster area is a RAID-6 of 4 disks
>>> while each brick of the gluster area is a RAID-6 of 32
>>> disks, I would naively expect the writes to the gluster
>>> area to be roughly 8x faster than to the non-gluster.
>>>
>>>
>>> I think a better test is to try and write to a file using
>>> nfs without any gluster to a location that is not inside the
>>> brick but someother location that is on same disk(s). If you
>>> are mounting the partition as the brick, then we can write
>>> to a file inside .glusterfs directory, something like
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>
>>>
>>> I still think we have a speed issue, I can't tell if
>>> fuse vs nfs is part of the problem.
>>>
>>>
>>> I got interested in the post because I read that fuse speed
>>> is lesser than nfs speed which is counter-intuitive to my
>>> understanding. So wanted clarifications. Now that I got my
>>> clarifications where fuse outperformed nfs without sync, we
>>> can resume testing as described above and try to find what
>>> it is. Based on your email-id I am guessing you are from
>>> Boston and I am from Bangalore so if you are okay with doing
>>> this debugging for multiple days because of timezones, I
>>> will be happy to help. Please be a bit patient with me, I am
>>> under a release crunch but I am very curious with the
>>> problem you posted.
>>>
>>> Was there anything useful in the profiles?
>>>
>>>
>>> Unfortunately profiles didn't help me much, I think we are
>>> collecting the profiles from an active volume, so it has a
>>> lot of information that is not pertaining to dd so it is
>>> difficult to find the contributions of dd. So I went through
>>> your post again and found something I didn't pay much
>>> attention to earlier i.e. oflag=sync, so did my own tests on
>>> my setup with FUSE so sent that reply.
>>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>> Okay good. At least this validates my doubts. Handling
>>>> O_SYNC in gluster NFS and fuse is a bit different.
>>>> When application opens a file with O_SYNC on fuse mount
>>>> then each write syscall has to be written to disk as
>>>> part of the syscall where as in case of NFS, there is
>>>> no concept of open. NFS performs write though a handle
>>>> saying it needs to be a synchronous write, so write()
>>>> syscall is performed first then it performs fsync(). so
>>>> an write on an fd with O_SYNC becomes write+fsync. I am
>>>> suspecting that when multiple threads do this
>>>> write+fsync() operation on the same file, multiple
>>>> writes are batched together to be written do disk so
>>>> the throughput on the disk is increasing is my guess.
>>>>
>>>> Does it answer your doubts?
>>>>
>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Without the oflag=sync and only a single test of
>>>> each, the FUSE is going faster than NFS:
>>>>
>>>> FUSE:
>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096
>>>> bs=1048576 of=zeros.txt conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>
>>>>
>>>> NFS
>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096
>>>> bs=1048576 of=zeros.txt conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>
>>>>
>>>>
>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>> Could you let me know the speed without oflag=sync
>>>>> on both the mounts? No need to collect profiles.
>>>>>
>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Here is what I see now:
>>>>>
>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>
>>>>> Volume Name: data-volume
>>>>> Type: Distribute
>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> Status: Started
>>>>> Number of Bricks: 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>> Options Reconfigured:
>>>>> diagnostics.count-fop-hits: on
>>>>> diagnostics.latency-measurement: on
>>>>> nfs.exports-auth-enable: on
>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>> performance.readdir-ahead: on
>>>>> nfs.disable: on
>>>>> nfs.export-volumes: off
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:44 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>> Is this the volume info you have?
>>>>>>
>>>>>> > [root at mseas-data2 <http://www.gluster.org/mailman/listinfo/gluster-users> ~]# gluster volume info
>>>>>> >
>>>>>> > Volume Name: data-volume
>>>>>> > Type: Distribute
>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> > Status: Started
>>>>>> > Number of Bricks: 2
>>>>>> > Transport-type: tcp
>>>>>> > Bricks:
>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>> > Options Reconfigured:
>>>>>> > performance.readdir-ahead: on
>>>>>> > nfs.disable: on
>>>>>> > nfs.export-volumes: off
>>>>>>
>>>>>> I copied this from old thread from 2016.
>>>>>> This is distribute volume. Did you change any
>>>>>> of the options in between?
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>>> Pranith
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-11 16:06:14 UTC
On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
>
> The brick partitions are mounted ax xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
>
> Will this cause a problem with creating a volume under /home?
>

I don't think the bottleneck is the disk. Can you run the same tests you
did earlier, this time on the new volume, to confirm?
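
For instance, once a plain distribute test volume on /home is mounted (a
sketch; /mnt/home-test is just a placeholder mount point):

cd /mnt/home-test
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync
rm -f zeros.txt

That would let us compare the gluster-on-/home numbers directly against the
1.2 Gb/s you measured on /home without gluster.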


>
> Pat
>
>
>
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>
>
>
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> Unfortunately, we don't have similar hardware for a small scale test.
>> All we have is our production hardware.
>>
>
> You said something about /home partition which has lesser disks, we can
> create plain distribute volume inside one of those directories. After we
> are done, we can remove the setup. What do you say?
>
>
>>
>> Pat
>>
>>
>>
>>
>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Since we are mounting the partitions as the bricks, I tried the dd test
>>> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
>>> results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
>>> fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer
>>> disks).
>>>
>>
>> Okay, then 1.6Gb/s is what we need to target for, considering your volume
>> is just distribute. Is there any way you can do tests on similar hardware
>> but at a small scale? Just so we can run the workload to learn more about
>> the bottlenecks in the system? We can probably try to get the speed to
>> 1.2Gb/s on your /home partition you were telling me yesterday. Let me know
>> if that is something you are okay to do.
>>
>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>> answer by some other people who are more familiar with this.
>>>>
>>>> I am also uncertain about how to interpret the results when we also add
>>>> the dd tests writing to the /home area (no gluster, still on the same
>>>> machine)
>>>>
>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>> - gluster w/ fuse mount : 570 Mb/s
>>>> - gluster w/ nfs mount: 390 Mb/s
>>>> - nfs (no gluster): 1.2 Gb/s
>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>> - gluster w/ fuse mount: 5 Mb/s
>>>> - gluster w/ nfs mount: 200 Mb/s
>>>> - nfs (no gluster): 20 Mb/s
>>>>
>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each brick
>>>> of the gluster area is a RAID-6 of 32 disks, I would naively expect the
>>>> writes to the gluster area to be roughly 8x faster than to the non-gluster.
>>>>
>>>
>>> I think a better test is to try and write to a file using nfs without
>>> any gluster to a location that is not inside the brick but someother
>>> location that is on same disk(s). If you are mounting the partition as the
>>> brick, then we can write to a file inside .glusterfs directory, something
>>> like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>
>>>
>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>> part of the problem.
>>>>
>>>
>>> I got interested in the post because I read that fuse speed is lesser
>>> than nfs speed which is counter-intuitive to my understanding. So wanted
>>> clarifications. Now that I got my clarifications where fuse outperformed
>>> nfs without sync, we can resume testing as described above and try to find
>>> what it is. Based on your email-id I am guessing you are from Boston and I
>>> am from Bangalore so if you are okay with doing this debugging for multiple
>>> days because of timezones, I will be happy to help. Please be a bit patient
>>> with me, I am under a release crunch but I am very curious with the problem
>>> you posted.
>>>
>>> Was there anything useful in the profiles?
>>>>
>>>
>>> Unfortunately profiles didn't help me much, I think we are collecting
>>> the profiles from an active volume, so it has a lot of information that is
>>> not pertaining to dd so it is difficult to find the contributions of dd. So
>>> I went through your post again and found something I didn't pay much
>>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>>> FUSE so sent that reply.
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>> gluster NFS and fuse is a bit different.
>>>> When an application opens a file with O_SYNC on a fuse mount, each write
>>>> syscall has to be written to disk as part of the syscall, whereas in the case
>>>> of NFS there is no concept of open. NFS performs the write through a handle
>>>> saying it needs to be a synchronous write, so the write() syscall is performed
>>>> first and then it performs fsync(); a write on an fd with O_SYNC thus becomes
>>>> write+fsync. My guess is that when multiple threads do this write+fsync()
>>>> operation on the same file, multiple writes are batched together to be
>>>> written to disk, so the throughput on the disk increases.
>>>>
>>>> Does it answer your doubts?
>>>>
>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>> going faster than NFS:
>>>>>
>>>>> FUSE:
>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>> of=zeros.txt conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>
>>>>>
>>>>> NFS
>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
>>>>> conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> Could you let me know the speed without oflag=sync on both the mounts?
>>>>> No need to collect profiles.
>>>>>
>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Here is what I see now:
>>>>>>
>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: data-volume
>>>>>> Type: Distribute
>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>> Options Reconfigured:
>>>>>> diagnostics.count-fop-hits: on
>>>>>> diagnostics.latency-measurement: on
>>>>>> nfs.exports-auth-enable: on
>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>> performance.readdir-ahead: on
>>>>>> nfs.disable: on
>>>>>> nfs.export-volumes: off
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> Is this the volume info you have?
>>>>>>
>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>> > Volume Name: data-volume
>>>>>> > Type: Distribute
>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> > Status: Started
>>>>>> > Number of Bricks: 2
>>>>>> > Transport-type: tcp
>>>>>> > Bricks:
>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>> > Options Reconfigured:
>>>>>> > performance.readdir-ahead: on
>>>>>> > nfs.disable: on
>>>>>> > nfs.export-volumes: off
>>>>>>
>>>>>> ​I copied this from old thread from 2016. This is distribute volume.
>>>>>> Did you change any of the options in between?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-12 14:34:04 UTC
Hi Pranith,

My question was about setting up a gluster volume on an ext4 partition.
I thought we had the bricks mounted as xfs for compatibility with gluster?

Pat


On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
>
> The brick partitions are mounted as xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
>
> Will this cause a problem with creating a volume under /home?
>
>
> I don't think the bottleneck is disk. You can do the same tests you
> did on your new volume to confirm?
>
>
> Pat
>
>
>
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Unfortunately, we don't have similar hardware for a small
>> scale test. All we have is our production hardware.
>>
>>
>> You said something about /home partition which has lesser disks,
>> we can create plain distribute volume inside one of those
>> directories. After we are done, we can remove the setup. What do
>> you say?
>>
>>
>> Pat
>>
>>
>>
>>
>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Since we are mounting the partitions as the bricks, I
>>> tried the dd test writing to
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>> The results without oflag=sync were 1.6 Gb/s (faster
>>> than gluster but not as fast as I was expecting given
>>> the 1.2 Gb/s to the no-gluster area w/ fewer disks).
>>>
>>>
>>> Okay, then 1.6Gb/s is what we need to target for,
>>> considering your volume is just distribute. Is there any way
>>> you can do tests on similar hardware but at a small scale?
>>> Just so we can run the workload to learn more about the
>>> bottlenecks in the system? We can probably try to get the
>>> speed to 1.2Gb/s on your /home partition you were telling me
>>> yesterday. Let me know if that is something you are okay to do.
>>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Not entirely sure (this isn't my area of
>>>> expertise). I'll run your answer by some other
>>>> people who are more familiar with this.
>>>>
>>>> I am also uncertain about how to interpret the
>>>> results when we also add the dd tests writing to
>>>> the /home area (no gluster, still on the same machine)
>>>>
>>>> * dd test without oflag=sync (rough average of
>>>> multiple tests)
>>>> o gluster w/ fuse mount : 570 Mb/s
>>>> o gluster w/ nfs mount: 390 Mb/s
>>>> o nfs (no gluster): 1.2 Gb/s
>>>> * dd test with oflag=sync (rough average of
>>>> multiple tests)
>>>> o gluster w/ fuse mount: 5 Mb/s
>>>> o gluster w/ nfs mount: 200 Mb/s
>>>> o nfs (no gluster): 20 Mb/s
>>>>
>>>> Given that the non-gluster area is a RAID-6 of 4
>>>> disks while each brick of the gluster area is a
>>>> RAID-6 of 32 disks, I would naively expect the
>>>> writes to the gluster area to be roughly 8x faster
>>>> than to the non-gluster.
>>>>
>>>>
>>>> I think a better test is to try and write to a file
>>>> using nfs without any gluster to a location that is not
>>>> inside the brick but some other location that is on the same
>>>> disk(s). If you are mounting the partition as the
>>>> brick, then we can write to a file inside .glusterfs
>>>> directory, something like
>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>
>>>>
>>>> I still think we have a speed issue, I can't tell
>>>> if fuse vs nfs is part of the problem.
>>>>
>>>>
>>>> I got interested in the post because I read that fuse
>>>> speed is lesser than nfs speed which is
>>>> counter-intuitive to my understanding. So wanted
>>>> clarifications. Now that I got my clarifications where
>>>> fuse outperformed nfs without sync, we can resume
>>>> testing as described above and try to find what it is.
>>>> Based on your email-id I am guessing you are from
>>>> Boston and I am from Bangalore so if you are okay with
>>>> doing this debugging for multiple days because of
>>>> timezones, I will be happy to help. Please be a bit
>>>> patient with me, I am under a release crunch but I am
>>>> very curious with the problem you posted.
>>>>
>>>> Was there anything useful in the profiles?
>>>>
>>>>
>>>> Unfortunately profiles didn't help me much, I think we
>>>> are collecting the profiles from an active volume, so
>>>> it has a lot of information that is not pertaining to
>>>> dd so it is difficult to find the contributions of dd.
>>>> So I went through your post again and found something I
>>>> didn't pay much attention to earlier i.e. oflag=sync,
>>>> so did my own tests on my setup with FUSE so sent that
>>>> reply.
>>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>> Okay good. At least this validates my doubts.
>>>>> Handling O_SYNC in gluster NFS and fuse is a bit
>>>>> different.
>>>>> When an application opens a file with O_SYNC on a
>>>>> fuse mount, each write syscall has to be written
>>>>> to disk as part of the syscall, whereas in the
>>>>> case of NFS there is no concept of open. NFS
>>>>> performs the write through a handle saying it
>>>>> needs to be a synchronous write, so the write()
>>>>> syscall is performed first and then it performs
>>>>> fsync(); a write on an fd with O_SYNC thus becomes
>>>>> write+fsync. My guess is that when multiple
>>>>> threads do this write+fsync() operation on the
>>>>> same file, multiple writes are batched together to
>>>>> be written to disk, so the throughput on the disk
>>>>> increases.
>>>>>
>>>>> Does it answer your doubts?
>>>>>
>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Without the oflag=sync and only a single test
>>>>> of each, the FUSE is going faster than NFS:
>>>>>
>>>>> FUSE:
>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero
>>>>> count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s,
>>>>> 575 MB/s
>>>>>
>>>>>
>>>>> NFS
>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096
>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s,
>>>>> 376 MB/s
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:53 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>> Could you let me know the speed without
>>>>>> oflag=sync on both the mounts? No need to
>>>>>> collect profiles.
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Here is what I see now:
>>>>>>
>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: data-volume
>>>>>> Type: Distribute
>>>>>> Volume ID:
>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>> Options Reconfigured:
>>>>>> diagnostics.count-fop-hits: on
>>>>>> diagnostics.latency-measurement: on
>>>>>> nfs.exports-auth-enable: on
>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>> performance.readdir-ahead: on
>>>>>> nfs.disable: on
>>>>>> nfs.export-volumes: off
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>> Is this the volume info you have?
>>>>>>>
>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>> > Volume Name: data-volume
>>>>>>> > Type: Distribute
>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> > Status: Started
>>>>>>> > Number of Bricks: 2
>>>>>>> > Transport-type: tcp
>>>>>>> > Bricks:
>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>> > Options Reconfigured:
>>>>>>> > performance.readdir-ahead: on
>>>>>>> > nfs.disable: on
>>>>>>> > nfs.export-volumes: off
>>>>>>>
>>>>>>> ​I copied this from old thread from
>>>>>>> 2016. This is distribute volume. Did you
>>>>>>> change any of the options in between?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>>> Pranith
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-13 03:14:12 UTC
On Fri, May 12, 2017 at 8:04 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> My question was about setting up a gluster volume on an ext4 partition. I
> thought we had the bricks mounted as xfs for compatibility with gluster?
>

Oh that should not be a problem. It works fine.


>
> Pat
>
>
>
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> The /home partition is mounted as ext4
>> /home ext4 defaults,usrquota,grpquota 1 2
>>
>> The brick partitions are mounted as xfs
>> /mnt/brick1 xfs defaults 0 0
>> /mnt/brick2 xfs defaults 0 0
>>
>> Will this cause a problem with creating a volume under /home?
>>
>
> I don't think the bottleneck is disk. You can do the same tests you did on
> your new volume to confirm?
>
>
>>
>> Pat
>>
>>
>>
>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Unfortunately, we don't have similar hardware for a small scale test.
>>> All we have is our production hardware.
>>>
>>
>> You said something about /home partition which has lesser disks, we can
>> create plain distribute volume inside one of those directories. After we
>> are done, we can remove the setup. What do you say?
>>
>>
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Since we are mounting the partitions as the bricks, I tried the dd test
>>>> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but not
>>>> as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
>>>> fewer disks).
>>>>
>>>
>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>> volume is just distribute. Is there any way you can do tests on similar
>>> hardware but at a small scale? Just so we can run the workload to learn
>>> more about the bottlenecks in the system? We can probably try to get the
>>> speed to 1.2Gb/s on your /home partition you were telling me yesterday. Let
>>> me know if that is something you are okay to do.
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>> answer by some other people who are more familiar with this.
>>>>>
>>>>> I am also uncertain about how to interpret the results when we also
>>>>> add the dd tests writing to the /home area (no gluster, still on the same
>>>>> machine)
>>>>>
>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>> - gluster w/ fuse mount : 570 Mb/s
>>>>> - gluster w/ nfs mount: 390 Mb/s
>>>>> - nfs (no gluster): 1.2 Gb/s
>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>> - gluster w/ fuse mount: 5 Mb/s
>>>>> - gluster w/ nfs mount: 200 Mb/s
>>>>> - nfs (no gluster): 20 Mb/s
>>>>>
>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively expect
>>>>> the writes to the gluster area to be roughly 8x faster than to the
>>>>> non-gluster.
>>>>>
>>>>
>>>> I think a better test is to try and write to a file using nfs without
>>>> any gluster to a location that is not inside the brick but some other
>>>> location that is on the same disk(s). If you are mounting the partition as the
>>>> brick, then we can write to a file inside .glusterfs directory, something
>>>> like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>
>>>>
>>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>> part of the problem.
>>>>>
>>>>
>>>> I got interested in the post because I read that fuse speed is lesser
>>>> than nfs speed which is counter-intuitive to my understanding. So wanted
>>>> clarifications. Now that I got my clarifications where fuse outperformed
>>>> nfs without sync, we can resume testing as described above and try to find
>>>> what it is. Based on your email-id I am guessing you are from Boston and I
>>>> am from Bangalore so if you are okay with doing this debugging for multiple
>>>> days because of timezones, I will be happy to help. Please be a bit patient
>>>> with me, I am under a release crunch but I am very curious with the problem
>>>> you posted.
>>>>
>>>> Was there anything useful in the profiles?
>>>>>
>>>>
>>>> Unfortunately profiles didn't help me much, I think we are collecting
>>>> the profiles from an active volume, so it has a lot of information that is
>>>> not pertaining to dd so it is difficult to find the contributions of dd. So
>>>> I went through your post again and found something I didn't pay much
>>>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>>>> FUSE so sent that reply.
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>> gluster NFS and fuse is a bit different.
>>>>> When an application opens a file with O_SYNC on a fuse mount, each
>>>>> write syscall has to be written to disk as part of the syscall, whereas in
>>>>> the case of NFS there is no concept of open. NFS performs the write through
>>>>> a handle saying it needs to be a synchronous write, so the write() syscall
>>>>> is performed first and then it performs fsync(); a write on an fd with
>>>>> O_SYNC thus becomes write+fsync. My guess is that when multiple threads do
>>>>> this write+fsync() operation on the same file, multiple writes are batched
>>>>> together to be written to disk, so the throughput on the disk increases.
>>>>>
>>>>> Does it answer your doubts?
>>>>>
>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>> going faster than NFS:
>>>>>>
>>>>>> FUSE:
>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>>> of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>
>>>>>>
>>>>>> NFS
>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>>>>>> of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>> mounts? No need to collect profiles.
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Here is what I see now:
>>>>>>>
>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>
>>>>>>> Volume Name: data-volume
>>>>>>> Type: Distribute
>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>> Options Reconfigured:
>>>>>>> diagnostics.count-fop-hits: on
>>>>>>> diagnostics.latency-measurement: on
>>>>>>> nfs.exports-auth-enable: on
>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>> performance.readdir-ahead: on
>>>>>>> nfs.disable: on
>>>>>>> nfs.export-volumes: off
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Is this the volume info you have?
>>>>>>>
>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>> > Volume Name: data-volume
>>>>>>> > Type: Distribute
>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> > Status: Started
>>>>>>> > Number of Bricks: 2
>>>>>>> > Transport-type: tcp
>>>>>>> > Bricks:
>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>> > Options Reconfigured:
>>>>>>> > performance.readdir-ahead: on
>>>>>>> > nfs.disable: on
>>>>>>> > nfs.export-volumes: off
>>>>>>>
>>>>>>> ​I copied this from old thread from 2016. This is distribute volume.
>>>>>>> Did you change any of the options in between?
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pranith Kumar Karampuri
2017-05-13 03:17:11 UTC
On Sat, May 13, 2017 at 8:44 AM, Pranith Kumar Karampuri <
***@redhat.com> wrote:

>
>
> On Fri, May 12, 2017 at 8:04 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> My question was about setting up a gluster volume on an ext4 partition.
>> I thought we had the bricks mounted as xfs for compatibility with gluster?
>>
>
> Oh that should not be a problem. It works fine.
>

Just that XFS doesn't impose limits on things like hard-link counts,
whereas ext4 does (at least the last time I checked :-) ). So it is
better to use XFS for the bricks.
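
As a quick sanity check, plain coreutils can confirm which filesystem backs
each of the paths we have been discussing (nothing gluster-specific here):

# prints the filesystem type (ext4/xfs) behind each path
df -T /home /mnt/brick1 /mnt/brick2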


>
>
>>
>> Pat
>>
>>
>>
>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> The /home partition is mounted as ext4
>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>
>>> The brick partitions are mounted as xfs
>>> /mnt/brick1 xfs defaults 0 0
>>> /mnt/brick2 xfs defaults 0 0
>>>
>>> Will this cause a problem with creating a volume under /home?
>>>
>>
>> I don't think the bottleneck is disk. You can do the same tests you did
>> on your new volume to confirm?
>>
>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Unfortunately, we don't have similar hardware for a small scale test.
>>>> All we have is our production hardware.
>>>>
>>>
>>> You said something about /home partition which has lesser disks, we can
>>> create plain distribute volume inside one of those directories. After we
>>> are done, we can remove the setup. What do you say?
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>>
>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but not
>>>>> as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
>>>>> fewer disks).
>>>>>
>>>>
>>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>>> volume is just distribute. Is there any way you can do tests on similar
>>>> hardware but at a small scale? Just so we can run the workload to learn
>>>> more about the bottlenecks in the system? We can probably try to get the
>>>> speed to 1.2Gb/s on your /home partition you were telling me yesterday. Let
>>>> me know if that is something you are okay to do.
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>> answer by some other people who are more familiar with this.
>>>>>>
>>>>>> I am also uncertain about how to interpret the results when we also
>>>>>> add the dd tests writing to the /home area (no gluster, still on the same
>>>>>> machine)
>>>>>>
>>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>>> - gluster w/ fuse mount : 570 Mb/s
>>>>>> - gluster w/ nfs mount: 390 Mb/s
>>>>>> - nfs (no gluster): 1.2 Gb/s
>>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>>> - gluster w/ fuse mount: 5 Mb/s
>>>>>> - gluster w/ nfs mount: 200 Mb/s
>>>>>> - nfs (no gluster): 20 Mb/s
>>>>>>
>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively expect
>>>>>> the writes to the gluster area to be roughly 8x faster than to the
>>>>>> non-gluster.
>>>>>>
>>>>>
>>>>> I think a better test is to try and write to a file using nfs without
>>>>> any gluster to a location that is not inside the brick but some other
>>>>> location that is on the same disk(s). If you are mounting the partition as the
>>>>> brick, then we can write to a file inside .glusterfs directory, something
>>>>> like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>
>>>>>
>>>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>>> part of the problem.
>>>>>>
>>>>>
>>>>> I got interested in the post because I read that fuse speed is lesser
>>>>> than nfs speed which is counter-intuitive to my understanding. So wanted
>>>>> clarifications. Now that I got my clarifications where fuse outperformed
>>>>> nfs without sync, we can resume testing as described above and try to find
>>>>> what it is. Based on your email-id I am guessing you are from Boston and I
>>>>> am from Bangalore so if you are okay with doing this debugging for multiple
>>>>> days because of timezones, I will be happy to help. Please be a bit patient
>>>>> with me, I am under a release crunch but I am very curious with the problem
>>>>> you posted.
>>>>>
>>>>> Was there anything useful in the profiles?
>>>>>>
>>>>>
>>>>> Unfortunately profiles didn't help me much, I think we are collecting
>>>>> the profiles from an active volume, so it has a lot of information that is
>>>>> not pertaining to dd so it is difficult to find the contributions of dd. So
>>>>> I went through your post again and found something I didn't pay much
>>>>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>>>>> FUSE so sent that reply.
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>>> gluster NFS and fuse is a bit different.
>>>>>> When an application opens a file with O_SYNC on a fuse mount, each
>>>>>> write syscall has to be written to disk as part of the syscall, whereas
>>>>>> in the case of NFS there is no concept of open. NFS performs the write
>>>>>> through a handle saying it needs to be a synchronous write, so the
>>>>>> write() syscall is performed first and then it performs fsync(); a write
>>>>>> on an fd with O_SYNC thus becomes write+fsync. My guess is that when
>>>>>> multiple threads do this write+fsync() operation on the same file,
>>>>>> multiple writes are batched together to be written to disk, so the
>>>>>> throughput on the disk increases.
>>>>>>
>>>>>> Does it answer your doubts?
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>> going faster than NFS:
>>>>>>>
>>>>>>> FUSE:
>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>> of=zeros.txt conv=sync
>>>>>>> 4096+0 records in
>>>>>>> 4096+0 records out
>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>
>>>>>>>
>>>>>>> NFS
>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>> of=zeros.txt conv=sync
>>>>>>> 4096+0 records in
>>>>>>> 4096+0 records out
>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>> mounts? No need to collect profiles.
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Here is what I see now:
>>>>>>>>
>>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>>
>>>>>>>> Volume Name: data-volume
>>>>>>>> Type: Distribute
>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>> Status: Started
>>>>>>>> Number of Bricks: 2
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>> Options Reconfigured:
>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>> performance.readdir-ahead: on
>>>>>>>> nfs.disable: on
>>>>>>>> nfs.export-volumes: off
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Is this the volume info you have?
>>>>>>>>
>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>> > Volume Name: data-volume
>>>>>>>> > Type: Distribute
>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>> > Status: Started
>>>>>>>> > Number of Bricks: 2
>>>>>>>> > Transport-type: tcp
>>>>>>>> > Bricks:
>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>> > Options Reconfigured:
>>>>>>>> > performance.readdir-ahead: on
>>>>>>>> > nfs.disable: on
>>>>>>>> > nfs.export-volumes: off
>>>>>>>>
>>>>>>>> ​I copied this from old thread from 2016. This is distribute
>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>
>
> --
> Pranith
>



--
Pranith
Ben Turner
2017-05-15 01:24:53 UTC
----- Original Message -----
> From: "Pranith Kumar Karampuri" <***@redhat.com>
> To: "Pat Haley" <***@mit.edu>
> Cc: gluster-***@gluster.org, "Steve Postma" <***@ztechnet.com>
> Sent: Friday, May 12, 2017 11:17:11 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
>
>
>
> On Sat, May 13, 2017 at 8:44 AM, Pranith Kumar Karampuri <
> ***@redhat.com > wrote:
>
>
>
>
>
> On Fri, May 12, 2017 at 8:04 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Hi Pranith,
>
> My question was about setting up a gluster volume on an ext4 partition. I
> thought we had the bricks mounted as xfs for compatibility with gluster?
>
> Oh that should not be a problem. It works fine.
>
> Just that XFS doesn't impose limits on things like hard-link counts, whereas
> ext4 does (at least the last time I checked :-) ). So it is better to use XFS
> for the bricks.

One of the biggest reasons to use XFS, IMHO, is that most of the testing, large-scale deployments, etc. (at least that I know of) are done with XFS as the backend. While EXT4 should work, I don't think it has the same level of testing as XFS.

-b



>
>
>
>
>
>
>
> Pat
>
>
>
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
>
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Hi Pranith,
>
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
>
> The brick partitions are mounted as xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
>
> Will this cause a problem with creating a volume under /home?
>
> I don't think the bottleneck is disk. You can do the same tests you did on
> your new volume to confirm?
>
>
>
>
> Pat
>
>
>
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>
>
>
>
>
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Hi Pranith,
>
> Unfortunately, we don't have similar hardware for a small scale test. All we
> have is our production hardware.
>
> You said something about /home partition which has lesser disks, we can
> create plain distribute volume inside one of those directories. After we are
> done, we can remove the setup. What do you say?
>
>
>
>
>
> Pat
>
>
>
>
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>
>
>
>
>
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Hi Pranith,
>
> Since we are mounting the partitions as the bricks, I tried the dd test
> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
> results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
> fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer
> disks).
>
> Okay, then 1.6Gb/s is what we need to target for, considering your volume is
> just distribute. Is there any way you can do tests on similar hardware but
> at a small scale? Just so we can run the workload to learn more about the
> bottlenecks in the system? We can probably try to get the speed to 1.2Gb/s
> on your /home partition you were telling me yesterday. Let me know if that
> is something you are okay to do.
>
>
>
>
> Pat
>
>
>
> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>
>
>
>
>
> On Wed, May 10, 2017 at 10:15 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Hi Pranith,
>
> Not entirely sure (this isn't my area of expertise). I'll run your answer by
> some other people who are more familiar with this.
>
> I am also uncertain about how to interpret the results when we also add the
> dd tests writing to the /home area (no gluster, still on the same machine)
>
>
> * dd test without oflag=sync (rough average of multiple tests)
>   * gluster w/ fuse mount : 570 Mb/s
>   * gluster w/ nfs mount: 390 Mb/s
>   * nfs (no gluster): 1.2 Gb/s
> * dd test with oflag=sync (rough average of multiple tests)
>   * gluster w/ fuse mount: 5 Mb/s
>   * gluster w/ nfs mount: 200 Mb/s
>   * nfs (no gluster): 20 Mb/s
>
> Given that the non-gluster area is a RAID-6 of 4 disks while each brick of
> the gluster area is a RAID-6 of 32 disks, I would naively expect the writes
> to the gluster area to be roughly 8x faster than to the non-gluster.
>
> I think a better test is to try and write to a file using nfs without any
> gluster to a location that is not inside the brick but some other location
> that is on the same disk(s). If you are mounting the partition as the brick,
> then we can write to a file inside .glusterfs directory, something like
> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>
>
>
>
>
> I still think we have a speed issue, I can't tell if fuse vs nfs is part of
> the problem.
>
> I got interested in the post because I read that fuse speed is lesser than
> nfs speed which is counter-intuitive to my understanding. So wanted
> clarifications. Now that I got my clarifications where fuse outperformed nfs
> without sync, we can resume testing as described above and try to find what
> it is. Based on your email-id I am guessing you are from Boston and I am
> from Bangalore so if you are okay with doing this debugging for multiple
> days because of timezones, I will be happy to help. Please be a bit patient
> with me, I am under a release crunch but I am very curious with the problem
> you posted.
>
>
>
>
> Was there anything useful in the profiles?
>
> Unfortunately profiles didn't help me much, I think we are collecting the
> profiles from an active volume, so it has a lot of information that is not
> pertaining to dd so it is difficult to find the contributions of dd. So I
> went through your post again and found something I didn't pay much attention
> to earlier i.e. oflag=sync, so did my own tests on my setup with FUSE so
> sent that reply.
>
>
>
>
>
> Pat
>
>
>
> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>
>
>
> Okay good. At least this validates my doubts. Handling O_SYNC in gluster NFS
> and fuse is a bit different.
> When an application opens a file with O_SYNC on a fuse mount, each write
> syscall has to be written to disk as part of the syscall, whereas in the case
> of NFS there is no concept of open. NFS performs the write through a handle
> saying it needs to be a synchronous write, so the write() syscall is performed
> first and then it performs fsync(); a write on an fd with O_SYNC thus becomes
> write+fsync. My guess is that when multiple threads do this write+fsync()
> operation on the same file, multiple writes are batched together to be written
> to disk, so the throughput on the disk increases.
>
> Does it answer your doubts?
>
> On Wed, May 10, 2017 at 9:35 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Without the oflag=sync and only a single test of each, the FUSE is going
> faster than NFS:
>
> FUSE:
> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
> conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>
>
> NFS
> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
> conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>
>
>
> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>
>
>
> Could you let me know the speed without oflag=sync on both the mounts? No
> need to collect profiles.
>
> On Wed, May 10, 2017 at 9:17 PM, Pat Haley < ***@mit.edu > wrote:
>
>
>
>
> Here is what I see now:
>
> [***@mseas-data2 ~]# gluster volume info
>
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.exports-auth-enable: on
> diagnostics.brick-sys-log-level: WARNING
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off
>
>
>
> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>
>
>
> Is this the volume info you have?
>
> > [root at mseas-data2 ~]# gluster volume info
> > Volume Name: data-volume
> > Type: Distribute
> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> > Status: Started
> > Number of Bricks: 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: mseas-data2:/mnt/brick1
> > Brick2: mseas-data2:/mnt/brick2
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: off
> ​I copied this from old thread from 2016. This is distribute volume. Did you
> change any of the options in between?
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
> --
> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu Center for Ocean
> Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>
>
> --
> Pranith
>
>
>
> --
> Pranith
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-***@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
Pat Haley
2017-05-16 15:50:35 UTC
Hi Pranith,

Sorry for the delay. I never received your reply (but I did receive
Ben Turner's follow-up to it). So we tried to create a gluster
volume under /home using different variations of

gluster volume create test-volume mseas-data2:/home/gbrick_test_1
mseas-data2:/home/gbrick_test_2 transport tcp

However we keep getting errors of the form

Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>

Any thoughts on what we're doing wrong?
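
One guess on our end, just going by the CLI help (where the transport option
seems to come before the brick list, and tcp appears to be the default
anyway): maybe the parser is reading the trailing word "transport" as a third
brick, which would explain the error above. Would the reordered form below be
the right thing to try?

# transport option (tcp is apparently the default) moved ahead of the bricks
gluster volume create test-volume transport tcp \
    mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2
gluster volume start test-volume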

Also, do you have a list of the tests we should be running once we get
this volume created? Given the time-zone difference, it might help if we
can run a small battery of tests and post all the results at once, rather
than iterating one test and one post at a time.
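
To cut down on the round-trips, something like the sketch below is what we
had in mind, mirroring the earlier dd runs in this thread; /mnt/test-fuse and
/mnt/test-nfs are just placeholders for wherever the new test-volume ends up
fuse- and nfs-mounted, and /home/gbrick_scratch for a plain non-gluster
directory on the same /home disks as a baseline:

# 4 GiB writes, matching the earlier dd tests; remove the file after each pass
for target in /mnt/test-fuse /mnt/test-nfs /home/gbrick_scratch; do
    dd if=/dev/zero of=$target/zeros.tmp bs=1M count=4096             # buffered
    dd if=/dev/zero of=$target/zeros.tmp bs=1M count=4096 oflag=sync  # O_SYNC
    rm -f $target/zeros.tmp
done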

Thanks

Pat


On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
>
> The brick partitions are mounted as xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
>
> Will this cause a problem with creating a volume under /home?
>
>
> I don't think the bottleneck is disk. You can do the same tests you
> did on your new volume to confirm?
>
>
> Pat
>
>
>
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Unfortunately, we don't have similar hardware for a small
>> scale test. All we have is our production hardware.
>>
>>
>> You said something about /home partition which has lesser disks,
>> we can create plain distribute volume inside one of those
>> directories. After we are done, we can remove the setup. What do
>> you say?
>>
>>
>> Pat
>>
>>
>>
>>
>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Since we are mounting the partitions as the bricks, I
>>> tried the dd test writing to
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>> The results without oflag=sync were 1.6 Gb/s (faster
>>> than gluster but not as fast as I was expecting given
>>> the 1.2 Gb/s to the no-gluster area w/ fewer disks).
>>>
>>>
>>> Okay, then 1.6Gb/s is what we need to target for,
>>> considering your volume is just distribute. Is there any way
>>> you can do tests on similar hardware but at a small scale?
>>> Just so we can run the workload to learn more about the
>>> bottlenecks in the system? We can probably try to get the
>>> speed to 1.2Gb/s on your /home partition you were telling me
>>> yesterday. Let me know if that is something you are okay to do.
>>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Not entirely sure (this isn't my area of
>>>> expertise). I'll run your answer by some other
>>>> people who are more familiar with this.
>>>>
>>>> I am also uncertain about how to interpret the
>>>> results when we also add the dd tests writing to
>>>> the /home area (no gluster, still on the same machine)
>>>>
>>>> * dd test without oflag=sync (rough average of
>>>> multiple tests)
>>>> o gluster w/ fuse mount : 570 Mb/s
>>>> o gluster w/ nfs mount: 390 Mb/s
>>>> o nfs (no gluster): 1.2 Gb/s
>>>> * dd test with oflag=sync (rough average of
>>>> multiple tests)
>>>> o gluster w/ fuse mount: 5 Mb/s
>>>> o gluster w/ nfs mount: 200 Mb/s
>>>> o nfs (no gluster): 20 Mb/s
>>>>
>>>> Given that the non-gluster area is a RAID-6 of 4
>>>> disks while each brick of the gluster area is a
>>>> RAID-6 of 32 disks, I would naively expect the
>>>> writes to the gluster area to be roughly 8x faster
>>>> than to the non-gluster.
>>>>
>>>>
>>>> I think a better test is to try and write to a file
>>>> using nfs without any gluster to a location that is not
>>>> inside the brick but some other location that is on the same
>>>> disk(s). If you are mounting the partition as the
>>>> brick, then we can write to a file inside .glusterfs
>>>> directory, something like
>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>
>>>>
>>>> I still think we have a speed issue, I can't tell
>>>> if fuse vs nfs is part of the problem.
>>>>
>>>>
>>>> I got interested in the post because I read that fuse
>>>> speed is lesser than nfs speed which is
>>>> counter-intuitive to my understanding. So wanted
>>>> clarifications. Now that I got my clarifications where
>>>> fuse outperformed nfs without sync, we can resume
>>>> testing as described above and try to find what it is.
>>>> Based on your email-id I am guessing you are from
>>>> Boston and I am from Bangalore so if you are okay with
>>>> doing this debugging for multiple days because of
>>>> timezones, I will be happy to help. Please be a bit
>>>> patient with me, I am under a release crunch but I am
>>>> very curious with the problem you posted.
>>>>
>>>> Was there anything useful in the profiles?
>>>>
>>>>
>>>> Unfortunately profiles didn't help me much, I think we
>>>> are collecting the profiles from an active volume, so
>>>> it has a lot of information that is not pertaining to
>>>> dd so it is difficult to find the contributions of dd.
>>>> So I went through your post again and found something I
>>>> didn't pay much attention to earlier i.e. oflag=sync,
>>>> so did my own tests on my setup with FUSE so sent that
>>>> reply.
>>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>> Okay good. At least this validates my doubts.
>>>>> Handling O_SYNC in gluster NFS and fuse is a bit
>>>>> different.
>>>>> When an application opens a file with O_SYNC on a
>>>>> fuse mount, each write syscall has to be written
>>>>> to disk as part of the syscall, whereas in the
>>>>> case of NFS there is no concept of open. NFS
>>>>> performs the write through a handle saying it
>>>>> needs to be a synchronous write, so the write()
>>>>> syscall is performed first and then it performs
>>>>> fsync(); a write on an fd with O_SYNC thus becomes
>>>>> write+fsync. My guess is that when multiple
>>>>> threads do this write+fsync() operation on the
>>>>> same file, multiple writes are batched together to
>>>>> be written to disk, so the throughput on the disk
>>>>> increases.
>>>>>
>>>>> Does it answer your doubts?
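One way to see this distinction from the shell, as a sketch assuming GNU dd and a placeholder path on the fuse mount (<fuse-mount>/<scratch-file> is a stand-in, not a path from the thread): oflag=sync opens the output with O_SYNC so every write has to reach disk, while conv=fsync lets writes go through the cache and issues a single fsync() only at the end.

# every write syscall is synchronous (output fd opened with O_SYNC)
dd if=/dev/zero of=<fuse-mount>/<scratch-file> bs=1M count=1024 oflag=sync

# writes are cached; one fsync() is issued after the last write
dd if=/dev/zero of=<fuse-mount>/<scratch-file> bs=1M count=1024 conv=fsync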
>>>>>
>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Without the oflag=sync and only a single test
>>>>> of each, the FUSE is going faster than NFS:
>>>>>
>>>>> FUSE:
>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero
>>>>> count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s,
>>>>> 575 MB/s
>>>>>
>>>>>
>>>>> NFS
>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096
>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s,
>>>>> 376 MB/s
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:53 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>> Could you let me know the speed without
>>>>>> oflag=sync on both the mounts? No need to
>>>>>> collect profiles.
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Here is what I see now:
>>>>>>
>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: data-volume
>>>>>> Type: Distribute
>>>>>> Volume ID:
>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>> Options Reconfigured:
>>>>>> diagnostics.count-fop-hits: on
>>>>>> diagnostics.latency-measurement: on
>>>>>> nfs.exports-auth-enable: on
>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>> performance.readdir-ahead: on
>>>>>> nfs.disable: on
>>>>>> nfs.export-volumes: off
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>> Is this the volume info you have?
>>>>>>>
>>>>>>> > [***@mseas-data2 ~]# gluster volume info
>>>>>>> > Volume Name: data-volume
>>>>>>> > Type: Distribute
>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> > Status: Started
>>>>>>> > Number of Bricks: 2
>>>>>>> > Transport-type: tcp
>>>>>>> > Bricks:
>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>> > Options Reconfigured:
>>>>>>> > performance.readdir-ahead: on
>>>>>>> > nfs.disable: on
>>>>>>> > nfs.export-volumes: off
>>>>>>> ​I copied this from old thread from
>>>>>>> 2016. This is distribute volume. Did you
>>>>>>> change any of the options in between?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>>> Pranith
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>>> Pranith
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
>> Pranith
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
> --
> Pranith
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-17 09:01:04 UTC
Reply
Permalink
Raw Message
On Tue, May 16, 2017 at 9:20 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> Sorry for the delay. I never received your reply (but I did receive
> Ben Turner's follow-up to it). So we tried to create a gluster
> volume under /home using different variations of
>
> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
> mseas-data2:/home/gbrick_test_2 transport tcp
>
> However we keep getting errors of the form
>
> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>
> Any thoughts on what we're doing wrong?
>

I think "transport tcp" would have to go earlier in the command. In any case,
transport tcp is the default, so there is no need to specify it; just remove
those two words from the CLI.
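For example, something along these lines should be accepted (a sketch only, reusing the brick directories and volume name from the attempt above; the mount point in the last command is hypothetical):

# tcp is the default transport, so it can simply be omitted
gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2

# start the volume and mount it somewhere for testing
gluster volume start test-volume
mount -t glusterfs mseas-data2:/test-volume /mnt/test-volume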

>
> Also do you have a list of the test we should be running once we get this
> volume created? Given the time-zone difference it might help if we can run
> a small battery of tests and post the results rather than test-post-new
> test-post... .
>

This is the first time I am doing performance analysis with users directly, as
far as I remember. In our team there are separate engineers who do these
tests; Ben, who replied earlier, is one such engineer.

Ben,
Have any suggestions?


>
> Thanks
>
> Pat
>
>
>
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> The /home partition is mounted as ext4
>> /home ext4 defaults,usrquota,grpquota 1 2
>>
>> The brick partitions are mounted as xfs
>> /mnt/brick1 xfs defaults 0 0
>> /mnt/brick2 xfs defaults 0 0
>>
>> Will this cause a problem with creating a volume under /home?
>>
>
> I don't think the bottleneck is disk. You can do the same tests you did on
> your new volume to confirm?
>
>
>>
>> Pat
>>
>>
>>
>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Unfortunately, we don't have similar hardware for a small scale test.
>>> All we have is our production hardware.
>>>
>>
>> You said something about /home partition which has lesser disks, we can
>> create plain distribute volume inside one of those directories. After we
>> are done, we can remove the setup. What do you say?
>>
>>
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Since we are mounting the partitions as the bricks, I tried the dd test
>>>> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but not
>>>> as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
>>>> fewer disks).
>>>>
>>>
>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>> volume is just distribute. Is there any way you can do tests on similar
>>> hardware but at a small scale? Just so we can run the workload to learn
>>> more about the bottlenecks in the system? We can probably try to get the
>>> speed to 1.2Gb/s on your /home partition you were telling me yesterday. Let
>>> me know if that is something you are okay to do.
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>> answer by some other people who are more familiar with this.
>>>>>
>>>>> I am also uncertain about how to interpret the results when we also
>>>>> add the dd tests writing to the /home area (no gluster, still on the same
>>>>> machine)
>>>>>
>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>> - gluster w/ fuse mount : 570 Mb/s
>>>>> - gluster w/ nfs mount: 390 Mb/s
>>>>> - nfs (no gluster): 1.2 Gb/s
>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>> - gluster w/ fuse mount: 5 Mb/s
>>>>> - gluster w/ nfs mount: 200 Mb/s
>>>>> - nfs (no gluster): 20 Mb/s
>>>>>
>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively expect
>>>>> the writes to the gluster area to be roughly 8x faster than to the
>>>>> non-gluster.
>>>>>
>>>>
>>>> I think a better test is to try to write a file over nfs without
>>>> any gluster to a location that is not inside the brick but some other
>>>> location on the same disk(s). If you are mounting the partition as the
>>>> brick, then we can write to a file inside the .glusterfs directory, something
>>>> like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
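A concrete form of that test might look like the following (a sketch; the brick path is one of the bricks from the volume info quoted further down, the file name is made up, and the dd options mirror the ones used earlier in the thread):

# write directly inside one brick, bypassing gluster, then clean up
dd if=/dev/zero count=4096 bs=1048576 of=/mnt/brick1/.glusterfs/dd-test-remove-me conv=sync
rm /mnt/brick1/.glusterfs/dd-test-remove-me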
>>>>
>>>>
>>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>> part of the problem.
>>>>>
>>>>
>>>> I got interested in the post because I read that fuse speed is lower
>>>> than nfs speed, which is counter-intuitive to my understanding, so I wanted
>>>> clarification. Now that I have my clarification, where fuse outperformed
>>>> nfs without sync, we can resume testing as described above and try to find
>>>> what it is. Based on your email-id I am guessing you are in Boston and I
>>>> am in Bangalore, so if you are okay with doing this debugging over multiple
>>>> days because of timezones, I will be happy to help. Please be a bit patient
>>>> with me; I am under a release crunch but I am very curious about the problem
>>>> you posted.
>>>>
>>>> Was there anything useful in the profiles?
>>>>>
>>>>
>>>> Unfortunately profiles didn't help me much, I think we are collecting
>>>> the profiles from an active volume, so it has a lot of information that is
>>>> not pertaining to dd so it is difficult to find the contributions of dd. So
>>>> I went through your post again and found something I didn't pay much
>>>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>>>> FUSE so sent that reply.
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>> gluster NFS and fuse is a bit different.
>>>>> When an application opens a file with O_SYNC on a fuse mount, then each
>>>>> write syscall has to be written to disk as part of the syscall, whereas in the
>>>>> case of NFS, there is no concept of open. NFS performs the write through a
>>>>> handle saying it needs to be a synchronous write, so the write() syscall is
>>>>> performed first and then it performs fsync(). So a write on an fd with O_SYNC
>>>>> becomes write+fsync. My guess is that when multiple threads do this
>>>>> write+fsync() operation on the same file, multiple writes are batched
>>>>> together to be written to disk, so the throughput on the disk increases.
>>>>>
>>>>> Does it answer your doubts?
>>>>>
>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>> going faster than NFS:
>>>>>>
>>>>>> FUSE:
>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>>> of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>
>>>>>>
>>>>>> NFS
>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>>>>>> of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>> mounts? No need to collect profiles.
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Here is what I see now:
>>>>>>>
>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>
>>>>>>> Volume Name: data-volume
>>>>>>> Type: Distribute
>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>> Options Reconfigured:
>>>>>>> diagnostics.count-fop-hits: on
>>>>>>> diagnostics.latency-measurement: on
>>>>>>> nfs.exports-auth-enable: on
>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>> performance.readdir-ahead: on
>>>>>>> nfs.disable: on
>>>>>>> nfs.export-volumes: off
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Is this the volume info you have?
>>>>>>>
>>>>>>> > [***@mseas-data2 ~]# gluster volume info
>>>>>>> > Volume Name: data-volume
>>>>>>> > Type: Distribute
>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> > Status: Started
>>>>>>> > Number of Bricks: 2
>>>>>>> > Transport-type: tcp
>>>>>>> > Bricks:
>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>> > Options Reconfigured:
>>>>>>> > performance.readdir-ahead: on
>>>>>>> > nfs.disable: on
>>>>>>> > nfs.export-volumes: off
>>>>>>>
>>>>>>>
>>>>>>> ​I copied this from old thread from 2016. This is distribute volume.
>>>>>>> Did you change any of the options in between?
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>> --
>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>> --
> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-30 15:46:18 UTC
Reply
Permalink
Raw Message
Hi Pranith,

Thanks for the tip. We now have the gluster volume mounted under
/home. What tests do you recommend we run?

Thanks

Pat


On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>
>
> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Sorry for the delay. I never received your reply (but I did
> receive Ben Turner's follow-up to it). So we tried to
> create a gluster volume under /home using different variations of
>
> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
> mseas-data2:/home/gbrick_test_2 transport tcp
>
> However we keep getting errors of the form
>
> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>
> Any thoughts on what we're doing wrong?
>
>
> I think "transport tcp" would have to go earlier in the command. In any
> case, transport tcp is the default, so there is no need to specify it;
> just remove those two words from the CLI.
>
>
> Also do you have a list of the test we should be running once we
> get this volume created? Given the time-zone difference it might
> help if we can run a small battery of tests and post the results
> rather than test-post-new test-post... .
>
>
> This is the first time I am doing performance analysis with users
> directly, as far as I remember. In our team there are separate engineers
> who do these tests; Ben, who replied earlier, is one such engineer.
>
> Ben,
> Have any suggestions?
>
> --
> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-30 16:10:56 UTC
Reply
Permalink
Raw Message
Let's start with the same 'dd' test we were testing with, to see what the
numbers are. Please provide the profile numbers for the same run. From there
on we will start tuning the volume to see what we can do.
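A sketch of one way to collect both, assuming the new volume is called test-volume (the name used in the create command earlier in the thread) and writing to a hypothetical mount path under /home:

# turn on profiling for the test volume
gluster volume profile test-volume start

# repeat the dd test on the fuse mount of the test volume
dd if=/dev/zero count=4096 bs=1048576 of=/home/<test-volume-mount>/zeros.txt conv=sync

# dump the collected per-fop statistics
gluster volume profile test-volume info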

On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> Thanks for the tip. We now have the gluster volume mounted under /home.
> What tests do you recommend we run?
>
> Thanks
>
> Pat
>


--
Pranith
Pat Haley
2017-05-30 17:06:51 UTC
Reply
Permalink
Raw Message
Hi Pranith,

I ran the same 'dd' test both on the gluster test volume and in the
.glusterfs directory of each brick (commands as sketched below). The median
results (12 dd trials in each test) are similar to before:

* gluster test volume: 586.5 MB/s
* bricks (in .glusterfs): 1.4 GB/s
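Presumably the two runs were of roughly this form (a sketch; the mount point and file names are stand-ins, and the dd options are the ones used earlier in the thread):

# on the gluster test volume (fuse mount)
dd if=/dev/zero count=4096 bs=1048576 of=<test-volume-mount>/zeros.txt conv=sync

# directly inside one brick, bypassing gluster
dd if=/dev/zero count=4096 bs=1048576 of=/mnt/brick1/.glusterfs/zeros.txt conv=sync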

The profile for the gluster test-volume is in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt

Thanks

Pat



On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> Let's start with the same 'dd' test we were testing with to see, what
> the numbers are. Please provide profile numbers for the same. From
> there on we will start tuning the volume to see what we can do.
>
> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Thanks for the tip. We now have the gluster volume mounted under
> /home. What tests do you recommend we run?
>
> Thanks
>
> Pat
>
> --
> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-31 01:27:26 UTC
Reply
Permalink
Raw Message
Pat,
     What is the command you used? As per the following output, it seems
like at least one write operation took 16 seconds, which is really bad.

 96.39     1165.10 us      89.00 us   *16487014.00 us*     393212      WRITE
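
For the next run it may be easier to read the profile if the interval counters
are snapshotted around a single dd invocation; each "info" call prints and then
restarts the interval statistics, so the second call below should contain
mostly the dd FOPs. A rough sketch only (the mount point and output file names
are placeholders for whatever is actually mounted):

  # enable profiling on the test volume if it is not already on
  gluster volume profile test-volume start

  # print and discard whatever has accumulated so far
  gluster volume profile test-volume info > /dev/null

  # the one run we want to measure
  dd if=/dev/zero count=4096 bs=1048576 of=/path/to/test-volume-mount/zeros.txt conv=sync

  # the "Interval" section of this output now covers (mostly) just the dd above
  gluster volume profile test-volume info > profile_dd_only.txt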



On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> I ran the same 'dd' test both in the gluster test volume and in the
> .glusterfs directory of each brick. The median results (12 dd trials in
> each test) are similar to before
>
> - gluster test volume: 586.5 MB/s
> - bricks (in .glusterfs): 1.4 GB/s
>
> The profile for the gluster test-volume is in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/
> profile_testvol_gluster.txt
>
> Thanks
>
> Pat
>
>
>
>
> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>
> Let's start with the same 'dd' test we were testing with to see, what the
> numbers are. Please provide profile numbers for the same. From there on we
> will start tuning the volume to see what we can do.
>
> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> Thanks for the tip. We now have the gluster volume mounted under /home.
>> What tests do you recommend we run?
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Sorry for the delay. I never received your reply (but I did receive
>>> Ben Turner's follow-up to your reply). So we tried to create a gluster
>>> volume under /home using different variations of
>>>
>>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>> However we keep getting errors of the form
>>>
>>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>
>>> Any thoughts on what we're doing wrong?
>>>
>>
>> You should give transport tcp at the beginning I think. Anyway,
>> transport tcp is the default, so there is no need to specify it; just
>> remove those two words from the CLI.
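
In other words, the create command from the quoted message reduces to the two
commands below (a start is needed before the volume can be mounted):

  gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2
  gluster volume start test-volume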
>>
>>>
>>> Also do you have a list of the test we should be running once we get
>>> this volume created? Given the time-zone difference it might help if we
>>> can run a small battery of tests and post the results rather than
>>> test-post-new test-post... .
>>>
>>
>> This is the first time I am doing performance analysis on users as far as
>> I remember. In our team there are separate engineers who do these tests.
>> Ben who replied earlier is one such engineer.
>>
>> Ben,
>> Have any suggestions?
>>
>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> The /home partition is mounted as ext4
>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>
>>>> The brick partitions are mounted as xfs
>>>> /mnt/brick1 xfs defaults 0 0
>>>> /mnt/brick2 xfs defaults 0 0
>>>>
>>>> Will this cause a problem with creating a volume under /home?
>>>>
>>>
>>> I don't think the bottleneck is disk. You can do the same tests you did
>>> on your new volume to confirm?
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Unfortunately, we don't have similar hardware for a small scale test.
>>>>> All we have is our production hardware.
>>>>>
>>>>
>>>> You said something about /home partition which has lesser disks, we can
>>>> create plain distribute volume inside one of those directories. After we
>>>> are done, we can remove the setup. What do you say?
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but not
>>>>>> as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
>>>>>> fewer disks).
>>>>>>
>>>>>
>>>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>>>> volume is just distribute. Is there any way you can do tests on similar
>>>>> hardware but at a small scale? Just so we can run the workload to learn
>>>>> more about the bottlenecks in the system? We can probably try to get the
>>>>> speed to 1.2Gb/s on your /home partition you were telling me yesterday. Let
>>>>> me know if that is something you are okay to do.
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>>> answer by some other people who are more familiar with this.
>>>>>>>
>>>>>>> I am also uncertain about how to interpret the results when we also
>>>>>>> add the dd tests writing to the /home area (no gluster, still on the same
>>>>>>> machine)
>>>>>>>
>>>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>>>> - gluster w/ fuse mount : 570 Mb/s
>>>>>>> - gluster w/ nfs mount: 390 Mb/s
>>>>>>> - nfs (no gluster): 1.2 Gb/s
>>>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>>>> - gluster w/ fuse mount: 5 Mb/s
>>>>>>> - gluster w/ nfs mount: 200 Mb/s
>>>>>>> - nfs (no gluster): 20 Mb/s
>>>>>>>
>>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively expect
>>>>>>> the writes to the gluster area to be roughly 8x faster than to the
>>>>>>> non-gluster.
>>>>>>>
>>>>>>
>>>>>> I think a better test is to try and write to a file using nfs without
>>>>>> any gluster, to a location that is not inside the brick but some other
>>>>>> location that is on the same disk(s). If you are mounting the partition as
>>>>>> the brick, then we can write to a file inside the .glusterfs directory,
>>>>>> something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
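
With the bricks mounted at /mnt/brick1 and /mnt/brick2, that test could look
like the sketch below (the file name is only a placeholder and should be
deleted afterwards so gluster never sees it):

  # write straight to the brick filesystem, bypassing gluster entirely
  dd if=/dev/zero count=4096 bs=1048576 of=/mnt/brick1/.glusterfs/dd-test-delete-me

  # clean up the temporary file
  rm -f /mnt/brick1/.glusterfs/dd-test-delete-me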
>>>>>>
>>>>>>
>>>>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>>>> part of the problem.
>>>>>>>
>>>>>>
>>>>>> I got interested in the post because I read that fuse speed is lesser
>>>>>> than nfs speed which is counter-intuitive to my understanding. So wanted
>>>>>> clarifications. Now that I got my clarifications where fuse outperformed
>>>>>> nfs without sync, we can resume testing as described above and try to find
>>>>>> what it is. Based on your email-id I am guessing you are from Boston and I
>>>>>> am from Bangalore so if you are okay with doing this debugging for multiple
>>>>>> days because of timezones, I will be happy to help. Please be a bit patient
>>>>>> with me, I am under a release crunch but I am very curious with the problem
>>>>>> you posted.
>>>>>>
>>>>>> Was there anything useful in the profiles?
>>>>>>>
>>>>>>
>>>>>> Unfortunately profiles didn't help me much, I think we are collecting
>>>>>> the profiles from an active volume, so it has a lot of information that is
>>>>>> not pertaining to dd so it is difficult to find the contributions of dd. So
>>>>>> I went through your post again and found something I didn't pay much
>>>>>> attention to earlier i.e. oflag=sync, so did my own tests on my setup with
>>>>>> FUSE so sent that reply.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>>>> gluster NFS and fuse is a bit different.
>>>>>>> When an application opens a file with O_SYNC on a fuse mount, each
>>>>>>> write syscall has to be written to disk as part of the syscall, whereas in
>>>>>>> the case of NFS there is no concept of open. NFS performs the write through
>>>>>>> a handle saying it needs to be a synchronous write, so the write() syscall
>>>>>>> is performed first and then it performs fsync(). So a write on an fd with
>>>>>>> O_SYNC becomes write+fsync. I am suspecting that when multiple threads do
>>>>>>> this write+fsync() operation on the same file, multiple writes are batched
>>>>>>> together before being written to disk, so the throughput on the disk goes
>>>>>>> up; that is my guess.
>>>>>>>
>>>>>>> Does it answer your doubts?
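
The two write patterns described above can be approximated from the shell with
dd alone; this is only an illustration of the difference, not of the exact gnfs
code path:

  # O_SYNC on the output fd: every write() must reach disk before returning,
  # which is what the fuse mount has to honour per syscall
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync

  # buffered write() calls with an explicit flush at the end, closer to a
  # write-then-fsync pattern
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fsync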
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>>> going faster than NFS:
>>>>>>>>
>>>>>>>> FUSE:
>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>>> of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>> NFS
>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>>> of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>>> mounts? No need to collect profiles.
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is what I see now:
>>>>>>>>>
>>>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>>>
>>>>>>>>> Volume Name: data-volume
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>> Options Reconfigured:
>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> nfs.disable: on
>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Is this the volume info you have?
>>>>>>>>>
>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>> > Volume Name: data-volume
>>>>>>>>> > Type: Distribute
>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> > Status: Started
>>>>>>>>> > Number of Bricks: 2
>>>>>>>>> > Transport-type: tcp
>>>>>>>>> > Bricks:
>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>> > Options Reconfigured:
>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>> > nfs.disable: on
>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>> I copied this from old thread from 2016. This is distribute
>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>
>>
>> --
>> Pranith
>>
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>
>
> --
> Pranith
>
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Pat Haley
2017-05-31 01:40:34 UTC
Reply
Permalink
Raw Message
Hi Pranith,

The "dd" command was:

dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

There were 2 instances where dd reported 22 seconds. The output from the
dd tests are in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt

Pat
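
A small aside on the command itself: in dd, conv=sync only pads short input
blocks up to the block size (and with if=/dev/zero and a fixed bs it changes
nothing), so these runs measure ordinary buffered writes; synchronous per-write
behaviour is what oflag=sync asks for:

  # pads partial input blocks with zeros; writes stay buffered
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

  # opens the output with O_SYNC so every write is flushed to disk
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync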

On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> Pat,
> What is the command you used? As per the following output, it
> seems like at least one write operation took 16 seconds. Which is
> really bad.
> 96.39 1165.10 us 89.00 us *16487014.00 us* 393212 WRITE
>
>
> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> I ran the same 'dd' test both in the gluster test volume and in
> the .glusterfs directory of each brick. The median results (12 dd
> trials in each test) are similar to before
>
> * gluster test volume: 586.5 MB/s
> * bricks (in .glusterfs): 1.4 GB/s
>
> The profile for the gluster test-volume is in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>
> Thanks
>
> Pat
>
>
>
>
> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>> Let's start with the same 'dd' test we were testing with to see,
>> what the numbers are. Please provide profile numbers for the
>> same. From there on we will start tuning the volume to see what
>> we can do.
>>
>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Thanks for the tip. We now have the gluster volume mounted
>> under /home. What tests do you recommend we run?
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Sorry for the delay. I never saw received your reply
>>> (but I did receive Ben Turner's follow-up to your
>>> reply). So we tried to create a gluster volume under
>>> /home using different variations of
>>>
>>> gluster volume create test-volume
>>> mseas-data2:/home/gbrick_test_1
>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>> However we keep getting errors of the form
>>>
>>> Wrong brick type: transport, use
>>> <HOSTNAME>:<export-dir-abs-path>
>>>
>>> Any thoughts on what we're doing wrong?
>>>
>>>
>>> You should give transport tcp at the beginning I think.
>>> Anyways, transport tcp is the default, so no need to specify
>>> so remove those two words from the CLI.
>>>
>>>
>>> Also do you have a list of the test we should be running
>>> once we get this volume created? Given the time-zone
>>> difference it might help if we can run a small battery
>>> of tests and post the results rather than test-post-new
>>> test-post... .
>>>
>>>
>>> This is the first time I am doing performance analysis on
>>> users as far as I remember. In our team there are separate
>>> engineers who do these tests. Ben who replied earlier is one
>>> such engineer.
>>>
>>> Ben,
>>> Have any suggestions?
>>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> The /home partition is mounted as ext4
>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>
>>>> The brick partitions are mounted ax xfs
>>>> /mnt/brick1 xfs defaults 0 0
>>>> /mnt/brick2 xfs defaults 0 0
>>>>
>>>> Will this cause a problem with creating a volume
>>>> under /home?
>>>>
>>>>
>>>> I don't think the bottleneck is disk. You can do the
>>>> same tests you did on your new volume to confirm?
>>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Unfortunately, we don't have similar hardware
>>>>> for a small scale test. All we have is our
>>>>> production hardware.
>>>>>
>>>>>
>>>>> You said something about /home partition which has
>>>>> lesser disks, we can create plain distribute
>>>>> volume inside one of those directories. After we
>>>>> are done, we can remove the setup. What do you say?
>>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Since we are mounting the partitions as
>>>>>> the bricks, I tried the dd test writing
>>>>>> to
>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>> The results without oflag=sync were 1.6
>>>>>> Gb/s (faster than gluster but not as fast
>>>>>> as I was expecting given the 1.2 Gb/s to
>>>>>> the no-gluster area w/ fewer disks).
>>>>>>
>>>>>>
>>>>>> Okay, then 1.6Gb/s is what we need to target
>>>>>> for, considering your volume is just
>>>>>> distribute. Is there any way you can do tests
>>>>>> on similar hardware but at a small scale?
>>>>>> Just so we can run the workload to learn more
>>>>>> about the bottlenecks in the system? We can
>>>>>> probably try to get the speed to 1.2Gb/s on
>>>>>> your /home partition you were telling me
>>>>>> yesterday. Let me know if that is something
>>>>>> you are okay to do.
>>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat
>>>>>>> Haley <***@mit.edu
>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Not entirely sure (this isn't my
>>>>>>> area of expertise). I'll run your
>>>>>>> answer by some other people who are
>>>>>>> more familiar with this.
>>>>>>>
>>>>>>> I am also uncertain about how to
>>>>>>> interpret the results when we also
>>>>>>> add the dd tests writing to the
>>>>>>> /home area (no gluster, still on the
>>>>>>> same machine)
>>>>>>>
>>>>>>> * dd test without oflag=sync
>>>>>>> (rough average of multiple tests)
>>>>>>> o gluster w/ fuse mount : 570 Mb/s
>>>>>>> o gluster w/ nfs mount: 390 Mb/s
>>>>>>> o nfs (no gluster): 1.2 Gb/s
>>>>>>> * dd test with oflag=sync (rough
>>>>>>> average of multiple tests)
>>>>>>> o gluster w/ fuse mount: 5 Mb/s
>>>>>>> o gluster w/ nfs mount: 200 Mb/s
>>>>>>> o nfs (no gluster): 20 Mb/s
>>>>>>>
>>>>>>> Given that the non-gluster area is a
>>>>>>> RAID-6 of 4 disks while each brick
>>>>>>> of the gluster area is a RAID-6 of
>>>>>>> 32 disks, I would naively expect the
>>>>>>> writes to the gluster area to be
>>>>>>> roughly 8x faster than to the
>>>>>>> non-gluster.
>>>>>>>
>>>>>>>
>>>>>>> I think a better test is to try and
>>>>>>> write to a file using nfs without any
>>>>>>> gluster to a location that is not inside
>>>>>>> the brick but someother location that is
>>>>>>> on same disk(s). If you are mounting the
>>>>>>> partition as the brick, then we can
>>>>>>> write to a file inside .glusterfs
>>>>>>> directory, something like
>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I still think we have a speed issue,
>>>>>>> I can't tell if fuse vs nfs is part
>>>>>>> of the problem.
>>>>>>>
>>>>>>>
>>>>>>> I got interested in the post because I
>>>>>>> read that fuse speed is lesser than nfs
>>>>>>> speed which is counter-intuitive to my
>>>>>>> understanding. So wanted clarifications.
>>>>>>> Now that I got my clarifications where
>>>>>>> fuse outperformed nfs without sync, we
>>>>>>> can resume testing as described above
>>>>>>> and try to find what it is. Based on
>>>>>>> your email-id I am guessing you are from
>>>>>>> Boston and I am from Bangalore so if you
>>>>>>> are okay with doing this debugging for
>>>>>>> multiple days because of timezones, I
>>>>>>> will be happy to help. Please be a bit
>>>>>>> patient with me, I am under a release
>>>>>>> crunch but I am very curious with the
>>>>>>> problem you posted.
>>>>>>>
>>>>>>> Was there anything useful in the
>>>>>>> profiles?
>>>>>>>
>>>>>>>
>>>>>>> Unfortunately profiles didn't help me
>>>>>>> much, I think we are collecting the
>>>>>>> profiles from an active volume, so it
>>>>>>> has a lot of information that is not
>>>>>>> pertaining to dd so it is difficult to
>>>>>>> find the contributions of dd. So I went
>>>>>>> through your post again and found
>>>>>>> something I didn't pay much attention to
>>>>>>> earlier i.e. oflag=sync, so did my own
>>>>>>> tests on my setup with FUSE so sent that
>>>>>>> reply.
>>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>> Kumar Karampuri wrote:
>>>>>>>> Okay good. At least this validates
>>>>>>>> my doubts. Handling O_SYNC in
>>>>>>>> gluster NFS and fuse is a bit
>>>>>>>> different.
>>>>>>>> When application opens a file with
>>>>>>>> O_SYNC on fuse mount then each
>>>>>>>> write syscall has to be written to
>>>>>>>> disk as part of the syscall where
>>>>>>>> as in case of NFS, there is no
>>>>>>>> concept of open. NFS performs write
>>>>>>>> though a handle saying it needs to
>>>>>>>> be a synchronous write, so write()
>>>>>>>> syscall is performed first then it
>>>>>>>> performs fsync(). so an write on an
>>>>>>>> fd with O_SYNC becomes write+fsync.
>>>>>>>> I am suspecting that when multiple
>>>>>>>> threads do this write+fsync()
>>>>>>>> operation on the same file,
>>>>>>>> multiple writes are batched
>>>>>>>> together to be written do disk so
>>>>>>>> the throughput on the disk is
>>>>>>>> increasing is my guess.
>>>>>>>>
>>>>>>>> Does it answer your doubts?
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:35 PM,
>>>>>>>> Pat Haley <***@mit.edu
>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Without the oflag=sync and only
>>>>>>>> a single test of each, the FUSE
>>>>>>>> is going faster than NFS:
>>>>>>>>
>>>>>>>> FUSE:
>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>> if=/dev/zero count=4096
>>>>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>> NFS
>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>> if=/dev/zero count=4096
>>>>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 11:53 AM, Pranith
>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>> Could you let me know the
>>>>>>>>> speed without oflag=sync on
>>>>>>>>> both the mounts? No need to
>>>>>>>>> collect profiles.
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:17
>>>>>>>>> PM, Pat Haley <***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is what I see now:
>>>>>>>>>
>>>>>>>>> [***@mseas-data2 ~]#
>>>>>>>>> gluster volume info
>>>>>>>>>
>>>>>>>>> Volume Name: data-volume
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID:
>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1:
>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>> Brick2:
>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>> Options Reconfigured:
>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>> on
>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>> WARNING
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> nfs.disable: on
>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:44 AM,
>>>>>>>>> Pranith Kumar Karampuri wrote:
>>>>>>>>>> Is this the volume info
>>>>>>>>>> you have?
>>>>>>>>>>
>>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>> > Volume Name: data-volume
>>>>>>>>>> > Type: Distribute
>>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> > Status: Started
>>>>>>>>>> > Number of Bricks: 2
>>>>>>>>>> > Transport-type: tcp
>>>>>>>>>> > Bricks:
>>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> > Options Reconfigured:
>>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>>> > nfs.disable: on
>>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> I copied this from old
>>>>>>>>>> thread from 2016. This is
>>>>>>>>>> distribute volume. Did
>>>>>>>>>> you change any of the
>>>>>>>>>> options in between?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley Email:***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email:***@mit.edu
>>>>>>>> <mailto:***@mit.edu>
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>> --
>>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>>
>>
>> --
>> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email:***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>
>
>
> --
> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pranith Kumar Karampuri
2017-05-31 01:54:34 UTC
Reply
Permalink
Raw Message
Thanks this is good information.

+Soumya

Soumya,
We are trying to find why kNFS is performing way better than plain
distribute glusterfs+fuse. What information do you think would help us
compare the operations with kNFS vs gluster+fuse? We already have profile
output from fuse.


On Wed, May 31, 2017 at 7:10 AM, Pat Haley <***@mit.edu> wrote:

>
> Hi Pranith,
>
> The "dd" command was:
>
> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/
> dd_testvol_gluster.txt
>
> Pat
>
>
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>
> Pat,
> What is the command you used? As per the following output, it seems
> like at least one write operation took 16 seconds. Which is really bad.
>
> 96.39 1165.10 us 89.00 us *16487014.00 us* 393212 WRITE
>
>
>
> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> I ran the same 'dd' test both in the gluster test volume and in the
>> .glusterfs directory of each brick. The median results (12 dd trials in
>> each test) are similar to before
>>
>> - gluster test volume: 586.5 MB/s
>> - bricks (in .glusterfs): 1.4 GB/s
>>
>> The profile for the gluster test-volume is in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/pr
>> ofile_testvol_gluster.txt
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>>
>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>
>> Let's start with the same 'dd' test we were testing with to see, what the
>> numbers are. Please provide profile numbers for the same. From there on we
>> will start tuning the volume to see what we can do.
>>
>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Thanks for the tip. We now have the gluster volume mounted under
>>> /home. What tests do you recommend we run?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <***@mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Sorry for the delay. I never received your reply (but I did
>>>> receive Ben Turner's follow-up to your reply). So we tried to create a
>>>> gluster volume under /home using different variations of
>>>>
>>>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>
>>>> However we keep getting errors of the form
>>>>
>>>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>>
>>>> Any thoughts on what we're doing wrong?
>>>>
>>>
>>> You should give transport tcp at the beginning I think. Anyways,
>>> transport tcp is the default, so no need to specify so remove those two
>>> words from the CLI.
>>>
>>>>
>>>> Also do you have a list of the test we should be running once we get
>>>> this volume created? Given the time-zone difference it might help if we
>>>> can run a small battery of tests and post the results rather than
>>>> test-post-new test-post... .
>>>>
>>>
>>> This is the first time I am doing performance analysis on users as far
>>> as I remember. In our team there are separate engineers who do these tests.
>>> Ben who replied earlier is one such engineer.
>>>
>>> Ben,
>>> Have any suggestions?
>>>
>>>
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <***@mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> The /home partition is mounted as ext4
>>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>>
>>>>> The brick partitions are mounted as xfs
>>>>> /mnt/brick1 xfs defaults 0 0
>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>
>>>>> Will this cause a problem with creating a volume under /home?
>>>>>
>>>>
>>>> I don't think the bottleneck is disk. You can do the same tests you did
>>>> on your new volume to confirm?
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <***@mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Unfortunately, we don't have similar hardware for a small scale
>>>>>> test. All we have is our production hardware.
>>>>>>
>>>>>
>>>>> You said something about /home partition which has lesser disks, we
>>>>> can create plain distribute volume inside one of those directories. After
>>>>> we are done, we can remove the setup. What do you say?
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <***@mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but not
>>>>>>> as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/
>>>>>>> fewer disks).
>>>>>>>
>>>>>>
>>>>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>>>>> volume is just distribute. Is there any way you can do tests on similar
>>>>>> hardware but at a small scale? Just so we can run the workload to learn
>>>>>> more about the bottlenecks in the system? We can probably try to get the
>>>>>> speed to 1.2Gb/s on your /home partition you were telling me yesterday. Let
>>>>>> me know if that is something you are okay to do.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>>>> answer by some other people who are more familiar with this.
>>>>>>>>
>>>>>>>> I am also uncertain about how to interpret the results when we also
>>>>>>>> add the dd tests writing to the /home area (no gluster, still on the same
>>>>>>>> machine)
>>>>>>>>
>>>>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>>>>> - gluster w/ fuse mount : 570 Mb/s
>>>>>>>> - gluster w/ nfs mount: 390 Mb/s
>>>>>>>> - nfs (no gluster): 1.2 Gb/s
>>>>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>>>>> - gluster w/ fuse mount: 5 Mb/s
>>>>>>>> - gluster w/ nfs mount: 200 Mb/s
>>>>>>>> - nfs (no gluster): 20 Mb/s
>>>>>>>>
>>>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively expect
>>>>>>>> the writes to the gluster area to be roughly 8x faster than to the
>>>>>>>> non-gluster.
>>>>>>>>
>>>>>>>
>>>>>>> I think a better test is to try and write to a file using nfs
>>>>>>> without any gluster to a location that is not inside the brick but
>>>>>>> someother location that is on same disk(s). If you are mounting the
>>>>>>> partition as the brick, then we can write to a file inside .glusterfs
>>>>>>> directory, something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>>>>> part of the problem.
>>>>>>>>
>>>>>>>
>>>>>>> I got interested in the post because I read that fuse speed is
>>>>>>> lesser than nfs speed which is counter-intuitive to my understanding. So
>>>>>>> wanted clarifications. Now that I got my clarifications where fuse
>>>>>>> outperformed nfs without sync, we can resume testing as described above and
>>>>>>> try to find what it is. Based on your email-id I am guessing you are from
>>>>>>> Boston and I am from Bangalore so if you are okay with doing this debugging
>>>>>>> for multiple days because of timezones, I will be happy to help. Please be
>>>>>>> a bit patient with me, I am under a release crunch but I am very curious
>>>>>>> with the problem you posted.
>>>>>>>
>>>>>>> Was there anything useful in the profiles?
>>>>>>>>
>>>>>>>
>>>>>>> Unfortunately profiles didn't help me much, I think we are
>>>>>>> collecting the profiles from an active volume, so it has a lot of
>>>>>>> information that is not pertaining to dd so it is difficult to find the
>>>>>>> contributions of dd. So I went through your post again and found something
>>>>>>> I didn't pay much attention to earlier i.e. oflag=sync, so did my own tests
>>>>>>> on my setup with FUSE so sent that reply.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>>>>> gluster NFS and fuse is a bit different.
>>>>>>>> When application opens a file with O_SYNC on fuse mount then each
>>>>>>>> write syscall has to be written to disk as part of the syscall where as in
>>>>>>>> case of NFS, there is no concept of open. NFS performs write though a
>>>>>>>> handle saying it needs to be a synchronous write, so write() syscall is
>>>>>>>> performed first then it performs fsync(). so an write on an fd with O_SYNC
>>>>>>>> becomes write+fsync. I am suspecting that when multiple threads do this
>>>>>>>> write+fsync() operation on the same file, multiple writes are batched
>>>>>>>> together to be written do disk so the throughput on the disk is increasing
>>>>>>>> is my guess.
>>>>>>>>
>>>>>>>> Does it answer your doubts?
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>>>> going faster than NFS:
>>>>>>>>>
>>>>>>>>> FUSE:
>>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>>>> of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NFS
>>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576
>>>>>>>>> of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>>>> mounts? No need to collect profiles.
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is what I see now:
>>>>>>>>>>
>>>>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>>>>
>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>
>>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>> > Volume Name: data-volume
>>>>>>>>>> > Type: Distribute
>>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> > Status: Started
>>>>>>>>>> > Number of Bricks: 2
>>>>>>>>>> > Transport-type: tcp
>>>>>>>>>> > Bricks:
>>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> > Options Reconfigured:
>>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>>> > nfs.disable: on
>>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> I copied this from old thread from 2016. This is distribute
>>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>
>>
>> --
>> Pranith
>>
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>
>
> --
> Pranith
>
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>


--
Pranith
Soumya Koduri
2017-05-31 10:56:26 UTC
Reply
Permalink
Raw Message
On 05/31/2017 07:24 AM, Pranith Kumar Karampuri wrote:
> Thanks this is good information.
>
> +Soumya
>
> Soumya,
> We are trying to find why kNFS is performing way better than
> plain distribute glusterfs+fuse. What information do you think will
> benefit us to compare the operations with kNFS vs gluster+fuse? We
> already have profile output from fuse.
>
It could be because all operations done by kNFS are local to the system.
The operations sent over the network by the FUSE mount could be more
numerous and more time-consuming than the ones sent by the NFS client. We
could compare and examine the traffic patterns from tcpdump captures taken
over the fuse mount and the NFS mount. nfsstat [1] may also give some clue.
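
A rough sketch of such a capture (the interface, mount points and capture file
names below are placeholders):

  # capture traffic while the dd runs on the FUSE mount
  tcpdump -i any -s 0 -w /tmp/fuse-dd.pcap &
  TCPDUMP_PID=$!
  dd if=/dev/zero count=4096 bs=1048576 of=/path/to/fuse-mount/zeros.txt conv=sync
  kill $TCPDUMP_PID

  # repeat with the NFS mount, writing to /tmp/nfs-dd.pcap, then compare the
  # number and mix of requests in the two captures

  # per-operation NFS counters on the client and the server side
  nfsstat -c
  nfsstat -s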

Sorry, I hadn't followed this thread from the beginning, but is this a
comparison between a single-brick volume and kNFS exporting that brick?
Otherwise it's not a fair comparison if the volume is replicated or
distributed.
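
If it has been against the two-brick distribute volume, a closer comparison
could use a temporary single-brick volume on the same disks, something along
these lines (volume name and brick directory are placeholders):

  gluster volume create single-brick-test mseas-data2:/home/gbrick_single_1
  gluster volume start single-brick-test
  # mount it via FUSE, repeat the dd test, then stop and delete the volume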

Thanks,
Soumya

[1] https://linux.die.net/man/8/nfsstat

>
> On Wed, May 31, 2017 at 7:10 AM, Pat Haley <***@mit.edu
> <mailto:***@mit.edu>> wrote:
>
>
> Hi Pranith,
>
> The "dd" command was:
>
> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from
> the dd tests are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt>
>
> Pat
>
>
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>> Pat,
>> What is the command you used? As per the following output,
>> it seems like at least one write operation took 16 seconds. Which
>> is really bad.
>> 96.39 1165.10 us 89.00 us *16487014.00 us* 393212 WRITE
>>
>>
>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> I ran the same 'dd' test both in the gluster test volume and
>> in the .glusterfs directory of each brick. The median results
>> (12 dd trials in each test) are similar to before
>>
>> * gluster test volume: 586.5 MB/s
>> * bricks (in .glusterfs): 1.4 GB/s
>>
>> The profile for the gluster test-volume is in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>>
>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>> Let's start with the same 'dd' test we were testing with to
>>> see, what the numbers are. Please provide profile numbers for
>>> the same. From there on we will start tuning the volume to
>>> see what we can do.
>>>
>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Thanks for the tip. We now have the gluster volume
>>> mounted under /home. What tests do you recommend we run?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Sorry for the delay. I never saw received your
>>>> reply (but I did receive Ben Turner's follow-up to
>>>> your reply). So we tried to create a gluster volume
>>>> under /home using different variations of
>>>>
>>>> gluster volume create test-volume
>>>> mseas-data2:/home/gbrick_test_1
>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>
>>>> However we keep getting errors of the form
>>>>
>>>> Wrong brick type: transport, use
>>>> <HOSTNAME>:<export-dir-abs-path>
>>>>
>>>> Any thoughts on what we're doing wrong?
>>>>
>>>>
>>>> You should give transport tcp at the beginning I think.
>>>> Anyways, transport tcp is the default, so no need to
>>>> specify so remove those two words from the CLI.
>>>>
>>>>
>>>> Also do you have a list of the test we should be
>>>> running once we get this volume created? Given the
>>>> time-zone difference it might help if we can run a
>>>> small battery of tests and post the results rather
>>>> than test-post-new test-post... .
>>>>
>>>>
>>>> This is the first time I am doing performance analysis
>>>> on users as far as I remember. In our team there are
>>>> separate engineers who do these tests. Ben who replied
>>>> earlier is one such engineer.
>>>>
>>>> Ben,
>>>> Have any suggestions?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> The /home partition is mounted as ext4
>>>>> /home ext4
>>>>> defaults,usrquota,grpquota 1 2
>>>>>
>>>>> The brick partitions are mounted ax xfs
>>>>> /mnt/brick1 xfs defaults 0 0
>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>
>>>>> Will this cause a problem with creating a
>>>>> volume under /home?
>>>>>
>>>>>
>>>>> I don't think the bottleneck is disk. You can do
>>>>> the same tests you did on your new volume to confirm?
>>>>>
>>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Unfortunately, we don't have similar
>>>>>> hardware for a small scale test. All we
>>>>>> have is our production hardware.
>>>>>>
>>>>>>
>>>>>> You said something about /home partition which
>>>>>> has lesser disks, we can create plain
>>>>>> distribute volume inside one of those
>>>>>> directories. After we are done, we can remove
>>>>>> the setup. What do you say?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat
>>>>>>> Haley <***@mit.edu
>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Since we are mounting the partitions
>>>>>>> as the bricks, I tried the dd test
>>>>>>> writing to
>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>> The results without oflag=sync were
>>>>>>> 1.6 Gb/s (faster than gluster but not
>>>>>>> as fast as I was expecting given the
>>>>>>> 1.2 Gb/s to the no-gluster area w/
>>>>>>> fewer disks).
>>>>>>>
>>>>>>>
>>>>>>> Okay, then 1.6Gb/s is what we need to
>>>>>>> target for, considering your volume is
>>>>>>> just distribute. Is there any way you can
>>>>>>> do tests on similar hardware but at a
>>>>>>> small scale? Just so we can run the
>>>>>>> workload to learn more about the
>>>>>>> bottlenecks in the system? We can
>>>>>>> probably try to get the speed to 1.2Gb/s
>>>>>>> on your /home partition you were telling
>>>>>>> me yesterday. Let me know if that is
>>>>>>> something you are okay to do.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>>> Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 10:15 PM,
>>>>>>>> Pat Haley <***@mit.edu
>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Not entirely sure (this isn't my
>>>>>>>> area of expertise). I'll run
>>>>>>>> your answer by some other people
>>>>>>>> who are more familiar with this.
>>>>>>>>
>>>>>>>> I am also uncertain about how to
>>>>>>>> interpret the results when we
>>>>>>>> also add the dd tests writing to
>>>>>>>> the /home area (no gluster,
>>>>>>>> still on the same machine)
>>>>>>>>
>>>>>>>> * dd test without oflag=sync
>>>>>>>> (rough average of multiple
>>>>>>>> tests)
>>>>>>>> o gluster w/ fuse mount :
>>>>>>>> 570 Mb/s
>>>>>>>> o gluster w/ nfs mount:
>>>>>>>> 390 Mb/s
>>>>>>>> o nfs (no gluster): 1.2 Gb/s
>>>>>>>> * dd test with oflag=sync
>>>>>>>> (rough average of multiple
>>>>>>>> tests)
>>>>>>>> o gluster w/ fuse mount:
>>>>>>>> 5 Mb/s
>>>>>>>> o gluster w/ nfs mount:
>>>>>>>> 200 Mb/s
>>>>>>>> o nfs (no gluster): 20 Mb/s
>>>>>>>>
>>>>>>>> Given that the non-gluster area
>>>>>>>> is a RAID-6 of 4 disks while
>>>>>>>> each brick of the gluster area
>>>>>>>> is a RAID-6 of 32 disks, I would
>>>>>>>> naively expect the writes to the
>>>>>>>> gluster area to be roughly 8x
>>>>>>>> faster than to the non-gluster.
>>>>>>>>
>>>>>>>>
>>>>>>>> I think a better test is to try and
>>>>>>>> write to a file using nfs without
>>>>>>>> any gluster to a location that is
>>>>>>>> not inside the brick but someother
>>>>>>>> location that is on same disk(s). If
>>>>>>>> you are mounting the partition as
>>>>>>>> the brick, then we can write to a
>>>>>>>> file inside .glusterfs directory,
>>>>>>>> something like
>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I still think we have a speed
>>>>>>>> issue, I can't tell if fuse vs
>>>>>>>> nfs is part of the problem.
>>>>>>>>
>>>>>>>>
>>>>>>>> I got interested in the post because
>>>>>>>> I read that fuse speed is lesser
>>>>>>>> than nfs speed which is
>>>>>>>> counter-intuitive to my
>>>>>>>> understanding. So wanted
>>>>>>>> clarifications. Now that I got my
>>>>>>>> clarifications where fuse
>>>>>>>> outperformed nfs without sync, we
>>>>>>>> can resume testing as described
>>>>>>>> above and try to find what it is.
>>>>>>>> Based on your email-id I am guessing
>>>>>>>> you are from Boston and I am from
>>>>>>>> Bangalore so if you are okay with
>>>>>>>> doing this debugging for multiple
>>>>>>>> days because of timezones, I will be
>>>>>>>> happy to help. Please be a bit
>>>>>>>> patient with me, I am under a
>>>>>>>> release crunch but I am very curious
>>>>>>>> with the problem you posted.
>>>>>>>>
>>>>>>>> Was there anything useful in
>>>>>>>> the profiles?
>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately profiles didn't help
>>>>>>>> me much, I think we are collecting
>>>>>>>> the profiles from an active volume,
>>>>>>>> so it has a lot of information that
>>>>>>>> is not pertaining to dd so it is
>>>>>>>> difficult to find the contributions
>>>>>>>> of dd. So I went through your post
>>>>>>>> again and found something I didn't
>>>>>>>> pay much attention to earlier i.e.
>>>>>>>> oflag=sync, so did my own tests on
>>>>>>>> my setup with FUSE so sent that reply.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>> Okay good. At least this
>>>>>>>>> validates my doubts. Handling
>>>>>>>>> O_SYNC in gluster NFS and fuse
>>>>>>>>> is a bit different.
>>>>>>>>> When application opens a file
>>>>>>>>> with O_SYNC on fuse mount then
>>>>>>>>> each write syscall has to be
>>>>>>>>> written to disk as part of the
>>>>>>>>> syscall where as in case of
>>>>>>>>> NFS, there is no concept of
>>>>>>>>> open. NFS performs write though
>>>>>>>>> a handle saying it needs to be
>>>>>>>>> a synchronous write, so write()
>>>>>>>>> syscall is performed first then
>>>>>>>>> it performs fsync(). so an
>>>>>>>>> write on an fd with O_SYNC
>>>>>>>>> becomes write+fsync. I am
>>>>>>>>> suspecting that when multiple
>>>>>>>>> threads do this write+fsync()
>>>>>>>>> operation on the same file,
>>>>>>>>> multiple writes are batched
>>>>>>>>> together to be written do disk
>>>>>>>>> so the throughput on the disk
>>>>>>>>> is increasing is my guess.
>>>>>>>>>
>>>>>>>>> Does it answer your doubts?
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:35
>>>>>>>>> PM, Pat Haley <***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without the oflag=sync and
>>>>>>>>> only a single test of each,
>>>>>>>>> the FUSE is going faster
>>>>>>>>> than NFS:
>>>>>>>>>
>>>>>>>>> FUSE:
>>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>> conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NFS
>>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>> conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:53 AM,
>>>>>>>>> Pranith Kumar Karampuri wrote:
>>>>>>>>>> Could you let me know the
>>>>>>>>>> speed without oflag=sync
>>>>>>>>>> on both the mounts? No
>>>>>>>>>> need to collect profiles.
>>>>>>>>>>
>>>>>>>>>> On Wed, May 10, 2017 at
>>>>>>>>>> 9:17 PM, Pat Haley
>>>>>>>>>> <***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is what I see now:
>>>>>>>>>>
>>>>>>>>>> [***@mseas-data2 ~]#
>>>>>>>>>> gluster volume info
>>>>>>>>>>
>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID:
>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1:
>>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>>> Brick2:
>>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> diagnostics.count-fop-hits:
>>>>>>>>>> on
>>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>>> on
>>>>>>>>>> nfs.exports-auth-enable:
>>>>>>>>>> on
>>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>>> WARNING
>>>>>>>>>> performance.readdir-ahead:
>>>>>>>>>> on
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:44
>>>>>>>>>> AM, Pranith Kumar
>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>> Is this the volume
>>>>>>>>>>> info you have?
>>>>>>>>>>>
>>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>
>>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>>> Type: Distribute
>>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>
>>>>>>>>>>> I copied this from old thread from 2016. This is distribute
>>>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pranith
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>> <mailto:***@mit.edu>
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>>
>>
>> --
>> Pranith
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: ***@mit.edu <mailto:***@mit.edu>
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>
>
>
> --
> Pranith
Pat Haley
2017-05-31 14:03:32 UTC
Reply
Permalink
Raw Message
Hi Soumya,

For the latest test we set up a test gluster volume consisting of 2
bricks, both residing on the NFS-exported disk (/home). The gluster
volume is neither replicated nor striped. The tests were performed on
the server hosting the disk, so no network was involved.

Additional details of the system are in
http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
(note that the tests are now all being done on the /home disk).
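
For reference, the test volume was created and exercised along these
lines (a sketch pieced together from the commands discussed earlier in
this thread; the /mnt/test-volume mount point is an assumption, and
gluster may ask for "force" depending on where the brick directories
live):

gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2
gluster volume start test-volume

# mount it locally over FUSE and repeat the dd test
mkdir -p /mnt/test-volume
mount -t glusterfs mseas-data2:/test-volume /mnt/test-volume
dd if=/dev/zero count=4096 bs=1048576 of=/mnt/test-volume/zeros.txt conv=sync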

Pat


On 05/31/2017 06:56 AM, Soumya Koduri wrote:
>
>
> On 05/31/2017 07:24 AM, Pranith Kumar Karampuri wrote:
>> Thanks this is good information.
>>
>> +Soumya
>>
>> Soumya,
>> We are trying to find why kNFS is performing way better than
>> plain distribute glusterfs+fuse. What information do you think will
>> benefit us to compare the operations with kNFS vs gluster+fuse? We
>> already have profile output from fuse.
>>
> Could be because all operations done by kNFS are local to the system.
> The operations done by FUSE mount over network could be more in number
> and time-consuming than the ones sent by NFS-client. We could compare
> and examine the pattern from tcpump taken over fuse-mount and
> NFS-mount. Also nfsstat [1] may give some clue.
>
> Sorry I hadn't followed this mail from the beginning. But is this
> comparison between single brick volume and kNFS exporting that brick?
> Otherwise it's not a fair comparison if the volume is replicated or
> distributed.
>
> Thanks,
> Soumya
>
> [1] https://linux.die.net/man/8/nfsstat
>
>>
>> On Wed, May 31, 2017 at 7:10 AM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> The "dd" command was:
>>
>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>
>> There were 2 instances where dd reported 22 seconds. The output from
>> the dd tests are in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt>
>>
>> Pat
>>
>>
>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>> Pat,
>>> What is the command you used? As per the following output,
>>> it seems like at least one write operation took 16 seconds. Which
>>> is really bad.
>>> 96.39 1165.10 us 89.00 us *16487014.00 us*
>>> 393212 WRITE
>>>
>>>
>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> I ran the same 'dd' test both in the gluster test volume and
>>> in the .glusterfs directory of each brick. The median results
>>> (12 dd trials in each test) are similar to before
>>>
>>> * gluster test volume: 586.5 MB/s
>>> * bricks (in .glusterfs): 1.4 GB/s
>>>
>>> The profile for the gluster test-volume is in
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>> Let's start with the same 'dd' test we were testing with to
>>>> see, what the numbers are. Please provide profile numbers for
>>>> the same. From there on we will start tuning the volume to
>>>> see what we can do.
>>>>
>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>>>> <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Thanks for the tip. We now have the gluster volume
>>>> mounted under /home. What tests do you recommend we run?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Sorry for the delay. I never saw received your
>>>>> reply (but I did receive Ben Turner's follow-up to
>>>>> your reply). So we tried to create a gluster volume
>>>>> under /home using different variations of
>>>>>
>>>>> gluster volume create test-volume
>>>>> mseas-data2:/home/gbrick_test_1
>>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>
>>>>> However we keep getting errors of the form
>>>>>
>>>>> Wrong brick type: transport, use
>>>>> <HOSTNAME>:<export-dir-abs-path>
>>>>>
>>>>> Any thoughts on what we're doing wrong?
>>>>>
>>>>>
>>>>> You should give transport tcp at the beginning I think.
>>>>> Anyways, transport tcp is the default, so no need to
>>>>> specify so remove those two words from the CLI.
>>>>>
>>>>>
>>>>> Also do you have a list of the test we should be
>>>>> running once we get this volume created? Given the
>>>>> time-zone difference it might help if we can run a
>>>>> small battery of tests and post the results rather
>>>>> than test-post-new test-post... .
>>>>>
>>>>>
>>>>> This is the first time I am doing performance analysis
>>>>> on users as far as I remember. In our team there are
>>>>> separate engineers who do these tests. Ben who replied
>>>>> earlier is one such engineer.
>>>>>
>>>>> Ben,
>>>>> Have any suggestions?
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> The /home partition is mounted as ext4
>>>>>> /home ext4
>>>>>> defaults,usrquota,grpquota 1 2
>>>>>>
>>>>>> The brick partitions are mounted ax xfs
>>>>>> /mnt/brick1 xfs defaults 0 0
>>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>>
>>>>>> Will this cause a problem with creating a
>>>>>> volume under /home?
>>>>>>
>>>>>>
>>>>>> I don't think the bottleneck is disk. You can do
>>>>>> the same tests you did on your new volume to
>>>>>> confirm?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Unfortunately, we don't have similar
>>>>>>> hardware for a small scale test. All we
>>>>>>> have is our production hardware.
>>>>>>>
>>>>>>>
>>>>>>> You said something about /home partition which
>>>>>>> has lesser disks, we can create plain
>>>>>>> distribute volume inside one of those
>>>>>>> directories. After we are done, we can remove
>>>>>>> the setup. What do you say?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>>>> Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat
>>>>>>>> Haley <***@mit.edu
>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Since we are mounting the partitions
>>>>>>>> as the bricks, I tried the dd test
>>>>>>>> writing to
>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>> The results without oflag=sync were
>>>>>>>> 1.6 Gb/s (faster than gluster but not
>>>>>>>> as fast as I was expecting given the
>>>>>>>> 1.2 Gb/s to the no-gluster area w/
>>>>>>>> fewer disks).
>>>>>>>>
>>>>>>>>
>>>>>>>> Okay, then 1.6Gb/s is what we need to
>>>>>>>> target for, considering your volume is
>>>>>>>> just distribute. Is there any way you can
>>>>>>>> do tests on similar hardware but at a
>>>>>>>> small scale? Just so we can run the
>>>>>>>> workload to learn more about the
>>>>>>>> bottlenecks in the system? We can
>>>>>>>> probably try to get the speed to 1.2Gb/s
>>>>>>>> on your /home partition you were telling
>>>>>>>> me yesterday. Let me know if that is
>>>>>>>> something you are okay to do.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>>>> Karampuri wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM,
>>>>>>>>> Pat Haley <***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranith,
>>>>>>>>>
>>>>>>>>> Not entirely sure (this isn't my
>>>>>>>>> area of expertise). I'll run
>>>>>>>>> your answer by some other people
>>>>>>>>> who are more familiar with this.
>>>>>>>>>
>>>>>>>>> I am also uncertain about how to
>>>>>>>>> interpret the results when we
>>>>>>>>> also add the dd tests writing to
>>>>>>>>> the /home area (no gluster,
>>>>>>>>> still on the same machine)
>>>>>>>>>
>>>>>>>>> * dd test without oflag=sync
>>>>>>>>> (rough average of multiple
>>>>>>>>> tests)
>>>>>>>>> o gluster w/ fuse mount :
>>>>>>>>> 570 Mb/s
>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>> 390 Mb/s
>>>>>>>>> o nfs (no gluster): 1.2
>>>>>>>>> Gb/s
>>>>>>>>> * dd test with oflag=sync
>>>>>>>>> (rough average of multiple
>>>>>>>>> tests)
>>>>>>>>> o gluster w/ fuse mount:
>>>>>>>>> 5 Mb/s
>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>> 200 Mb/s
>>>>>>>>> o nfs (no gluster): 20 Mb/s
>>>>>>>>>
>>>>>>>>> Given that the non-gluster area
>>>>>>>>> is a RAID-6 of 4 disks while
>>>>>>>>> each brick of the gluster area
>>>>>>>>> is a RAID-6 of 32 disks, I would
>>>>>>>>> naively expect the writes to the
>>>>>>>>> gluster area to be roughly 8x
>>>>>>>>> faster than to the non-gluster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think a better test is to try and
>>>>>>>>> write to a file using nfs without
>>>>>>>>> any gluster to a location that is
>>>>>>>>> not inside the brick but someother
>>>>>>>>> location that is on same disk(s). If
>>>>>>>>> you are mounting the partition as
>>>>>>>>> the brick, then we can write to a
>>>>>>>>> file inside .glusterfs directory,
>>>>>>>>> something like
>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I still think we have a speed
>>>>>>>>> issue, I can't tell if fuse vs
>>>>>>>>> nfs is part of the problem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I got interested in the post because
>>>>>>>>> I read that fuse speed is lesser
>>>>>>>>> than nfs speed which is
>>>>>>>>> counter-intuitive to my
>>>>>>>>> understanding. So wanted
>>>>>>>>> clarifications. Now that I got my
>>>>>>>>> clarifications where fuse
>>>>>>>>> outperformed nfs without sync, we
>>>>>>>>> can resume testing as described
>>>>>>>>> above and try to find what it is.
>>>>>>>>> Based on your email-id I am guessing
>>>>>>>>> you are from Boston and I am from
>>>>>>>>> Bangalore so if you are okay with
>>>>>>>>> doing this debugging for multiple
>>>>>>>>> days because of timezones, I will be
>>>>>>>>> happy to help. Please be a bit
>>>>>>>>> patient with me, I am under a
>>>>>>>>> release crunch but I am very curious
>>>>>>>>> with the problem you posted.
>>>>>>>>>
>>>>>>>>> Was there anything useful in
>>>>>>>>> the profiles?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately profiles didn't help
>>>>>>>>> me much, I think we are collecting
>>>>>>>>> the profiles from an active volume,
>>>>>>>>> so it has a lot of information that
>>>>>>>>> is not pertaining to dd so it is
>>>>>>>>> difficult to find the contributions
>>>>>>>>> of dd. So I went through your post
>>>>>>>>> again and found something I didn't
>>>>>>>>> pay much attention to earlier i.e.
>>>>>>>>> oflag=sync, so did my own tests on
>>>>>>>>> my setup with FUSE so sent that
>>>>>>>>> reply.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>>> Okay good. At least this
>>>>>>>>>> validates my doubts. Handling
>>>>>>>>>> O_SYNC in gluster NFS and fuse
>>>>>>>>>> is a bit different.
>>>>>>>>>> When application opens a file
>>>>>>>>>> with O_SYNC on fuse mount then
>>>>>>>>>> each write syscall has to be
>>>>>>>>>> written to disk as part of the
>>>>>>>>>> syscall where as in case of
>>>>>>>>>> NFS, there is no concept of
>>>>>>>>>> open. NFS performs write though
>>>>>>>>>> a handle saying it needs to be
>>>>>>>>>> a synchronous write, so write()
>>>>>>>>>> syscall is performed first then
>>>>>>>>>> it performs fsync(). so an
>>>>>>>>>> write on an fd with O_SYNC
>>>>>>>>>> becomes write+fsync. I am
>>>>>>>>>> suspecting that when multiple
>>>>>>>>>> threads do this write+fsync()
>>>>>>>>>> operation on the same file,
>>>>>>>>>> multiple writes are batched
>>>>>>>>>> together to be written do disk
>>>>>>>>>> so the throughput on the disk
>>>>>>>>>> is increasing is my guess.
>>>>>>>>>>
>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>
>>>>>>>>>> On Wed, May 10, 2017 at 9:35
>>>>>>>>>> PM, Pat Haley <***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Without the oflag=sync and
>>>>>>>>>> only a single test of each,
>>>>>>>>>> the FUSE is going faster
>>>>>>>>>> than NFS:
>>>>>>>>>>
>>>>>>>>>> FUSE:
>>>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>> conv=sync
>>>>>>>>>> 4096+0 records in
>>>>>>>>>> 4096+0 records out
>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> NFS
>>>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>> conv=sync
>>>>>>>>>> 4096+0 records in
>>>>>>>>>> 4096+0 records out
>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:53 AM,
>>>>>>>>>> Pranith Kumar Karampuri
>>>>>>>>>> wrote:
>>>>>>>>>>> Could you let me know the
>>>>>>>>>>> speed without oflag=sync
>>>>>>>>>>> on both the mounts? No
>>>>>>>>>>> need to collect profiles.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 10, 2017 at
>>>>>>>>>>> 9:17 PM, Pat Haley
>>>>>>>>>>> <***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is what I see now:
>>>>>>>>>>>
>>>>>>>>>>> [***@mseas-data2 ~]#
>>>>>>>>>>> gluster volume info
>>>>>>>>>>>
>>>>>>>>>>> Volume Name:
>>>>>>>>>>> data-volume
>>>>>>>>>>> Type: Distribute
>>>>>>>>>>> Volume ID:
>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1:
>>>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>>>> Brick2:
>>>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>> diagnostics.count-fop-hits:
>>>>>>>>>>> on
>>>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>>>> on
>>>>>>>>>>> nfs.exports-auth-enable:
>>>>>>>>>>> on
>>>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>>>> WARNING
>>>>>>>>>>> performance.readdir-ahead:
>>>>>>>>>>> on
>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/10/2017 11:44
>>>>>>>>>>> AM, Pranith Kumar
>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>> Is this the volume
>>>>>>>>>>>> info you have?
>>>>>>>>>>>>
>>>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>
>>>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>>>> Type: Distribute
>>>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>> Status: Started
>>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>> Bricks:
>>>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>>
>>>>>>>>>>>> I copied this from old thread from 2016. This is distribute
>>>>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>
>>>>>>>>>>> Pat
>>>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>> Center for Ocean
>>>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>>>> MIT, Room 5-213
>>>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Pranith
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>
>>>>>>>>>> Pat
>>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>> Center for Ocean
>>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>>> MIT, Room 5-213
>>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pranith
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat
>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>> Center for Ocean
>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical
>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213
>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley
>>>>>>>> Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>>>> Center for Ocean Engineering
>>>>>>>> Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering
>>>>>>>> Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213
>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley
>>>>>>> Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>>> Center for Ocean Engineering
>>>>>>> Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering
>>>>>>> Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213
>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617)
>>>>>> 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617)
>>>>>> 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:
>>>>> ***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617)
>>>>> 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617)
>>>>> 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:
>>>> ***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>>
>>
>> --
>> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pat Haley
2017-05-31 14:37:22 UTC
Reply
Permalink
Raw Message
Hi Soumya,

What pattern should we be trying to view with the tcpdump? Is a
one-minute capture of a copy operation sufficient, or are you looking
for something else?
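
In case it helps, here is the kind of capture we could run on our side,
assuming that is what you had in mind (a sketch only: the output file
names, the copy source/target and the brick port range are assumptions;
on gluster 3.7 the brick processes normally listen on ports 49152 and
up, and glusterd on 24007):

# capture the FUSE-mount traffic while a copy runs, then stop the capture
# (-s 256 keeps headers only so the capture file stays manageable)
tcpdump -i any -s 256 -w /tmp/fuse_copy.pcap port 24007 or portrange 49152-49160 &
cp /gdata/zero1 /gdata/zero_capture_test
kill %1

# the same copy through the kernel NFS mount, bracketed by client counters
# (run from a client that mounts /home over kNFS; adjust the paths to the
# client's mount point)
nfsstat -c > /tmp/nfsstat_before.txt
tcpdump -i any -s 256 -w /tmp/nfs_copy.pcap port 2049 &
cp /home/zero1 /home/zero_capture_test
kill %1
nfsstat -c > /tmp/nfsstat_after.txt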

Pat


On 05/31/2017 06:56 AM, Soumya Koduri wrote:
>
>
> On 05/31/2017 07:24 AM, Pranith Kumar Karampuri wrote:
>> Thanks this is good information.
>>
>> +Soumya
>>
>> Soumya,
>> We are trying to find why kNFS is performing way better than
>> plain distribute glusterfs+fuse. What information do you think will
>> benefit us to compare the operations with kNFS vs gluster+fuse? We
>> already have profile output from fuse.
>>
> Could be because all operations done by kNFS are local to the system.
> The operations done by FUSE mount over network could be more in number
> and time-consuming than the ones sent by NFS-client. We could compare
> and examine the pattern from tcpdump taken over fuse-mount and
> NFS-mount. Also nfsstat [1] may give some clue.
>
> Sorry I hadn't followed this mail from the beginning. But is this
> comparison between single brick volume and kNFS exporting that brick?
> Otherwise it's not a fair comparison if the volume is replicated or
> distributed.
>
> Thanks,
> Soumya
>
> [1] https://linux.die.net/man/8/nfsstat
>
>>
>> On Wed, May 31, 2017 at 7:10 AM, Pat Haley <***@mit.edu
>> <mailto:***@mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> The "dd" command was:
>>
>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>
>> There were 2 instances where dd reported 22 seconds. The output from
>> the dd tests are in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt>
>>
>> Pat
>>
>>
>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>> Pat,
>>> What is the command you used? As per the following output,
>>> it seems like at least one write operation took 16 seconds. Which
>>> is really bad.
>>> 96.39 1165.10 us 89.00 us *16487014.00 us*
>>> 393212 WRITE
>>>
>>>
>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
>>> <mailto:***@mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> I ran the same 'dd' test both in the gluster test volume and
>>> in the .glusterfs directory of each brick. The median results
>>> (12 dd trials in each test) are similar to before
>>>
>>> * gluster test volume: 586.5 MB/s
>>> * bricks (in .glusterfs): 1.4 GB/s
>>>
>>> The profile for the gluster test-volume is in
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>> Let's start with the same 'dd' test we were testing with to
>>>> see, what the numbers are. Please provide profile numbers for
>>>> the same. From there on we will start tuning the volume to
>>>> see what we can do.
>>>>
>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>>>> <mailto:***@mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Thanks for the tip. We now have the gluster volume
>>>> mounted under /home. What tests do you recommend we run?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Sorry for the delay. I never saw received your
>>>>> reply (but I did receive Ben Turner's follow-up to
>>>>> your reply). So we tried to create a gluster volume
>>>>> under /home using different variations of
>>>>>
>>>>> gluster volume create test-volume
>>>>> mseas-data2:/home/gbrick_test_1
>>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>
>>>>> However we keep getting errors of the form
>>>>>
>>>>> Wrong brick type: transport, use
>>>>> <HOSTNAME>:<export-dir-abs-path>
>>>>>
>>>>> Any thoughts on what we're doing wrong?
>>>>>
>>>>>
>>>>> You should give transport tcp at the beginning I think.
>>>>> Anyways, transport tcp is the default, so no need to
>>>>> specify so remove those two words from the CLI.
>>>>>
>>>>>
>>>>> Also do you have a list of the test we should be
>>>>> running once we get this volume created? Given the
>>>>> time-zone difference it might help if we can run a
>>>>> small battery of tests and post the results rather
>>>>> than test-post-new test-post... .
>>>>>
>>>>>
>>>>> This is the first time I am doing performance analysis
>>>>> on users as far as I remember. In our team there are
>>>>> separate engineers who do these tests. Ben who replied
>>>>> earlier is one such engineer.
>>>>>
>>>>> Ben,
>>>>> Have any suggestions?
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> The /home partition is mounted as ext4
>>>>>> /home ext4
>>>>>> defaults,usrquota,grpquota 1 2
>>>>>>
>>>>>> The brick partitions are mounted ax xfs
>>>>>> /mnt/brick1 xfs defaults 0 0
>>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>>
>>>>>> Will this cause a problem with creating a
>>>>>> volume under /home?
>>>>>>
>>>>>>
>>>>>> I don't think the bottleneck is disk. You can do
>>>>>> the same tests you did on your new volume to
>>>>>> confirm?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Unfortunately, we don't have similar
>>>>>>> hardware for a small scale test. All we
>>>>>>> have is our production hardware.
>>>>>>>
>>>>>>>
>>>>>>> You said something about /home partition which
>>>>>>> has lesser disks, we can create plain
>>>>>>> distribute volume inside one of those
>>>>>>> directories. After we are done, we can remove
>>>>>>> the setup. What do you say?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>>>> Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat
>>>>>>>> Haley <***@mit.edu
>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Since we are mounting the partitions
>>>>>>>> as the bricks, I tried the dd test
>>>>>>>> writing to
>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>> The results without oflag=sync were
>>>>>>>> 1.6 Gb/s (faster than gluster but not
>>>>>>>> as fast as I was expecting given the
>>>>>>>> 1.2 Gb/s to the no-gluster area w/
>>>>>>>> fewer disks).
>>>>>>>>
>>>>>>>>
>>>>>>>> Okay, then 1.6Gb/s is what we need to
>>>>>>>> target for, considering your volume is
>>>>>>>> just distribute. Is there any way you can
>>>>>>>> do tests on similar hardware but at a
>>>>>>>> small scale? Just so we can run the
>>>>>>>> workload to learn more about the
>>>>>>>> bottlenecks in the system? We can
>>>>>>>> probably try to get the speed to 1.2Gb/s
>>>>>>>> on your /home partition you were telling
>>>>>>>> me yesterday. Let me know if that is
>>>>>>>> something you are okay to do.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>>>> Karampuri wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM,
>>>>>>>>> Pat Haley <***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranith,
>>>>>>>>>
>>>>>>>>> Not entirely sure (this isn't my
>>>>>>>>> area of expertise). I'll run
>>>>>>>>> your answer by some other people
>>>>>>>>> who are more familiar with this.
>>>>>>>>>
>>>>>>>>> I am also uncertain about how to
>>>>>>>>> interpret the results when we
>>>>>>>>> also add the dd tests writing to
>>>>>>>>> the /home area (no gluster,
>>>>>>>>> still on the same machine)
>>>>>>>>>
>>>>>>>>> * dd test without oflag=sync
>>>>>>>>> (rough average of multiple
>>>>>>>>> tests)
>>>>>>>>> o gluster w/ fuse mount :
>>>>>>>>> 570 Mb/s
>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>> 390 Mb/s
>>>>>>>>> o nfs (no gluster): 1.2
>>>>>>>>> Gb/s
>>>>>>>>> * dd test with oflag=sync
>>>>>>>>> (rough average of multiple
>>>>>>>>> tests)
>>>>>>>>> o gluster w/ fuse mount:
>>>>>>>>> 5 Mb/s
>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>> 200 Mb/s
>>>>>>>>> o nfs (no gluster): 20 Mb/s
>>>>>>>>>
>>>>>>>>> Given that the non-gluster area
>>>>>>>>> is a RAID-6 of 4 disks while
>>>>>>>>> each brick of the gluster area
>>>>>>>>> is a RAID-6 of 32 disks, I would
>>>>>>>>> naively expect the writes to the
>>>>>>>>> gluster area to be roughly 8x
>>>>>>>>> faster than to the non-gluster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think a better test is to try and
>>>>>>>>> write to a file using nfs without
>>>>>>>>> any gluster to a location that is
>>>>>>>>> not inside the brick but someother
>>>>>>>>> location that is on same disk(s). If
>>>>>>>>> you are mounting the partition as
>>>>>>>>> the brick, then we can write to a
>>>>>>>>> file inside .glusterfs directory,
>>>>>>>>> something like
>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I still think we have a speed
>>>>>>>>> issue, I can't tell if fuse vs
>>>>>>>>> nfs is part of the problem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I got interested in the post because
>>>>>>>>> I read that fuse speed is lesser
>>>>>>>>> than nfs speed which is
>>>>>>>>> counter-intuitive to my
>>>>>>>>> understanding. So wanted
>>>>>>>>> clarifications. Now that I got my
>>>>>>>>> clarifications where fuse
>>>>>>>>> outperformed nfs without sync, we
>>>>>>>>> can resume testing as described
>>>>>>>>> above and try to find what it is.
>>>>>>>>> Based on your email-id I am guessing
>>>>>>>>> you are from Boston and I am from
>>>>>>>>> Bangalore so if you are okay with
>>>>>>>>> doing this debugging for multiple
>>>>>>>>> days because of timezones, I will be
>>>>>>>>> happy to help. Please be a bit
>>>>>>>>> patient with me, I am under a
>>>>>>>>> release crunch but I am very curious
>>>>>>>>> with the problem you posted.
>>>>>>>>>
>>>>>>>>> Was there anything useful in
>>>>>>>>> the profiles?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately profiles didn't help
>>>>>>>>> me much, I think we are collecting
>>>>>>>>> the profiles from an active volume,
>>>>>>>>> so it has a lot of information that
>>>>>>>>> is not pertaining to dd so it is
>>>>>>>>> difficult to find the contributions
>>>>>>>>> of dd. So I went through your post
>>>>>>>>> again and found something I didn't
>>>>>>>>> pay much attention to earlier i.e.
>>>>>>>>> oflag=sync, so did my own tests on
>>>>>>>>> my setup with FUSE so sent that
>>>>>>>>> reply.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>>> Okay good. At least this
>>>>>>>>>> validates my doubts. Handling
>>>>>>>>>> O_SYNC in gluster NFS and fuse
>>>>>>>>>> is a bit different.
>>>>>>>>>> When application opens a file
>>>>>>>>>> with O_SYNC on fuse mount then
>>>>>>>>>> each write syscall has to be
>>>>>>>>>> written to disk as part of the
>>>>>>>>>> syscall where as in case of
>>>>>>>>>> NFS, there is no concept of
>>>>>>>>>> open. NFS performs write though
>>>>>>>>>> a handle saying it needs to be
>>>>>>>>>> a synchronous write, so write()
>>>>>>>>>> syscall is performed first then
>>>>>>>>>> it performs fsync(). so an
>>>>>>>>>> write on an fd with O_SYNC
>>>>>>>>>> becomes write+fsync. I am
>>>>>>>>>> suspecting that when multiple
>>>>>>>>>> threads do this write+fsync()
>>>>>>>>>> operation on the same file,
>>>>>>>>>> multiple writes are batched
>>>>>>>>>> together to be written do disk
>>>>>>>>>> so the throughput on the disk
>>>>>>>>>> is increasing is my guess.
>>>>>>>>>>
>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>
>>>>>>>>>> On Wed, May 10, 2017 at 9:35
>>>>>>>>>> PM, Pat Haley <***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Without the oflag=sync and
>>>>>>>>>> only a single test of each,
>>>>>>>>>> the FUSE is going faster
>>>>>>>>>> than NFS:
>>>>>>>>>>
>>>>>>>>>> FUSE:
>>>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>> conv=sync
>>>>>>>>>> 4096+0 records in
>>>>>>>>>> 4096+0 records out
>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> NFS
>>>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>> conv=sync
>>>>>>>>>> 4096+0 records in
>>>>>>>>>> 4096+0 records out
>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:53 AM,
>>>>>>>>>> Pranith Kumar Karampuri
>>>>>>>>>> wrote:
>>>>>>>>>>> Could you let me know the
>>>>>>>>>>> speed without oflag=sync
>>>>>>>>>>> on both the mounts? No
>>>>>>>>>>> need to collect profiles.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 10, 2017 at
>>>>>>>>>>> 9:17 PM, Pat Haley
>>>>>>>>>>> <***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is what I see now:
>>>>>>>>>>>
>>>>>>>>>>> [***@mseas-data2 ~]#
>>>>>>>>>>> gluster volume info
>>>>>>>>>>>
>>>>>>>>>>> Volume Name:
>>>>>>>>>>> data-volume
>>>>>>>>>>> Type: Distribute
>>>>>>>>>>> Volume ID:
>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1:
>>>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>>>> Brick2:
>>>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>> diagnostics.count-fop-hits:
>>>>>>>>>>> on
>>>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>>>> on
>>>>>>>>>>> nfs.exports-auth-enable:
>>>>>>>>>>> on
>>>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>>>> WARNING
>>>>>>>>>>> performance.readdir-ahead:
>>>>>>>>>>> on
>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/10/2017 11:44
>>>>>>>>>>> AM, Pranith Kumar
>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>> Is this the volume
>>>>>>>>>>>> info you have?
>>>>>>>>>>>>
>>>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>
>>>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>>>> Type: Distribute
>>>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>> Status: Started
>>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>> Bricks:
>>>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>>
>>>>>>>>>>>> I copied this from old thread from 2016. This is distribute
>>>>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>
>>>>>>>>>>> Pat
>>>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>> Center for Ocean
>>>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>>>> MIT, Room 5-213
>>>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Pranith
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>
>>>>>>>>>> Pat
>>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>> Center for Ocean
>>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>>> MIT, Room 5-213
>>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pranith
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat
>>>>>>>>> Haley Email: ***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>> Center for Ocean
>>>>>>>>> Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical
>>>>>>>>> Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213
>>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley
>>>>>>>> Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>>>> Center for Ocean Engineering
>>>>>>>> Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering
>>>>>>>> Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213
>>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> Pat Haley
>>>>>>> Email: ***@mit.edu <mailto:***@mit.edu>
>>>>>>> Center for Ocean Engineering
>>>>>>> Phone: (617) 253-6824
>>>>>>> Dept. of Mechanical Engineering
>>>>>>> Fax: (617) 253-8125
>>>>>>> MIT, Room 5-213
>>>>>>> http://web.mit.edu/phaley/www/
>>>>>>> 77 Massachusetts Avenue
>>>>>>> Cambridge, MA 02139-4301
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> <mailto:***@mit.edu>
>>>>>> Center for Ocean Engineering Phone: (617)
>>>>>> 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617)
>>>>>> 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>>> --
>>>>>
>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>> Pat Haley Email:
>>>>> ***@mit.edu <mailto:***@mit.edu>
>>>>> Center for Ocean Engineering Phone: (617)
>>>>> 253-6824
>>>>> Dept. of Mechanical Engineering Fax: (617)
>>>>> 253-8125
>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>> 77 Massachusetts Avenue
>>>>> Cambridge, MA 02139-4301
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email:
>>>> ***@mit.edu <mailto:***@mit.edu>
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>
>>> --
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley Email: ***@mit.edu
>>> <mailto:***@mit.edu>
>>> Center for Ocean Engineering Phone: (617) 253-6824
>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> <mailto:***@mit.edu>
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>>
>>
>> --
>> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ben Turner
2017-06-02 05:07:28 UTC
Reply
Permalink
Raw Message
Are you sure conv=sync is what you want? I normally use conv=fdatasync; I'll look up the difference between the two and see whether it affects your test.
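
For reference, the three dd variants that have come up in this thread do
quite different things (a quick sketch, same sizes as the earlier tests,
output path assumed):

# conv=sync only pads short input blocks with NULs up to the block size;
# it never flushes, so the reported rate can be mostly page cache
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

# conv=fdatasync issues one fdatasync() at the end, so the timing includes
# getting the data to stable storage once
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fdatasync

# oflag=sync opens the file O_SYNC, so every 1 MiB write is synchronous;
# this matches the slow oflag=sync numbers reported earlier
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync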


-b

----- Original Message -----
> From: "Pat Haley" <***@mit.edu>
> To: "Pranith Kumar Karampuri" <***@redhat.com>
> Cc: "Ravishankar N" <***@redhat.com>, gluster-***@gluster.org, "Steve Postma" <***@ztechnet.com>, "Ben
> Turner" <***@redhat.com>
> Sent: Tuesday, May 30, 2017 9:40:34 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
>
>
> Hi Pranith,
>
> The "dd" command was:
>
> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Pat
>
Pat Haley
2017-06-12 18:35:41 UTC
Reply
Permalink
Raw Message
Hi Guys,

I was wondering what our next steps should be to solve the slow write times.

Recently I was debugging a large code that writes a lot of output at
every time step. When I had it write to our gluster disks, it was
taking over a day to do a single time step, whereas if I had the same
program (same hardware, same network) write to our NFS disk the time per
time step was about 45 minutes. What we are shooting for here is to get
similar times whether we write to gluster or to NFS.

Thanks

Pat


On 06/02/2017 01:07 AM, Ben Turner wrote:
> Are you sure using conv=sync is what you want? I normally use conv=fdatasync, I'll look up the difference between the two and see if it affects your test.
>
>
> -b
>
> ----- Original Message -----
>> From: "Pat Haley" <***@mit.edu>
>> To: "Pranith Kumar Karampuri" <***@redhat.com>
>> Cc: "Ravishankar N" <***@redhat.com>, gluster-***@gluster.org, "Steve Postma" <***@ztechnet.com>, "Ben
>> Turner" <***@redhat.com>
>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>
>>
>> Hi Pranith,
>>
>> The "dd" command was:
>>
>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>
>> There were 2 instances where dd reported 22 seconds. The output from the
>> dd tests are in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>
>> Pat
>>

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Ben Turner
2017-06-12 20:28:09 UTC
Reply
Permalink
Raw Message
----- Original Message -----
> From: "Pat Haley" <***@mit.edu>
> To: "Ben Turner" <***@redhat.com>, "Pranith Kumar Karampuri" <***@redhat.com>
> Cc: "Ravishankar N" <***@redhat.com>, gluster-***@gluster.org, "Steve Postma" <***@ztechnet.com>
> Sent: Monday, June 12, 2017 2:35:41 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
>
>
> Hi Guys,
>
> I was wondering what our next steps should be to solve the slow write times.
>
> Recently I was debugging a large code and writing a lot of output at
> every time step. When I tried writing to our gluster disks, it was
> taking over a day to do a single time step whereas if I had the same
> program (same hardware, network) write to our nfs disk the time per
> time-step was about 45 minutes. What we are shooting for here would be
> to have similar times to either gluster or NFS.

I can see in your test:

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt

You averaged ~600 MB/sec (expected for replica 2 with 10G: {~1200 MB/sec} / #replicas{2} = 600). Gluster does client-side replication, so with replica 2 you will only ever see 1/2 the speed of the slowest part of the stack (NW, disk, RAM, CPU). This is usually NW or disk, and 600 is normally a best case. Now in your output I do see instances where you dropped to 200 MB/sec. I can only explain this in three ways:

1. You are not using conv=fdatasync, so writes are actually going to the page cache and only later being flushed to disk. During the flush the memory is not yet available and the disks are busy writing out dirty pages (see the sketch below for how the dd sync options differ).
2. Your storage RAID group is shared across multiple LUNs (like in a SAN) and when write times are slow the RAID group is busy servicing other LUNs.
3. Gluster bug / config issue / some other unknown unknown.
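
To make option 1 concrete, here is a minimal sketch of how the three dd
durability options differ (the path is a placeholder, not one of the paths
from the tests above). Only the last two force the data out of the page
cache before dd reports a rate; conv=sync merely pads partial input blocks
and does not flush anything:

  # cached write: reported rate can be far above what the disks sustain
  dd if=/dev/zero of=/some/test/file bs=1M count=1000 conv=sync

  # one fdatasync() at the end: reported rate includes the flush to disk
  dd if=/dev/zero of=/some/test/file bs=1M count=1000 conv=fdatasync

  # every write() is synchronous (O_SYNC): usually the slowest of the three
  dd if=/dev/zero of=/some/test/file bs=1M count=1000 oflag=sync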

So I see 2 issues here:

1. NFS does in 45 minutes what gluster takes over 24 hours to do.
2. Sometimes your throughput drops dramatically.

WRT #1 - have a look at my estimates above. My formula for guesstimating gluster perf is: throughput = NIC throughput or storage throughput (whichever is slower) / # replicas * overhead (figure .7 or .8). Also, the larger the record size the better for glusterfs mounts; I normally like to be at LEAST 64k, up to 1024k:

# dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync

WRT #2 - Again, I question your testing and your storage config. Try using conv=fdatasync for your dd runs, use a larger record size, and make sure that your back-end storage is not causing your slowdowns. Also remember that with replica 2 you will take a ~50% hit on writes because the client uses 50% of its bandwidth to write to one replica and 50% to the other.
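
As a concrete harness for the above, a minimal sketch of a repeated write
test might look like the following; the mount point, file name, and trial
count are placeholders (assumptions, not values from this thread):

  #!/bin/bash
  # Repeat the recommended dd write test and keep only the summary lines.
  MNT=/gluster-mount   # placeholder: fuse mount (or brick dir) under test
  TRIALS=12            # placeholder: number of trials

  for i in $(seq 1 "$TRIALS"); do
      # dd prints its stats on stderr; keep just the throughput summary
      dd if=/dev/zero of="$MNT/ddtest.$i" bs=1024k count=10000 conv=fdatasync 2>&1 | tail -n 1
      rm -f "$MNT/ddtest.$i"
  done

Running the same loop against the brick directory and against the fuse
mount gives directly comparable numbers for the gluster overhead.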

-b



Pat Haley
2017-06-20 16:06:30 UTC
Reply
Permalink
Raw Message
Hi Ben,

Sorry this took so long, but we had a real-time forecasting exercise
last week and I could only get to this now.

Backend Hardware/OS:

* Much of the information on our back end system is included at the
top of
http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
* The specific model of the hard disks is Seagate Enterprise Capacity
V.4 6TB (ST6000NM0024). The rated interface speed is 6 Gb/s.
* Note: there is one physical server that hosts both the NFS and the
GlusterFS areas

Latest tests

I have had time to run one of the dd tests you requested against the
underlying XFS FS. The median rate was 170 MB/s. The dd results and the
iostat record are in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
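
In case it is useful, the XFS test was essentially of the following form
(a sketch only; the test directory under /mnt/brick1 and the number of
trials are illustrative):

# capture iostat in the background while the dd trials run
iostat -c -m -x 1 > iostat-$(hostname).txt &
IOSTAT_PID=$!

# repeat the dd you asked for, writing directly to the XFS brick
mkdir -p /mnt/brick1/xfs-test
for i in $(seq 1 12); do
    dd if=/dev/zero of=/mnt/brick1/xfs-test/zeros$i bs=1024k count=10000 conv=fdatasync
done

kill $IOSTAT_PID
rm -f /mnt/brick1/xfs-test/zeros*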

I'll add tests for the other brick and to the NFS area later.

Thanks

Pat


On 06/12/2017 06:06 PM, Ben Turner wrote:
> Ok you are correct, you have a pure distributed volume. IE no replication overhead. So normally for pure dist I use:
>
> throughput = (the slower of disk or NIC throughput) * .6-.7
>
> In your case we have:
>
> 1200 * .6 = 720
>
> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is:
>
> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using?
>
> -Second can you refresh me on your workload? Are you doing reads / writes or both? If both what mix? Since we are using DD I assume you are working with large file sequential I/O, is this correct?
>
> -Run some DD tests on the back end XFS FS. I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir run:
>
> If you are focusing on a write workload run:
>
> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>
> If you are focusing on a read workload run:
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/xfs-mount/file of=/dev/null bs=1024k count=10000
>
> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>
> Run this in a loop similar to how you did in:
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time. While this is running gather iostat for me:
>
> # iostat -c -m -x 1 > iostat-$(hostname).txt
>
> Let's see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
>
> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here.
>
> -b
>
>
>
> ----- Original Message -----
>> From: "Pat Haley" <***@mit.edu>
>> To: "Ben Turner" <***@redhat.com>
>> Sent: Monday, June 12, 2017 5:18:07 PM
>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>
>>
>> Hi Ben,
>>
>> Here is the output:
>>
>> [***@mseas-data2 ~]# gluster volume info
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>>
>>
>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>> What is the output of gluster v info? That will tell us more about your
>>> config.
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>> From: "Pat Haley" <***@mit.edu>
>>>> To: "Ben Turner" <***@redhat.com>
>>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>
>>>>
>>>> Hi Ben,
>>>>
>>>> I guess I'm confused about what you mean by replication. If I look at
>>>> the underlying bricks I only ever have a single copy of any file. It
>>>> either resides on one brick or the other (directories exist on both
>>>> bricks but not files). We are not using gluster for redundancy (or at
>>>> least that wasn't our intent). Is that what you meant by replication
>>>> or is it something else?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Pat Haley" <***@mit.edu>
>>>>>> To: "Ben Turner" <***@redhat.com>, "Pranith Kumar Karampuri"
>>>>>> <***@redhat.com>
>>>>>> Cc: "Ravishankar N" <***@redhat.com>, gluster-***@gluster.org,
>>>>>> "Steve Postma" <***@ztechnet.com>
>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>
>>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> I was wondering what our next steps should be to solve the slow write
>>>>>> times.
>>>>>>
>>>>>> Recently I was debugging a large code and writing a lot of output at
>>>>>> every time step. When I tried writing to our gluster disks, it was
>>>>>> taking over a day to do a single time step whereas if I had the same
>>>>>> program (same hardware, network) write to our nfs disk the time per
>>>>>> time-step was about 45 minutes. What we are shooting for here would be
>>>>>> to have similar times to either gluster of nfs.
>>>>> I can see in your test:
>>>>>
>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>
>>>>> You averaged ~600 MB / sec(expected for replica 2 with 10G, {~1200 MB /
>>>>> sec} / #replicas{2} = 600). Gluster does client side replication so with
>>>>> replica 2 you will only ever see 1/2 the speed of your slowest part of
>>>>> the
>>>>> stack(NW, disk, RAM, CPU). This is usually NW or disk and 600 is
>>>>> normally
>>>>> a best case. Now in your output I do see the instances where you went
>>>>> down to 200 MB / sec. I can only explain this in three ways:
>>>>>
>>>>> 1. You are not using conv=fdatasync and writes are actually going to
>>>>> page
>>>>> cache and then being flushed to disk. During the fsync the memory is not
>>>>> yet available and the disks are busy flushing dirty pages.
>>>>> 2. Your storage RAID group is shared across multiple LUNS(like in a SAN)
>>>>> and when write times are slow the RAID group is busy serviceing other
>>>>> LUNs.
>>>>> 3. Gluster bug / config issue / some other unknown unknown.
>>>>>
>>>>> So I see 2 issues here:
>>>>>
>>>>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
>>>>> 2. Sometimes your throughput drops dramatically.
>>>>>
>>>>> WRT #1 - have a look at my estimates above. My formula for guestimating
>>>>> gluster perf is: throughput = NIC throughput or storage(whatever is
>>>>> slower) / # replicas * overhead(figure .7 or .8). Also the larger the
>>>>> record size the better for glusterfs mounts, I normally like to be at
>>>>> LEAST 64k up to 1024k:
>>>>>
>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000
>>>>> conv=fdatasync
>>>>>
>>>>> WRT #2 - Again, I question your testing and your storage config. Try
>>>>> using
>>>>> conv=fdatasync for your DDs, use a larger record size, and make sure that
>>>>> your back end storage is not causing your slowdowns. Also remember that
>>>>> with replica 2 you will take ~50% hit on writes because the client uses
>>>>> 50% of its bandwidth to write to one replica and 50% to the other.
>>>>>
>>>>> -b
>>>>>
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>>>>> Are you sure using conv=sync is what you want? I normally use
>>>>>>> conv=fdatasync, I'll look up the difference between the two and see if
>>>>>>> it
>>>>>>> affects your test.
>>>>>>>
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Pat Haley" <***@mit.edu>
>>>>>>>> To: "Pranith Kumar Karampuri" <***@redhat.com>
>>>>>>>> Cc: "Ravishankar N" <***@redhat.com>,
>>>>>>>> gluster-***@gluster.org,
>>>>>>>> "Steve Postma" <***@ztechnet.com>, "Ben
>>>>>>>> Turner" <***@redhat.com>
>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> The "dd" command was:
>>>>>>>>
>>>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>
>>>>>>>> There were 2 instances where dd reported 22 seconds. The output from
>>>>>>>> the
>>>>>>>> dd tests are in
>>>>>>>>
>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>> Pat,
>>>>>>>>> What is the command you used? As per the following output,
>>>>>>>>> it
>>>>>>>>> seems like at least one write operation took 16 seconds. Which is
>>>>>>>>> really bad.
>>>>>>>>> 96.39 1165.10 us 89.00 us*16487014.00 us*
>>>>>>>>> 393212
>>>>>>>>> WRITE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranith,
>>>>>>>>>
>>>>>>>>> I ran the same 'dd' test both in the gluster test volume and
>>>>>>>>> in
>>>>>>>>> the .glusterfs directory of each brick. The median results
>>>>>>>>> (12
>>>>>>>>> dd
>>>>>>>>> trials in each test) are similar to before
>>>>>>>>>
>>>>>>>>> * gluster test volume: 586.5 MB/s
>>>>>>>>> * bricks (in .glusterfs): 1.4 GB/s
>>>>>>>>>
>>>>>>>>> The profile for the gluster test-volume is in
>>>>>>>>>
>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Pat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>> Let's start with the same 'dd' test we were testing with to
>>>>>>>>>> see,
>>>>>>>>>> what the numbers are. Please provide profile numbers for the
>>>>>>>>>> same. From there on we will start tuning the volume to see
>>>>>>>>>> what
>>>>>>>>>> we can do.
>>>>>>>>>>
>>>>>>>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Pranith,
>>>>>>>>>>
>>>>>>>>>> Thanks for the tip. We now have the gluster volume
>>>>>>>>>> mounted
>>>>>>>>>> under /home. What tests do you recommend we run?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Pat
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>>>>>>>>> <***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the delay. I never saw received your
>>>>>>>>>>> reply
>>>>>>>>>>> (but I did receive Ben Turner's follow-up to your
>>>>>>>>>>> reply). So we tried to create a gluster volume
>>>>>>>>>>> under
>>>>>>>>>>> /home using different variations of
>>>>>>>>>>>
>>>>>>>>>>> gluster volume create test-volume
>>>>>>>>>>> mseas-data2:/home/gbrick_test_1
>>>>>>>>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>>>>>>>
>>>>>>>>>>> However we keep getting errors of the form
>>>>>>>>>>>
>>>>>>>>>>> Wrong brick type: transport, use
>>>>>>>>>>> <HOSTNAME>:<export-dir-abs-path>
>>>>>>>>>>>
>>>>>>>>>>> Any thoughts on what we're doing wrong?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You should give transport tcp at the beginning I think.
>>>>>>>>>>> Anyways, transport tcp is the default, so no need to
>>>>>>>>>>> specify
>>>>>>>>>>> so remove those two words from the CLI.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also do you have a list of the test we should be
>>>>>>>>>>> running
>>>>>>>>>>> once we get this volume created? Given the
>>>>>>>>>>> time-zone
>>>>>>>>>>> difference it might help if we can run a small
>>>>>>>>>>> battery
>>>>>>>>>>> of tests and post the results rather than
>>>>>>>>>>> test-post-new
>>>>>>>>>>> test-post... .
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the first time I am doing performance analysis
>>>>>>>>>>> on
>>>>>>>>>>> users as far as I remember. In our team there are
>>>>>>>>>>> separate
>>>>>>>>>>> engineers who do these tests. Ben who replied earlier is
>>>>>>>>>>> one
>>>>>>>>>>> such engineer.
>>>>>>>>>>>
>>>>>>>>>>> Ben,
>>>>>>>>>>> Have any suggestions?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Pat
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>
>>>>>>>>>>>> The /home partition is mounted as ext4
>>>>>>>>>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>>>>>>>>>
>>>>>>>>>>>> The brick partitions are mounted ax xfs
>>>>>>>>>>>> /mnt/brick1 xfs defaults 0 0
>>>>>>>>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>>>>>>>>
>>>>>>>>>>>> Will this cause a problem with creating a
>>>>>>>>>>>> volume
>>>>>>>>>>>> under /home?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the bottleneck is disk. You can do
>>>>>>>>>>>> the
>>>>>>>>>>>> same tests you did on your new volume to confirm?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Pat
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, we don't have similar
>>>>>>>>>>>>> hardware
>>>>>>>>>>>>> for a small scale test. All we have is
>>>>>>>>>>>>> our
>>>>>>>>>>>>> production hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You said something about /home partition which
>>>>>>>>>>>>> has
>>>>>>>>>>>>> lesser disks, we can create plain distribute
>>>>>>>>>>>>> volume inside one of those directories. After
>>>>>>>>>>>>> we
>>>>>>>>>>>>> are done, we can remove the setup. What do you
>>>>>>>>>>>>> say?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat
>>>>>>>>>>>>>> Haley
>>>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since we are mounting the partitions
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>> the bricks, I tried the dd test
>>>>>>>>>>>>>> writing
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>> The results without oflag=sync were
>>>>>>>>>>>>>> 1.6
>>>>>>>>>>>>>> Gb/s (faster than gluster but not as
>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>> as I was expecting given the 1.2 Gb/s
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> the no-gluster area w/ fewer disks).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay, then 1.6Gb/s is what we need to
>>>>>>>>>>>>>> target
>>>>>>>>>>>>>> for, considering your volume is just
>>>>>>>>>>>>>> distribute. Is there any way you can do
>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>> on similar hardware but at a small scale?
>>>>>>>>>>>>>> Just so we can run the workload to learn
>>>>>>>>>>>>>> more
>>>>>>>>>>>>>> about the bottlenecks in the system? We
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> probably try to get the speed to 1.2Gb/s
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> your /home partition you were telling me
>>>>>>>>>>>>>> yesterday. Let me know if that is
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>> you are okay to do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM,
>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>> Haley <***@mit.edu
>>>>>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Not entirely sure (this isn't my
>>>>>>>>>>>>>>> area of expertise). I'll run
>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>> answer by some other people who
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> more familiar with this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am also uncertain about how to
>>>>>>>>>>>>>>> interpret the results when we
>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>> add the dd tests writing to the
>>>>>>>>>>>>>>> /home area (no gluster, still on
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same machine)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * dd test without oflag=sync
>>>>>>>>>>>>>>> (rough average of multiple
>>>>>>>>>>>>>>> tests)
>>>>>>>>>>>>>>> o gluster w/ fuse mount :
>>>>>>>>>>>>>>> 570
>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>>>>>>>> 390
>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>> o nfs (no gluster): 1.2
>>>>>>>>>>>>>>> Gb/s
>>>>>>>>>>>>>>> * dd test with oflag=sync
>>>>>>>>>>>>>>> (rough
>>>>>>>>>>>>>>> average of multiple tests)
>>>>>>>>>>>>>>> o gluster w/ fuse mount:
>>>>>>>>>>>>>>> 5
>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>>>>>>>> 200
>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>> o nfs (no gluster): 20
>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Given that the non-gluster area
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> RAID-6 of 4 disks while each
>>>>>>>>>>>>>>> brick
>>>>>>>>>>>>>>> of the gluster area is a RAID-6
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> 32 disks, I would naively expect
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> writes to the gluster area to be
>>>>>>>>>>>>>>> roughly 8x faster than to the
>>>>>>>>>>>>>>> non-gluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a better test is to try and
>>>>>>>>>>>>>>> write to a file using nfs without
>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>> gluster to a location that is not
>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>> the brick but someother location
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> on same disk(s). If you are mounting
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> partition as the brick, then we can
>>>>>>>>>>>>>>> write to a file inside .glusterfs
>>>>>>>>>>>>>>> directory, something like
>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still think we have a speed
>>>>>>>>>>>>>>> issue,
>>>>>>>>>>>>>>> I can't tell if fuse vs nfs is
>>>>>>>>>>>>>>> part
>>>>>>>>>>>>>>> of the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I got interested in the post because
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>> read that fuse speed is lesser than
>>>>>>>>>>>>>>> nfs
>>>>>>>>>>>>>>> speed which is counter-intuitive to
>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>> understanding. So wanted
>>>>>>>>>>>>>>> clarifications.
>>>>>>>>>>>>>>> Now that I got my clarifications
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>> fuse outperformed nfs without sync,
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> can resume testing as described
>>>>>>>>>>>>>>> above
>>>>>>>>>>>>>>> and try to find what it is. Based on
>>>>>>>>>>>>>>> your email-id I am guessing you are
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> Boston and I am from Bangalore so if
>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> are okay with doing this debugging
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> multiple days because of timezones,
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>> will be happy to help. Please be a
>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>> patient with me, I am under a
>>>>>>>>>>>>>>> release
>>>>>>>>>>>>>>> crunch but I am very curious with
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> problem you posted.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Was there anything useful in the
>>>>>>>>>>>>>>> profiles?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately profiles didn't help
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>> much, I think we are collecting the
>>>>>>>>>>>>>>> profiles from an active volume, so
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>> has a lot of information that is not
>>>>>>>>>>>>>>> pertaining to dd so it is difficult
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> find the contributions of dd. So I
>>>>>>>>>>>>>>> went
>>>>>>>>>>>>>>> through your post again and found
>>>>>>>>>>>>>>> something I didn't pay much
>>>>>>>>>>>>>>> attention
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> earlier i.e. oflag=sync, so did my
>>>>>>>>>>>>>>> own
>>>>>>>>>>>>>>> tests on my setup with FUSE so sent
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> reply.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>>>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>>>>>>>>> Okay good. At least this
>>>>>>>>>>>>>>>> validates
>>>>>>>>>>>>>>>> my doubts. Handling O_SYNC in
>>>>>>>>>>>>>>>> gluster NFS and fuse is a bit
>>>>>>>>>>>>>>>> different.
>>>>>>>>>>>>>>>> When application opens a file
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> O_SYNC on fuse mount then each
>>>>>>>>>>>>>>>> write syscall has to be written
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> disk as part of the syscall
>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>> as in case of NFS, there is no
>>>>>>>>>>>>>>>> concept of open. NFS performs
>>>>>>>>>>>>>>>> write
>>>>>>>>>>>>>>>> though a handle saying it needs
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> be a synchronous write, so
>>>>>>>>>>>>>>>> write()
>>>>>>>>>>>>>>>> syscall is performed first then
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> performs fsync(). so an write
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>> fd with O_SYNC becomes
>>>>>>>>>>>>>>>> write+fsync.
>>>>>>>>>>>>>>>> I am suspecting that when
>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>> threads do this write+fsync()
>>>>>>>>>>>>>>>> operation on the same file,
>>>>>>>>>>>>>>>> multiple writes are batched
>>>>>>>>>>>>>>>> together to be written do disk
>>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>> the throughput on the disk is
>>>>>>>>>>>>>>>> increasing is my guess.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35
>>>>>>>>>>>>>>>> PM,
>>>>>>>>>>>>>>>> Pat Haley <***@mit.edu
>>>>>>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without the oflag=sync and
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> a single test of each, the
>>>>>>>>>>>>>>>> FUSE
>>>>>>>>>>>>>>>> is going faster than NFS:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> FUSE:
>>>>>>>>>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>>>>>>>> conv=sync
>>>>>>>>>>>>>>>> 4096+0 records in
>>>>>>>>>>>>>>>> 4096+0 records out
>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> NFS
>>>>>>>>>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>>>>>>>>>> if=/dev/zero count=4096
>>>>>>>>>>>>>>>> bs=1048576 of=zeros.txt
>>>>>>>>>>>>>>>> conv=sync
>>>>>>>>>>>>>>>> 4096+0 records in
>>>>>>>>>>>>>>>> 4096+0 records out
>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>>>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 05/10/2017 11:53 AM,
>>>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>> Could you let me know the
>>>>>>>>>>>>>>>>> speed without oflag=sync
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> both the mounts? No need
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> collect profiles.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at
>>>>>>>>>>>>>>>>> 9:17
>>>>>>>>>>>>>>>>> PM, Pat Haley
>>>>>>>>>>>>>>>>> <***@mit.edu
>>>>>>>>>>>>>>>>> <mailto:***@mit.edu>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is what I see
>>>>>>>>>>>>>>>>> now:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [***@mseas-data2 ~]#
>>>>>>>>>>>>>>>>> gluster volume info
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Volume Name:
>>>>>>>>>>>>>>>>> data-volume
>>>>>>>>>>>>>>>>> Type: Distribute
>>>>>>>>>>>>>>>>> Volume ID:
>>>>>>>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>> Brick1:
>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>> Brick2:
>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>>> diagnostics.count-fop-hits:
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> nfs.exports-auth-enable:
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>>>>>>>>>> WARNING
>>>>>>>>>>>>>>>>> performance.readdir-ahead:
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>>>>> nfs.export-volumes:
>>>>>>>>>>>>>>>>> off
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 05/10/2017 11:44
>>>>>>>>>>>>>>>>> AM,
>>>>>>>>>>>>>>>>> Pranith Kumar
>>>>>>>>>>>>>>>>> Karampuri
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Is this the volume
>>>>>>>>>>>>>>>>>> info
>>>>>>>>>>>>>>>>>> you have?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >/[root at
>>>>>>>>>>>>>>>>>> >mseas-data2
>>>>>>>>>>>>>>>>>> <http://www.gluster.org/mailman/listinfo/gluster-users>
>>>>>>>>>>>>>>>>>> ~]# gluster volume
>>>>>>>>>>>>>>>>>> info
>>>>>>>>>>>>>>>>>> />//>/Volume Name:
>>>>>>>>>>>>>>>>>> data-volume />/Type:
>>>>>>>>>>>>>>>>>> Distribute />/Volume
>>>>>>>>>>>>>>>>>> ID:
>>>>>>>>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>> />/Status: Started
>>>>>>>>>>>>>>>>>> />/Number
>>>>>>>>>>>>>>>>>> of Bricks: 2
>>>>>>>>>>>>>>>>>> />/Transport-type:
>>>>>>>>>>>>>>>>>> tcp
>>>>>>>>>>>>>>>>>> />/Bricks: />/Brick1:
>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>> />/Brick2:
>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>> />/Options
>>>>>>>>>>>>>>>>>> Reconfigured:
>>>>>>>>>>>>>>>>>> />/performance.readdir-ahead:
>>>>>>>>>>>>>>>>>> on />/nfs.disable: on
>>>>>>>>>>>>>>>>>> />/nfs.export-volumes:
>>>>>>>>>>>>>>>>>> off
>>>>>>>>>>>>>>>>>> /
>>>>>>>>>>>>>>>>>> ​I copied this from
>>>>>>>>>>>>>>>>>> old
>>>>>>>>>>>>>>>>>> thread from 2016.
>>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> distribute volume.
>>>>>>>>>>>>>>>>>> Did
>>>>>>>>>>>>>>>>>> you change any of the
>>>>>>>>>>>>>>>>>> options in between?
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>>>>>>> Center for Ocean
>>>>>>>>>>>>>>>>> Engineering
>>>>>>>>>>>>>>>>> Phone: (617) 253-6824
>>>>>>>>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>>>>>>>>> Engineering
>>>>>>>>>>>>>>>>> Fax: (617) 253-8125
>>>>>>>>>>>>>>>>> MIT, Room
>>>>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>>>>>>> 77 Massachusetts
>>>>>>>>>>>>>>>>> Avenue
>>>>>>>>>>>>>>>>> Cambridge, MA
>>>>>>>>>>>>>>>>> 02139-4301
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>>>>>> Center for Ocean
>>>>>>>>>>>>>>>> Engineering
>>>>>>>>>>>>>>>> Phone: (617) 253-6824
>>>>>>>>>>>>>>>> Dept. of Mechanical
>>>>>>>>>>>>>>>> Engineering
>>>>>>>>>>>>>>>> Fax: (617) 253-8125
>>>>>>>>>>>>>>>> MIT, Room
>>>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>>>>> Center for Ocean Engineering
>>>>>>>>>>>>>>> Phone:
>>>>>>>>>>>>>>> (617) 253-6824
>>>>>>>>>>>>>>> Dept. of Mechanical Engineering
>>>>>>>>>>>>>>> Fax:
>>>>>>>>>>>>>>> (617) 253-8125
>>>>>>>>>>>>>>> MIT, Room
>>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>>>> Center for Ocean Engineering
>>>>>>>>>>>>>> Phone:
>>>>>>>>>>>>>> (617) 253-6824
>>>>>>>>>>>>>> Dept. of Mechanical Engineering
>>>>>>>>>>>>>> Fax:
>>>>>>>>>>>>>> (617) 253-8125
>>>>>>>>>>>>>> MIT, Room
>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>>> Center for Ocean Engineering Phone:
>>>>>>>>>>>>> (617)
>>>>>>>>>>>>> 253-6824
>>>>>>>>>>>>> Dept. of Mechanical Engineering Fax:
>>>>>>>>>>>>> (617)
>>>>>>>>>>>>> 253-8125
>>>>>>>>>>>>> MIT, Room
>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>>> Pat Haley
>>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>>> Center for Ocean Engineering Phone:
>>>>>>>>>>>> (617)
>>>>>>>>>>>> 253-6824
>>>>>>>>>>>> Dept. of Mechanical Engineering Fax:
>>>>>>>>>>>> (617)
>>>>>>>>>>>> 253-8125
>>>>>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Pranith
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>>> Pat Haley
>>>>>>>>>>> Email:***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>>> Center for Ocean Engineering Phone: (617)
>>>>>>>>>>> 253-6824
>>>>>>>>>>> Dept. of Mechanical Engineering Fax: (617)
>>>>>>>>>>> 253-8125
>>>>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Pranith
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>>> Pat Haley Email:***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pranith
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley Email:***@mit.edu
>>>>>>>>> <mailto:***@mit.edu>
>>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>> MIT, Room 5-213http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pranith
>>>>>>>> --
>>>>>>>>
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley Email: ***@mit.edu
>>>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>>
>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>> Pat Haley Email: ***@mit.edu
>>>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>> 77 Massachusetts Avenue
>>>>>> Cambridge, MA 02139-4301
>>>>>>
>>>>>>
>>>> --
>>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley Email: ***@mit.edu
>>>> Center for Ocean Engineering Phone: (617) 253-6824
>>>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>>
>>>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: ***@mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: ***@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Pat Haley
2017-06-22 20:53:42 UTC
Reply
Permalink
Raw Message
Hi,

Today we experimented with some of the FUSE-related volume options that
we found on this list.

Changing these options had no effect:

gluster volume set test-volume performance.cache-max-file-size 2MB
gluster volume set test-volume performance.cache-refresh-timeout 4
gluster volume set test-volume performance.cache-size 256MB
gluster volume set test-volume performance.write-behind-window-size 4MB
gluster volume set test-volume performance.write-behind-window-size 8MB

Changing the following option from its default value (on) made the speed
slower:

gluster volume set test-volume performance.write-behind off

Changing the following options initially appeared to give a 10% increase
in speed, but this vanished in subsequent tests (we think the apparent
increase may have been due to a lighter workload on the computer from
other users):

gluster volume set test-volume performance.stat-prefetch on
gluster volume set test-volume client.event-threads 4
gluster volume set test-volume server.event-threads 4
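
For completeness, applied options can be verified with gluster volume
info (they show up under "Options Reconfigured"), and any option can be
returned to its default with gluster volume reset, e.g. (volume and
option names as above):

gluster volume info test-volume
gluster volume reset test-volume performance.write-behind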


Can anything be gleaned from these observations? Are there other things
we can try?
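
For example, would it be worth repeating the dd with conv=fdatasync on
the test volume while capturing a fresh profile, so the profile only
contains the dd traffic? A sketch (the FUSE mount point of the test
volume and the output file name are illustrative):

gluster volume profile test-volume start
dd if=/dev/zero of=/testvol-mount/zeros.txt bs=1024k count=10000 conv=fdatasync
gluster volume profile test-volume info > profile_testvol_fdatasync.txt
gluster volume profile test-volume stop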

Thanks

Pat


On 06/20/2017 12:06 PM, Pat Haley wrote:
>
> Hi Ben,
>
> Sorry this took so long, but we had a real-time forecasting exercise
> last week and I could only get to this now.
>
> Backend Hardware/OS:
>
> * Much of the information on our back end system is included at the
> top of
> http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> * The specific model of the hard disks is SeaGate ENTERPRISE
> CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
> * Note: there is one physical server that hosts both the NFS and the
> GlusterFS areas
>
> Latest tests
>
> I have had time to run the tests for one of the dd tests you requested
> to the underlying XFS FS. The median rate was 170 MB/s. The dd
> results and iostat record are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>
> I'll add tests for the other brick and to the NFS area later.
>
> Thanks
>
> Pat
>
>
> On 06/12/2017 06:06 PM, Ben Turner wrote:
>> Ok you are correct, you have a pure distributed volume. IE no replication overhead. So normally for pure dist I use:
>>
>> throughput = slowest of disks / NIC * .6-.7
>>
>> In your case we have:
>>
>> 1200 * .6 = 720
>>
>> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is:
>>
>> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using?
>>
>> -Second can you refresh me on your workload? Are you doing reads / writes or both? If both what mix? Since we are using DD I assume you are working iwth large file sequential I/O, is this correct?
>>
>> -Run some DD tests on the back end XFS FS. I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir run:
>>
>> If you are focusing on a write workload run:
>>
>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>>
>> If you are focusing on a read workload run:
>>
>> # echo 3 > /proc/sys/vm/drop_caches
>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
>>
>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>>
>> Run this in a loop similar to how you did in:
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>
>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time. While this is running gather iostat for me:
>>
>> # iostat -c -m -x 1 > iostat-$(hostname).txt
>>
>> Lets see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
>>
>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here.
>>
>> -b
>>
>>
>>
>> ----- Original Message -----
>>> From: "Pat Haley"<***@mit.edu>
>>> To: "Ben Turner"<***@redhat.com>
>>> Sent: Monday, June 12, 2017 5:18:07 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Ben,
>>>
>>> Here is the output:
>>>
>>> [***@mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>>> What is the output of gluster v info? That will tell us more about your
>>>> config.
>>>>
>>>> -b
>>>>
>>>> ----- Original Message -----
>>>>> From: "Pat Haley"<***@mit.edu>
>>>>> To: "Ben Turner"<***@redhat.com>
>>>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>
>>>>>
>>>>> Hi Ben,
>>>>>
>>>>> I guess I'm confused about what you mean by replication. If I look at
>>>>> the underlying bricks I only ever have a single copy of any file. It
>>>>> either resides on one brick or the other (directories exist on both
>>>>> bricks but not files). We are not using gluster for redundancy (or at
>>>>> least that wasn't our intent). Is that what you meant by replication
>>>>> or is it something else?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Pat
>>>>>
>>>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>>>> ----- Original Message -----
>>>>>>> From: "Pat Haley"<***@mit.edu>
>>>>>>> To: "Ben Turner"<***@redhat.com>, "Pranith Kumar Karampuri"
>>>>>>> <***@redhat.com>
>>>>>>> Cc: "Ravishankar N"<***@redhat.com>,gluster-***@gluster.org,
>>>>>>> "Steve Postma"<***@ztechnet.com>
>>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>>
>>>>>>>
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> I was wondering what our next steps should be to solve the slow write
>>>>>>> times.
>>>>>>>
>>>>>>> Recently I was debugging a large code and writing a lot of output at
>>>>>>> every time step. When I tried writing to our gluster disks, it was
>>>>>>> taking over a day to do a single time step whereas if I had the same
>>>>>>> program (same hardware, network) write to our nfs disk the time per
>>>>>>> time-step was about 45 minutes. What we are shooting for here would be
>>>>>>> to have similar times to either gluster of nfs.
>>>>>> I can see in your test:
>>>>>>
>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>>
>>>>>> You averaged ~600 MB / sec(expected for replica 2 with 10G, {~1200 MB /
>>>>>> sec} / #replicas{2} = 600). Gluster does client side replication so with
>>>>>> replica 2 you will only ever see 1/2 the speed of your slowest part of
>>>>>> the
>>>>>> stack(NW, disk, RAM, CPU). This is usually NW or disk and 600 is
>>>>>> normally
>>>>>> a best case. Now in your output I do see the instances where you went
>>>>>> down to 200 MB / sec. I can only explain this in three ways:
>>>>>>
>>>>>> 1. You are not using conv=fdatasync and writes are actually going to
>>>>>> page
>>>>>> cache and then being flushed to disk. During the fsync the memory is not
>>>>>> yet available and the disks are busy flushing dirty pages.
>>>>>> 2. Your storage RAID group is shared across multiple LUNS(like in a SAN)
>>>>>> and when write times are slow the RAID group is busy serviceing other
>>>>>> LUNs.
>>>>>> 3. Gluster bug / config issue / some other unknown unknown.
>>>>>>
>>>>>> So I see 2 issues here:
>>>>>>
>>>>>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
>>>>>> 2. Sometimes your throughput drops dramatically.
>>>>>>
>>>>>> WRT #1 - have a look at my estimates above. My formula for guestimating
>>>>>> gluster perf is: throughput = NIC throughput or storage(whatever is
>>>>>> slower) / # replicas * overhead(figure .7 or .8). Also the larger the
>>>>>> record size the better for glusterfs mounts, I normally like to be at
>>>>>> LEAST 64k up to 1024k:
>>>>>>
>>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000
>>>>>> conv=fdatasync
>>>>>>
>>>>>> WRT #2 - Again, I question your testing and your storage config. Try
>>>>>> using
>>>>>> conv=fdatasync for your DDs, use a larger record size, and make sure that
>>>>>> your back end storage is not causing your slowdowns. Also remember that
>>>>>> with replica 2 you will take ~50% hit on writes because the client uses
>>>>>> 50% of its bandwidth to write to one replica and 50% to the other.
>>>>>>
>>>>>> -b
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>>>>>> Are you sure using conv=sync is what you want? I normally use
>>>>>>>> conv=fdatasync, I'll look up the difference between the two and see if
>>>>>>>> it
>>>>>>>> affects your test.
>>>>>>>>
>>>>>>>>
>>>>>>>> -b
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Pat Haley"<***@mit.edu>
>>>>>>>>> To: "Pranith Kumar Karampuri"<***@redhat.com>
>>>>>>>>> Cc: "Ravishankar N"<***@redhat.com>,
>>>>>>>>> gluster-***@gluster.org,
>>>>>>>>> "Steve Postma"<***@ztechnet.com>, "Ben
>>>>>>>>> Turner"<***@redhat.com>
>>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranith,
>>>>>>>>>
>>>>>>>>> The "dd" command was:
>>>>>>>>>
>>>>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>
>>>>>>>>> There were 2 instances where dd reported 22 seconds. The output from
>>>>>>>>> the
>>>>>>>>> dd tests are in
>>>>>>>>>
>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>>>>>
>>>>>>>>> Pat
>>>>>>>>>
>>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>> Pat,
>>>>>>>>>> What is the command you used? As per the following output,
>>>>>>>>>> it
>>>>>>>>>> seems like at least one write operation took 16 seconds. Which is
>>>>>>>>>> really bad.
>>>>>>>>>> 96.39 1165.10 us 89.00 us*16487014.00 us*
>>>>>>>>>> 393212
>>>>>>>>>> WRITE
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <***@mit.edu
>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Pranith,
>>>>>>>>>>
>>>>>>>>>> I ran the same 'dd' test both in the gluster test volume and
>>>>>>>>>> in
>>>>>>>>>> the .glusterfs directory of each brick. The median results
>>>>>>>>>> (12
>>>>>>>>>> dd
>>>>>>>>>> trials in each test) are similar to before
>>>>>>>>>>
>>>>>>>>>> * gluster test volume: 586.5 MB/s
>>>>>>>>>> * bricks (in .glusterfs): 1.4 GB/s
>>>>>>>>>>
>>>>>>>>>> The profile for the gluster test-volume is in
>>>>>>>>>>
>>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Pat
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>> Let's start with the same 'dd' test we were testing with to
>>>>>>>>>>> see,
>>>>>>>>>>> what the numbers are. Please provide profile numbers for the
>>>>>>>>>>> same. From there on we will start tuning the volume to see
>>>>>>>>>>> what
>>>>>>>>>>> we can do.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <***@mit.edu
>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the tip. We now have the gluster volume
>>>>>>>>>>> mounted
>>>>>>>>>>> under /home. What tests do you recommend we run?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Pat
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>>>>>>>>>> <***@mit.edu
>>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the delay. I never saw received your
>>>>>>>>>>>> reply
>>>>>>>>>>>> (but I did receive Ben Turner's follow-up to your
>>>>>>>>>>>> reply). So we tried to create a gluster volume
>>>>>>>>>>>> under
>>>>>>>>>>>> /home using different variations of
>>>>>>>>>>>>
>>>>>>>>>>>> gluster volume create test-volume
>>>>>>>>>>>> mseas-data2:/home/gbrick_test_1
>>>>>>>>>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>>>>>>>>
>>>>>>>>>>>> However we keep getting errors of the form
>>>>>>>>>>>>
>>>>>>>>>>>> Wrong brick type: transport, use
>>>>>>>>>>>> <HOSTNAME>:<export-dir-abs-path>
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts on what we're doing wrong?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> You should give transport tcp at the beginning I think.
>>>>>>>>>>>> Anyways, transport tcp is the default, so no need to
>>>>>>>>>>>> specify
>>>>>>>>>>>> so remove those two words from the CLI.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also do you have a list of the test we should be
>>>>>>>>>>>> running
>>>>>>>>>>>> once we get this volume created? Given the
>>>>>>>>>>>> time-zone
>>>>>>>>>>>> difference it might help if we can run a small
>>>>>>>>>>>> battery
>>>>>>>>>>>> of tests and post the results rather than
>>>>>>>>>>>> test-post-new
>>>>>>>>>>>> test-post... .
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is the first time I am doing performance analysis
>>>>>>>>>>>> on
>>>>>>>>>>>> users as far as I remember. In our team there are
>>>>>>>>>>>> separate
>>>>>>>>>>>> engineers who do these tests. Ben who replied earlier is
>>>>>>>>>>>> one
>>>>>>>>>>>> such engineer.
>>>>>>>>>>>>
>>>>>>>>>>>> Ben,
>>>>>>>>>>>> Have any suggestions?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> Pat
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The /home partition is mounted as ext4
>>>>>>>>>>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>>>>>>>>>>
>>>>>>>>>>>>> The brick partitions are mounted ax xfs
>>>>>>>>>>>>> /mnt/brick1 xfs defaults 0 0
>>>>>>>>>>>>> /mnt/brick2 xfs defaults 0 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will this cause a problem with creating a
>>>>>>>>>>>>> volume
>>>>>>>>>>>>> under /home?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the bottleneck is disk. You can do
>>>>>>>>>>>>> the
>>>>>>>>>>>>> same tests you did on your new volume to confirm?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, we don't have similar
>>>>>>>>>>>>>> hardware
>>>>>>>>>>>>>> for a small scale test. All we have is
>>>>>>>>>>>>>> our
>>>>>>>>>>>>>> production hardware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You said something about /home partition which
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>> lesser disks, we can create plain distribute
>>>>>>>>>>>>>> volume inside one of those directories. After
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>> are done, we can remove the setup. What do you
>>>>>>>>>>>>>> say?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat
>>>>>>>>>>>>>>> Haley
>>>>>>>>>>>>>>> <***@mit.edu <mailto:***@mit.edu>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since we are mounting the partitions
>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>> the bricks, I tried the dd test
>>>>>>>>>>>>>>> writing
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>> The results without oflag=sync were
>>>>>>>>>>>>>>> 1.6
>>>>>>>>>>>>>>> Gb/s (faster than gluster but not as
>>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>>> as I was expecting given the 1.2 Gb/s
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> the no-gluster area w/ fewer disks).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay, then 1.6Gb/s is what we need to
>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>> for, considering your volume is just
>>>>>>>>>>>>>>> distribute. Is there any way you can do
>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>> on similar hardware but at a small scale?
>>>>>>>>>>>>>>> Just so we can run the workload to learn
>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>> about the bottlenecks in the system? We
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> probably try to get the speed to 1.2Gb/s
>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> your /home partition you were telling me
>>>>>>>>>>>>>>> yesterday. Let me know if that is
>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>> you are okay to do.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>>>>>>>>>>> Karampuri wrote:
>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM,
>>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>> Haley <***@mit.edu
>>>>>>>>>>>>>>>> <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Not entirely sure (this isn't my
>>>>>>>>>>>>>>>> area of expertise). I'll run
>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>> answer by some other people who
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> more familiar with this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am also uncertain about how to
>>>>>>>>>>>>>>>> interpret the results when we
>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>> add the dd tests writing to the
>>>>>>>>>>>>>>>> /home area (no gluster, still on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same machine)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * dd test without oflag=sync
>>>>>>>>>>>>>>>> (rough average of multiple
>>>>>>>>>>>>>>>> tests)
>>>>>>>>>>>>>>>> o gluster w/ fuse mount :
>>>>>>>>>>>>>>>> 570
>>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>>>>>>>>> 390
>>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>> o nfs (no gluster): 1.2
>>>>>>>>>>>>>>>> Gb/s
>>>>>>>>>>>>>>>> * dd test with oflag=sync
>>>>>>>>>>>>>>>> (rough
>>>>>>>>>>>>>>>> average of multiple tests)
>>>>>>>>>>>>>>>> o gluster w/ fuse mount:
>>>>>>>>>>>>>>>> 5
>>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>> o gluster w/ nfs mount:
>>>>>>>>>>>>>>>> 200
>>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>> o nfs (no gluster): 20
>>>>>>>>>>>>>>>> Mb/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Given that the non-gluster area
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> RAID-6 of 4 disks while each
>>>>>>>>>>>>>>>> brick
>>>>>>>>>>>>>>>> of the gluster area is a RAID-6
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> 32 disks, I would naively expect
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> writes to the gluster area to be
>>>>>>>>>>>>>>>> roughly 8x faster than to the
>>>>>>>>>>>>>>>> non-gluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a better test is to try and
>>>>>>>>>>>>>>>> write to a file using nfs without
>>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>> gluster to a location that is not
>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>> the brick but someother location
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> on same disk(s). If you are mounting
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> partition as the brick, then we can
>>>>>>>>>>>>>>>> write to a file inside .glusterfs
>>>>>>>>>>>>>>>> directory, something like
>>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think we have a speed
>>>>>>>>>>>>>>>> issue,
>>>>>>>>>>>>>>>> I can't tell if fuse vs nfs is
>>>>>>>>>>>>>>>> part
>>>>>>>>>>>>>>>> of the problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I got interested in the post because
>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> read that fuse speed is lesser than
>>>>>>>>>>>>>>>> nfs
>>>>>>>>>>>>>>>> speed which is counter-intuitive to
>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>> understanding. So wanted
>>>>>>>>>>>>>>>> clarifications.
>>>>>>>>>>>>>>>> Now that I got my clarifications
>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>> fuse outperformed nfs without sync,
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> can resume testing as described
>>>>>>>>>>>>>>>> above
>>>>>>>>>>>>>>>> and try to find what it is. Based on
>>>>>>>>>>>>>>>> your email-id I am guessing you are
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> Boston and I am from Bangalore so if
>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>> are okay with doing this debugging
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> multiple days because of timezones,
>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> will be happy to help. Please be a
>>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>> patient with me, I am under a
>>>>>>>>>>>>>>>> release
>>>>>>>>>>>>>>>> crunch but I am very curious with
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> problem you posted.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Was there anything useful in the
>>>>>>>>>>>>>>>> profiles?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately profiles didn't help
>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>> much, I think we are collecting the
>>>>>>>>>>>>>>>> profiles from an active volume, so
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> has a lot of information that is not
>>>>>>>>>>>>>>>> pertaining to dd so it is difficult
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> find the contributions of dd. So I
>>>>>>>>>>>>>>>> went
>>>>>>>>>>>>>>>> through your post again and found
>>>>>>>>>>>>>>>> something I didn't pay much
>>>>>>>>>>>>>>>> attention
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> earlier i.e. oflag=sync, so did my
>>>>>>>>>>>>>>>> own
>>>>>>>>>>>>>>>> tests on my setup with FUSE so sent
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> reply.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>>>>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>> Okay good. At least this validates my doubts. Handling
>>>>>>>>>>>>>>>>> O_SYNC in gluster NFS and fuse is a bit different. When an
>>>>>>>>>>>>>>>>> application opens a file with O_SYNC on a fuse mount, each
>>>>>>>>>>>>>>>>> write syscall has to be written to disk as part of the
>>>>>>>>>>>>>>>>> syscall, whereas in the case of NFS there is no concept of
>>>>>>>>>>>>>>>>> open. NFS performs the write through a handle saying it
>>>>>>>>>>>>>>>>> needs to be a synchronous write, so the write() syscall is
>>>>>>>>>>>>>>>>> performed first and then it performs fsync(); a write on an
>>>>>>>>>>>>>>>>> fd with O_SYNC becomes write+fsync. I suspect that when
>>>>>>>>>>>>>>>>> multiple threads do this write+fsync() operation on the
>>>>>>>>>>>>>>>>> same file, multiple writes are batched together before
>>>>>>>>>>>>>>>>> being written to disk, which is my guess for why the
>>>>>>>>>>>>>>>>> throughput on the disk increases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>>>>>>>>
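(A rough way to feel that difference from the shell, using only dd and a
throwaway output file: compare a run where every write is synchronous with
one that flushes once at the end. This merely approximates the per-write
write()+fsync() behaviour described above; it is not the exact gluster NFS
code path.

  # output file opened with O_SYNC: each 1 MiB write must reach disk before the next one starts
  dd if=/dev/zero of=zeros.txt bs=1M count=1024 oflag=sync

  # writes stream out first, then dd issues a single fsync() on the output before exiting
  dd if=/dev/zero of=zeros.txt bs=1M count=1024 conv=fsync

The gap between the two numbers gives a sense of what per-write flushing
costs on a given mount.)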
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Without the oflag=sync and only a single test of each,
>>>>>>>>>>>>>>>>> the FUSE is going faster than NFS:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> FUSE:
>>>>>>>>>>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>> 4096+0 records in
>>>>>>>>>>>>>>>>> 4096+0 records out
>>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> NFS:
>>>>>>>>>>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>> 4096+0 records in
>>>>>>>>>>>>>>>>> 4096+0 records out
>>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
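(Since these are single runs, it may be worth repeating each test a few
times, removing the output file and dropping the page cache between runs,
so that one outlier does not decide the fuse-vs-NFS comparison. A minimal
sketch, to be run as root in whichever mount is being tested:

  for i in 1 2 3; do
      rm -f zeros.txt
      sync; echo 3 > /proc/sys/vm/drop_caches   # flush dirty data, then drop clean caches
      dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
  done
)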
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>> Could you let me know the speed without oflag=sync on
>>>>>>>>>>>>>>>>>> both the mounts? No need to collect profiles.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <***@mit.edu <mailto:***@mit.edu>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is what I see now:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [***@mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>>>>>>>>>> Type: Distribute
>>>>>>>>>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>>>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>>>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>>>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>>>>>>>>
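(A side note on the fuse-vs-NFS question: with nfs.disable on and
nfs.export-volumes off, the built-in gluster NFS server is not exporting
this volume at all. If a gNFS mount of data-volume were to be tested, those
two options would presumably need to change first, roughly as below; note
that gluster NFS and a kernel NFS server on the same host can conflict over
portmapper registration, so check the existing exports before flipping
anything.

  gluster volume set data-volume nfs.disable off
  gluster volume set data-volume nfs.export-volumes on
)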
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > [***@mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Volume Name: data-volume
>>>>>>>>>>>>>>>>>>> > Type: Distribute
>>>>>>>>>>>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>>> > Status: Started
>>>>>>>>>>>>>>>>>>> > Number of Bricks: 2
>>>>>>>>>>>>>>>>>>> > Transport-type: tcp
>>>>>>>>>>>>>>>>>>> > Bricks:
>>>>>>>>>>>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>>> > Options Reconfigured:
>>>>>>>>>>>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>> > nfs.disable: on
>>>>>>>>>>>>>>>>>>> > nfs.export-volumes: