Discussion:
[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first
Artem Russakovskii
2018-02-04 07:20:06 UTC
Hi,

I have been working on setting up a 4-replica Gluster volume with over a million
files (~250GB total), and I've seen some really weird stuff happen, even
after trying to optimize for small files. The volume is a 4-brick replicate
volume (gluster 3.13.2).

It took almost 2 days to rsync the data from the local drive to the gluster
volume, and now I'm running a 2nd rsync that just looks for changes in case
more files have been written. I'd like to concentrate this email on a very
specific and odd issue.


The dir structure is:
YYYY/
  MM/
    10k+ files in each month folder


rsyncing each month folder cold can take 2+ minutes.

However, if I ls the destination folder first, or use find (both of which
finish within 5 seconds), the rsync is almost instant.


Here's a log with time calls that shows what happens:

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]

real 1m39.848s
user 0m0.010s
sys 0m0.030s
box:/mnt/gluster/uploads/2017 # time find 08 | wc -l
14254

real 0m0.726s
user 0m0.013s
sys 0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list

real 0m0.562s
user 0m0.057s
sys 0m0.137s
box:/mnt/gluster/uploads/2017 # time find 07 | wc -l
10103

real 0m4.550s
user 0m0.010s
sys 0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/
sending incremental file list

real 0m0.428s
user 0m0.030s
sys 0m0.083s
box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l
11890

real 0m1.850s
user 0m0.077s
sys 0m0.040s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/
sending incremental file list

real 0m0.627s
user 0m0.073s
sys 0m0.107s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/
sending incremental file list

real 2m24.382s
user 0m0.127s
sys 0m0.357s


Note how if I precede the rsync call with ls or find, the rsync completes
in less than a second (finding no files to sync, because they've already
been synced). Otherwise, it takes over 2 minutes (I interrupted the first
call before it finished because it was already taking too long).

What could be causing rsync to work so slowly unless the dir is primed?
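
In case anyone wants to reproduce this outside of rsync, here's a minimal Python sketch of the same test (the paths are just the month folders from the log above; it assumes the "cold" one hasn't been touched recently, and it only times plain lstat calls with and without an ls beforehand):

import os
import subprocess
import time

BASE = "/mnt/gluster/uploads/2017"     # mount point from the log above
COLD_DIR = os.path.join(BASE, "05")    # not listed beforehand
PRIMED_DIR = os.path.join(BASE, "06")  # primed with ls first

def time_lstats(path):
    """Time one lstat() per entry -- the same access pattern rsync uses."""
    names = os.listdir(path)
    start = time.monotonic()
    for name in names:
        os.lstat(os.path.join(path, name))
    return time.monotonic() - start, len(names)

# Prime one of the two directories the same way as in the log above.
subprocess.run(["ls", PRIMED_DIR], stdout=subprocess.DEVNULL, check=True)

for label, path in [("cold", COLD_DIR), ("primed", PRIMED_DIR)]:
    secs, count = time_lstats(path)
    print(f"{label}: {count} files, lstat loop took {secs:.1f}s")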

Volume config:

Volume Name: gluster
Type: Replicate
Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4



Thank you for any insight.

Sincerely,
Artem
Artem Russakovskii
2018-02-04 08:45:41 UTC
An update, and a very interesting one!

After I started stracing rsync, all I could see was lstat calls, quite slow
ones, over and over, which is expected.

For example: lstat("uploads/2016/10/nexus2cee_DSC05339_thumb-161x107.jpg", {st_mode=S_IFREG|0664, st_size=4043, ...}) = 0

I googled around and found
https://gist.github.com/nh2/1836415489e2132cf85ed3832105fcc1, which describes
this exact issue with gluster, rsync, and XFS.

Here's the craziest finding so far. If while rsync is running (or right
before), I run /bin/ls or find on the same gluster dirs, it immediately
speeds up rsync by a factor of 100 or maybe even 1000. It's absolutely
insane.

I'm stracing the rsync run, and the previously slow lstat calls start
flooding in at an incredible speed as soon as ls or find runs. Several
hundred files per minute (excruciatingly slow) becomes thousands or even
tens of thousands of files a second.

What do you make of this?
Vlad Kopylov
2018-02-05 01:55:22 UTC
Are you mounting it to the local bricks?

I'm struggling with the same performance issues. Try the volume settings from this thread:
http://lists.gluster.org/pipermail/gluster-users/2018-January/033397.html
performance.stat-prefetch: on might be it.

It seems like once it gets into the cache it is fast; the stat fetches, which
seem to come from .gluster, are slow.
Amar Tumballi
2018-02-05 02:44:12 UTC
Thanks for the report, Artem.

Looks like the issue is about cache warming up. Specifically, I suspect rsync
is doing a 'readdir(), stat(), file operations' loop, whereas when a find or
ls is issued, we get a 'readdirp()' request, which carries the stat
information along with the entries and also makes sure the cache is up to date
(at the md-cache layer).
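
To make the two access patterns concrete, here is a rough Python sketch (purely illustrative -- rsync's real code differs, and whether the plain scan actually becomes a readdirp() on the wire depends on the FUSE client and the caching translators):

import os
import time

def rsync_like(path):
    """Read the names first, then issue a separate lstat() per entry."""
    start = time.monotonic()
    for name in os.listdir(path):
        os.lstat(os.path.join(path, name))  # one attribute lookup per file
    return time.monotonic() - start

def find_like(path):
    """Just scan the directory, like a plain ls or find."""
    start = time.monotonic()
    with os.scandir(path) as entries:
        for _ in entries:
            pass  # no explicit stat; the client may still fetch attributes via readdirp()
    return time.monotonic() - start

if __name__ == "__main__":
    d = "/mnt/gluster/uploads/2017/08"  # example path from earlier in the thread
    print("plain scan:", find_like(d))
    print("lstat loop:", rsync_like(d))  # should be fast now if the scan warmed md-cache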

Note that this is just an off-the-top-of-my-head hypothesis; we surely need to
analyse and debug more thoroughly for a proper explanation. Someone on my
team will look at it soon.

Regards,
Amar
--
Amar Tumballi (amarts)
Tom Fite
2018-02-05 16:18:00 UTC
Hi all,

I have seen this issue as well, on Gluster 3.12.1 (3 bricks per box, 2
boxes, distributed-replicate). My testing shows the same thing -- running a
find on a directory dramatically increases lstat performance. To add
another clue, the performance degrades again after issuing a call to reset
the system's cache of dentries and inodes:

# sync; echo 2 > /proc/sys/vm/drop_caches

I think this shows that it's the system cache that's actually doing
the heavy lifting here. There are a couple of sysctl tunables that I've
found help with this.

See here:

http://docs.gluster.org/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/

Contrary to what that doc says, I've found that setting
vm.vfs_cache_pressure to a low value increases performance by allowing more
dentries and inodes to be retained in the cache.

# Set the swappiness to avoid swap when possible.
vm.swappiness = 10

# Set the cache pressure to prefer inode and dentry cache over file cache.
# This is done to keep as many dentries and inodes in cache as possible,
# which dramatically improves gluster small file performance.
vm.vfs_cache_pressure = 25
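
As a side note, the sysctl keys map directly onto /proc/sys paths, so a quick (hypothetical) Python check that the values above actually took effect could look like the sketch below; to make them permanent, put them in /etc/sysctl.conf and load them with sysctl -p:

# Verify the tunables suggested above (read-only; applying them needs root / sysctl -w).
TUNABLES = {"vm.swappiness": "10", "vm.vfs_cache_pressure": "25"}

def proc_path(key):
    # sysctl key "vm.swappiness" lives at /proc/sys/vm/swappiness
    return "/proc/sys/" + key.replace(".", "/")

for key, want in TUNABLES.items():
    with open(proc_path(key)) as f:
        have = f.read().strip()
    status = "ok" if have == want else f"is {have}, expected {want}"
    print(f"{key}: {status}")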

For comparison, my config is:

Volume Name: gv0
Type: Tier
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 8
Transport-type: tcp
Hot Tier:
Hot Tier Type: Replicate
Number of Bricks: 1 x 2 = 2
Brick1: gluster2:/data/hot_tier/gv0
Brick2: gluster1:/data/hot_tier/gv0
Cold Tier:
Cold Tier Type: Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: gluster1:/data/brick1/gv0
Brick4: gluster2:/data/brick1/gv0
Brick5: gluster1:/data/brick2/gv0
Brick6: gluster2:/data/brick2/gv0
Brick7: gluster1:/data/brick3/gv0
Brick8: gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-max-file-size: 128MB
cluster.readdir-optimize: on
cluster.watermark-hi: 95
features.ctr-sql-db-cachesize: 262144
cluster.read-freq-threshold: 5
cluster.write-freq-threshold: 2
features.record-counters: on
cluster.tier-promote-frequency: 15000
cluster.tier-pause: off
cluster.tier-compact: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.outstanding-rpc-limit: 2056
cluster.lookup-optimize: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.barrier: disable
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 1GB
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.quick-read: on
performance.io-cache: on
performance.nfs.write-behind-window-size: 4MB
performance.write-behind-window-size: 4MB
performance.nfs.io-threads: off
network.tcp-window-size: 1048576
performance.rda-cache-limit: 64MB
performance.flush-behind: on
server.allow-insecure: on
cluster.tier-demote-frequency: 18000
cluster.tier-max-files: 1000000
cluster.tier-max-promote-file-size: 10485760
cluster.tier-max-mb: 64000
features.ctr-sql-db-wal-autocheckpoint: 2500
cluster.tier-hot-compact-frequency: 86400
cluster.tier-cold-compact-frequency: 86400
performance.readdir-ahead: off
cluster.watermark-low: 50
storage.build-pgfid: on
performance.rda-request-size: 128KB
performance.rda-low-wmark: 4KB
cluster.min-free-disk: 5%
auto-delete: enable
Artem Russakovskii
2018-02-27 13:52:16 UTC
Any updates on this one?
Ingard Mevåg
2018-02-27 14:15:24 UTC
We got extremely slow stat calls on our disperse cluster running latest
3.12 with clients also running 3.12.
When we downgraded clients to 3.10 the slow stat problem went away.

We later found out that by disabling disperse.eager-lock we could run the
3.12 clients without much issue (just slightly slower writes).
There is an open issue on this here:
https://bugzilla.redhat.com/show_bug.cgi?id=1546732
Artem Russakovskii
2018-04-18 04:31:45 UTC
Nithya, Amar,

Any movement on this? There could be a significant performance gain here that
may also help with other bottlenecks I'm experiencing, which make gluster
close to unusable at times.


Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>
Nithya Balachandran
2018-04-18 13:46:58 UTC
Hi Artem,

I have not looked into this yet, but what Amar said seems likely. I'm adding
Poornima to this thread.

Regards,
Nithya