Discussion:
[Gluster-users] blocking process on FUSE mount in directory which is using quota
mabi
2018-08-09 12:54:29 UTC
Permalink
Hello,

I recently upgraded my GlusterFS replica 2+1 (arbiter) to version 3.12.12 and now I see a weird behaviour on my client (using a FUSE mount) where I have processes (PHP 5.6 FPM) trying to access a specific directory and then the process blocks. I can't kill the process either, not even with kill -9. I need to reboot the machine in order to get rid of these blocked processes.

This directory has one particularity compared to the other directories: it has reached its quota soft-limit, as you can see here in the output of gluster volume quota list:

Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
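For reference, the output above comes from the quota list command, something along the lines of the following (volume name as a placeholder):

gluster volume quota <volname> list    # <volname> stands for the actual volume name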

That does not necessarily mean it is the quota's fault, but it might be a hint on where to start looking... And by the way, can someone explain to me what the soft-limit does? Or does it not do anything special?

Here is the Linux kernel stack trace of a blocked process on that directory, triggered by a simple "ls -la":

[Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
[Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 9 14:21:07 2018] ls D ffff88017ef93200 0 2272 2268 0x00000004
[Thu Aug 9 14:21:07 2018] ffff88017653f490 0000000000000286 0000000000013200 ffff880174d7bfd8
[Thu Aug 9 14:21:07 2018] 0000000000013200 ffff88017653f490 ffff8800eeb3d5f0 ffff8800fefac800
[Thu Aug 9 14:21:07 2018] ffff880174d7bbe0 ffff8800eeb3d6d0 ffff8800fefac800 ffff8800ffe1e1c0
[Thu Aug 9 14:21:07 2018] Call Trace:
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dc50d>] ? __fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff810abce0>] ? prepare_to_wait_event+0xf0/0xf0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00e0791>] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811b944e>] ? lookup_fast+0x25e/0x2b0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bacc5>] ? path_lookupat+0x155/0x780
[Thu Aug 9 14:21:07 2018] [<ffffffff81195715>] ? kmem_cache_alloc+0x75/0x480
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dfca9>] ? fuse_getxattr+0xe9/0x150 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811bb316>] ? filename_lookup+0x26/0xc0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf594>] ? user_path_at_empty+0x54/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff81193e08>] ? kmem_cache_free+0xd8/0x210
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf59f>] ? user_path_at_empty+0x5f/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b3d46>] ? vfs_fstatat+0x46/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b421d>] ? SYSC_newlstat+0x1d/0x40
[Thu Aug 9 14:21:07 2018] [<ffffffff811d34b8>] ? SyS_lgetxattr+0x58/0x80
[Thu Aug 9 14:21:07 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15


My 3 gluster nodes are all Debian 9 and my client Debian 8.

Let me know if you need more information.

Best regards,
Mabi
Nithya Balachandran
2018-08-09 13:10:12 UTC
Permalink
Hi,

Please provide the following:

1. gluster volume info
2. statedump of the fuse process when it hangs


Thanks,
Nithya
mabi
2018-08-09 13:17:57 UTC
Permalink
Hi Nithya,

Thanks for the fast answer. Here is the additional info:

1. gluster volume info

Volume Name: myvol-private
Type: Replicate
Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1a:/data/myvol-private/brick
Brick2: gfs1b:/data/myvol-private/brick
Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
Options Reconfigured:
features.default-soft-limit: 95%
transport.address-family: inet
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 192.168.100.92

2. Sorry, I have no clue how to take a "statedump" of a process on Linux. Which command should I use for that? And which process would you like, the blocked process (for example "ls")?

Regards,
M.

Raghavendra Gowdappa
2018-08-09 13:59:06 UTC
Permalink
Post by mabi
2. Sorry I have no clue how to take a "statedump" of a process on Linux.
Which command should I use for that? and which process would you like, the
blocked process (for example "ls")?
Statedumps are gluster specific. Please refer to
https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
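For a FUSE client, the short version would be roughly the following (the PID and mount point will differ on your setup):

# find the glusterfs client process serving the FUSE mount
ps aux | grep [g]lusterfs
# send SIGUSR1 to that PID to trigger a statedump (written under /var/run/gluster by default)
kill -USR1 <pid>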
mabi
2018-08-09 14:24:49 UTC
Permalink
Thanks for the documentation. On my client using the FUSE mount I found the PID by using ps (output below):

root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private

Then I ran the following command:

sudo kill -USR1 456

But now I can't find where the statedump files are stored. Are these supposed to be stored on the client directly? I checked /var/run/gluster and /var/log/gluster but could not see anything, and /var/log/gluster does not even exist on the client.

Nithya Balachandran
2018-08-10 14:19:28 UTC
Permalink
Post by mabi
Thanks for the documentation. On my client using FUSE mount I found the PID by using ps (output below):
root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs
--volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
Then I ran the following command
sudo kill -USR1 456
but now I can't find where the files are stored. Are these supposed to be
stored on the client directly? I checked /var/run/gluster and
/var/log/gluster but could not see anything and /var/log/gluster does not
even exist on the client.
They are usually created in /var/run/gluster. You will need to create the
directory on the client if it does not exist.
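As a rough sketch, using the PID from your ps output above (exact file names may differ):

mkdir -p /var/run/gluster    # create the directory if it is missing
kill -USR1 456               # PID of the glusterfs client process
ls -l /var/run/gluster/      # the new statedump file should show up here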



mabi
2018-08-14 08:28:19 UTC
Permalink
As you mentioned, after creating the /var/run/gluster directory I got a statedump file in there.

As a workaround I have now removed the quota for this specific directory. As it is a production server, I currently cannot "play" with it by adding the quota back to reproduce the problem, since that requires me to reboot the server with downtime...

But I can confirm that by removing the quota from that directory the problem is gone (no more blocked processes such as "ls"), so there must be an issue or bug with the quota part of gluster.
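For the record, the quota was removed with the usual quota remove command on one of the gluster nodes, along the lines of:

gluster volume quota myvol-private remove /directory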

Nithya Balachandran
2018-08-14 08:48:24 UTC
Permalink
Thanks for letting us know. Sanoj, can you take a look at this?

Thanks.
Nithya
mabi
2018-08-14 13:53:28 UTC
Permalink
Bad news: the blocked process issue happened again, this time with another directory of another user which is NOT over its quota but which also has a quota enabled.

The symptoms on the Linux side are the same:

[Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 120 seconds.
[Tue Aug 14 15:30:33 2018] Not tainted 3.16.0-4-amd64 #1
[Tue Aug 14 15:30:33 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Aug 14 15:30:33 2018] php5-fpm D ffff8801fea13200 0 14773 729 0x00000000
[Tue Aug 14 15:30:33 2018] ffff880100bbe0d0 0000000000000282 0000000000013200 ffff880129bcffd8
[Tue Aug 14 15:30:33 2018] 0000000000013200 ffff880100bbe0d0 ffff880153ed0d68 ffff880129bcfee0
[Tue Aug 14 15:30:33 2018] ffff880153ed0d6c ffff880100bbe0d0 00000000ffffffff ffff880153ed0d70
[Tue Aug 14 15:30:33 2018] Call Trace:
[Tue Aug 14 15:30:33 2018] [<ffffffff81522935>] ? schedule_preempt_disabled+0x25/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff815243d3>] ? __mutex_lock_slowpath+0xd3/0x1d0
[Tue Aug 14 15:30:33 2018] [<ffffffff811d7a93>] ? write_inode_now+0x93/0xc0
[Tue Aug 14 15:30:33 2018] [<ffffffff815244eb>] ? mutex_lock+0x1b/0x2a
[Tue Aug 14 15:30:33 2018] [<ffffffffa00df69f>] ? fuse_flush+0x8f/0x1e0 [fuse]
[Tue Aug 14 15:30:33 2018] [<ffffffff811af393>] ? vfs_read+0x93/0x170
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad14a>] ? filp_close+0x2a/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad1af>] ? SyS_close+0x1f/0x50
[Tue Aug 14 15:30:33 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15

And if I check this process it has state "D", which means uninterruptible sleep.
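For example, the state can be checked with ps (PID taken from the trace above):

ps -o pid,stat,wchan,cmd -p 14773    # STAT "D" means uninterruptible sleep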

Now I also managed to take a statedump file as recommended, but I see in its content, under "[io-cache.inode]", "path=" entries which I would need to remove as they contain filenames, for privacy reasons. Can I remove every "path=" line and still send you the statedump file for analysis?

Thank you.

Raghavendra Gowdappa
2018-08-15 03:21:43 UTC
Permalink
Post by mabi
Now I also managed to take a statedump file as recommended but I see in
its content under the "[io-cache.inode]" a "path=" which I would need to
remove as it contains filenames for privacy reasons. Can I remove every
"path=" line and still send you the statedump file for analysis?
Yes. Removing path is fine and statedumps will still be useful for
debugging the issue.
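For example, something like this would blank those entries out before sending (the statedump file name is just a placeholder):

sed -i 's/^path=.*/path=REMOVED_FOR_PRIVACY/' /var/run/gluster/<statedump-file>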
mabi
2018-08-15 07:21:37 UTC
Permalink
Great, you will find attached the statedump of the client using the FUSE glusterfs mount, taken right after two processes blocked.

Two notes here regarding the "path=" entries in this statedump file:
- I have renamed all the "path=" entries which contain the problematic directory to "path=PROBLEMATIC_DIRECTORY_HERE".
- All the other "path=" entries I have renamed to "path=REMOVED_FOR_PRIVACY".

Note also that, funnily enough, the number of "path=" entries for that problematic directory sums up to exactly 5000. Coincidence, or maybe a hint to the problem?
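The count is easy to double-check with a simple grep on the sanitized dump (file name as a placeholder):

grep -c 'path=PROBLEMATIC_DIRECTORY_HERE' <statedump-file>    # counts matching lines, 5000 here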

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Bad news: the process blocked happened again this time with another directory of another user which is NOT over his quota but which also has quota enabled.
[Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 120 seconds.
[Tue Aug 14 15:30:33 2018] Not tainted 3.16.0-4-amd64 #1
[Tue Aug 14 15:30:33 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Aug 14 15:30:33 2018] php5-fpm D ffff8801fea13200 0 14773 729 0x00000000
[Tue Aug 14 15:30:33 2018] ffff880100bbe0d0 0000000000000282 0000000000013200 ffff880129bcffd8
[Tue Aug 14 15:30:33 2018] 0000000000013200 ffff880100bbe0d0 ffff880153ed0d68 ffff880129bcfee0
[Tue Aug 14 15:30:33 2018] ffff880153ed0d6c ffff880100bbe0d0 00000000ffffffff ffff880153ed0d70
[Tue Aug 14 15:30:33 2018] [<ffffffff81522935>] ? schedule_preempt_disabled+0x25/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff815243d3>] ? __mutex_lock_slowpath+0xd3/0x1d0
[Tue Aug 14 15:30:33 2018] [<ffffffff811d7a93>] ? write_inode_now+0x93/0xc0
[Tue Aug 14 15:30:33 2018] [<ffffffff815244eb>] ? mutex_lock+0x1b/0x2a
[Tue Aug 14 15:30:33 2018] [<ffffffffa00df69f>] ? fuse_flush+0x8f/0x1e0 [fuse]
[Tue Aug 14 15:30:33 2018] [<ffffffff811af393>] ? vfs_read+0x93/0x170
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad14a>] ? filp_close+0x2a/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad1af>] ? SyS_close+0x1f/0x50
[Tue Aug 14 15:30:33 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15
and if I check this process it has state "D" which is "D = uninterruptible sleep".
Now I also managed to take a statedump file as recommended but I see in its content under the "[io-cache.inode]" a "path=" which I would need to remove as it contains filenames for privacy reasons. Can I remove every "path=" line and still send you the statedump file for analysis?
Yes. Removing path is fine and statedumps will still be useful for debugging the issue.
Post by mabi
Thank you.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Nithya Balachandran
Thanks for letting us know. Sanoj, can you take a look at this?
Thanks.
Nithya
Post by mabi
As you mentioned after creating the /var/run/gluster directory I got a statedump file in there.
As a workaround I have now removed the quota for this specific directory and as it is a production server I can currently not "play" with it by adding the quota back and having the same problem as it requires me to reboot the server with downtime...
But I can confirm that by removing the quota from that directory, the problem is gone (no more blocking processes such as "ls") so there must be an issue or bug with the quota part of gluster.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
Then I ran the following command
sudo kill -USR1 456
but now I can't find where the files are stored. Are these supposed to be stored on the client directly? I checked /var/run/gluster and /var/log/gluster but could not see anything and /var/log/gluster does not even exist on the client.
They are usually created in /var/run/gluster. You will need to create the directory on the client if it does not exist.
Post by mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hi Nithya,
1. gluster volume info
Volume Name: myvol-private
Type: Replicate
Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: gfs1a:/data/myvol-private/brick
Brick2: gfs1b:/data/myvol-private/brick
Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
features.default-soft-limit: 95%
transport.address-family: inet
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 192.168.100.92
2. Sorry I have no clue how to take a "statedump" of a process on Linux. Which command should I use for that? and which process would you like, the blocked process (for example "ls")?
Statedumps are gluster specific. Please refer to https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
Post by mabi
Regards,
M.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Hi,
- gluster volume info
- statedump of the fuse process when it hangs
Thanks,
Nithya
Post by mabi
Hello,
I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12 and now I see a weird behaviour on my client (using FUSE mount) where I have processes (PHP 5.6 FPM) trying to access a specific directory and then the process blocks. I can't kill the process either, not even with kill -9. I need to reboot the machine in order to get rid of these blocked processes.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
That does not mean that it is the quota's fault but it might be a hint where to start looking for... And by the way can someone explain me what the soft-limit does? or does it not do anything special?
[Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
[Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 9 14:21:07 2018] ls D ffff88017ef93200 0 2272 2268 0x00000004
[Thu Aug 9 14:21:07 2018] ffff88017653f490 0000000000000286 0000000000013200 ffff880174d7bfd8
[Thu Aug 9 14:21:07 2018] 0000000000013200 ffff88017653f490 ffff8800eeb3d5f0 ffff8800fefac800
[Thu Aug 9 14:21:07 2018] ffff880174d7bbe0 ffff8800eeb3d6d0 ffff8800fefac800 ffff8800ffe1e1c0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dc50d>] ? __fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff810abce0>] ? prepare_to_wait_event+0xf0/0xf0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00e0791>] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811b944e>] ? lookup_fast+0x25e/0x2b0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bacc5>] ? path_lookupat+0x155/0x780
[Thu Aug 9 14:21:07 2018] [<ffffffff81195715>] ? kmem_cache_alloc+0x75/0x480
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dfca9>] ? fuse_getxattr+0xe9/0x150 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811bb316>] ? filename_lookup+0x26/0xc0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf594>] ? user_path_at_empty+0x54/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff81193e08>] ? kmem_cache_free+0xd8/0x210
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf59f>] ? user_path_at_empty+0x5f/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b3d46>] ? vfs_fstatat+0x46/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b421d>] ? SYSC_newlstat+0x1d/0x40
[Thu Aug 9 14:21:07 2018] [<ffffffff811d34b8>] ? SyS_lgetxattr+0x58/0x80
[Thu Aug 9 14:21:07 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15
My 3 gluster nodes are all Debian 9 and my client Debian 8.
Let me know if you need more information.
Best regards,
Mabi
mabi
2018-09-02 14:49:49 UTC
Permalink
Hello,

I wanted to report that this morning I had a similar issue on another server, where a few PHP-FPM processes got blocked on a different GlusterFS volume mounted through a FUSE mount. This GlusterFS volume has no quota enabled, so it might not be quota related after all.
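A quick way to double-check that quota is really off on that volume (the volume name here is just a placeholder) would be:

gluster volume info myothervol | grep -i quota    # should show no features.quota: on
gluster volume quota myothervol list              # should report that quota is not enabled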

Here is the Linux kernel stack trace:

[Sun Sep 2 06:47:47 2018] INFO: task php5-fpm:25880 blocked for more than 120 seconds.
[Sun Sep 2 06:47:47 2018] Not tainted 3.16.0-4-amd64 #1
[Sun Sep 2 06:47:47 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Sep 2 06:47:47 2018] php5-fpm D ffff88017ee12f40 0 25880 1 0x00000004
[Sun Sep 2 06:47:47 2018] ffff880101688b60 0000000000000282 0000000000012f40 ffff880059ca3fd8
[Sun Sep 2 06:47:47 2018] 0000000000012f40 ffff880101688b60 ffff8801093b51b0 ffff8801067ec800
[Sun Sep 2 06:47:47 2018] ffff880059ca3cc0 ffff8801093b5290 ffff8801093b51b0 ffff880059ca3e80
[Sun Sep 2 06:47:47 2018] Call Trace:
[Sun Sep 2 06:47:47 2018] [<ffffffffa00cd50d>] ? __fuse_request_send+0xbd/0x270 [fuse]
[Sun Sep 2 06:47:47 2018] [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0
[Sun Sep 2 06:47:47 2018] [<ffffffffa00d6990>] ? fuse_send_write+0xd0/0x100 [fuse]
[Sun Sep 2 06:47:47 2018] [<ffffffffa00d82af>] ? fuse_perform_write+0x26f/0x4b0 [fuse]
[Sun Sep 2 06:47:47 2018] [<ffffffffa00d86cd>] ? fuse_file_write_iter+0x1dd/0x2b0 [fuse]
[Sun Sep 2 06:47:47 2018] [<ffffffff811aa7f4>] ? new_sync_write+0x74/0xa0
[Sun Sep 2 06:47:47 2018] [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
[Sun Sep 2 06:47:47 2018] [<ffffffff811aadfd>] ? vfs_read+0xed/0x170
[Sun Sep 2 06:47:47 2018] [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
[Sun Sep 2 06:47:47 2018] [<ffffffff811aaa8e>] ? SyS_lseek+0x7e/0xa0
[Sun Sep 2 06:47:47 2018] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15

Has anyone had time yet to have a look at the statedump file I sent around 3 weeks ago?

I never saw this type of problem in the past; it started to appear after I upgraded to GlusterFS 3.12.12.

Best regards,
Mabi

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Great, you will find attached the statedump of the client using the FUSE glusterfs mount, taken right after two processes blocked.
- I have renamed all the "path=" entries which contain the problematic directory to "path=PROBLEMATIC_DIRECTORY_HERE".
- All the other "path=" entries I have renamed to "path=REMOVED_FOR_PRIVACY".
Note also that, funnily enough, the "path=" entries for that problematic directory sum up to exactly 5000. Coincidence or maybe a hint to the problem?
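For reference, that count can be reproduced with a simple grep against the (already anonymized) dump file; the filename here is just an example:

grep -c 'path=PROBLEMATIC_DIRECTORY_HERE' glusterdump.456.dump.1534255000    # prints the number of matching lines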
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Bad news: the blocked process happened again, this time with another directory belonging to another user which is NOT over its quota but which also has quota enabled.
[Tue Aug 14 15:30:33 2018] INFO: task php5-fpm:14773 blocked for more than 120 seconds.
[Tue Aug 14 15:30:33 2018] Not tainted 3.16.0-4-amd64 #1
[Tue Aug 14 15:30:33 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Aug 14 15:30:33 2018] php5-fpm D ffff8801fea13200 0 14773 729 0x00000000
[Tue Aug 14 15:30:33 2018] ffff880100bbe0d0 0000000000000282 0000000000013200 ffff880129bcffd8
[Tue Aug 14 15:30:33 2018] 0000000000013200 ffff880100bbe0d0 ffff880153ed0d68 ffff880129bcfee0
[Tue Aug 14 15:30:33 2018] ffff880153ed0d6c ffff880100bbe0d0 00000000ffffffff ffff880153ed0d70
[Tue Aug 14 15:30:33 2018] [<ffffffff81522935>] ? schedule_preempt_disabled+0x25/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff815243d3>] ? __mutex_lock_slowpath+0xd3/0x1d0
[Tue Aug 14 15:30:33 2018] [<ffffffff811d7a93>] ? write_inode_now+0x93/0xc0
[Tue Aug 14 15:30:33 2018] [<ffffffff815244eb>] ? mutex_lock+0x1b/0x2a
[Tue Aug 14 15:30:33 2018] [<ffffffffa00df69f>] ? fuse_flush+0x8f/0x1e0 [fuse]
[Tue Aug 14 15:30:33 2018] [<ffffffff811af393>] ? vfs_read+0x93/0x170
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad14a>] ? filp_close+0x2a/0x70
[Tue Aug 14 15:30:33 2018] [<ffffffff811ad1af>] ? SyS_close+0x1f/0x50
[Tue Aug 14 15:30:33 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15
and if I check this process it has state "D", which stands for uninterruptible sleep.
Now I also managed to take a statedump file as recommended, but I see that under the "[io-cache.inode]" sections it contains "path=" lines which I would need to remove, as they contain filenames (privacy reasons). Can I remove every "path=" line and still send you the statedump file for analysis?
Yes. Removing path is fine and statedumps will still be useful for debugging the issue.
Post by mabi
Thank you.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Nithya Balachandran
Thanks for letting us know. Sanoj, can you take a look at this?
Thanks.
Nithya
Post by mabi
As you mentioned after creating the /var/run/gluster directory I got a statedump file in there.
As a workaround I have now removed the quota for this specific directory. As it is a production server I currently can't "play" with it by adding the quota back and reproducing the problem, because that requires me to reboot the server with downtime...
But I can confirm that by removing the quota from that directory the problem is gone (no more blocked processes such as "ls"), so there must be an issue or bug with the quota part of gluster.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
root 456 1 4 14:17 ? 00:05:15 /usr/sbin/glusterfs --volfile-server=gfs1a --volfile-id=myvol-private /mnt/myvol-private
Then I ran the following command
sudo kill -USR1 456
but now I can't find where the files are stored. Are these supposed to be stored on the client directly? I checked /var/run/gluster and /var/log/gluster but could not see anything and /var/log/gluster does not even exist on the client.
They are usually created in /var/run/gluster. You will need to create the directory on the client if it does not exist.
Post by mabi
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by mabi
Hi Nithya,
1. gluster volume info
Volume Name: myvol-private
Type: Replicate
Volume ID: e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: gfs1a:/data/myvol-private/brick
Brick2: gfs1b:/data/myvol-private/brick
Brick3: gfs1c:/srv/glusterfs/myvol-private/brick (arbiter)
features.default-soft-limit: 95%
transport.address-family: inet
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 192.168.100.92
2. Sorry I have no clue how to take a "statedump" of a process on Linux. Which command should I use for that? and which process would you like, the blocked process (for example "ls")?
Statedumps are gluster specific. Please refer to https://docs.gluster.org/en/v3/Troubleshooting/statedump/ for instructions.
Post by mabi
Regards,
M.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Hi,
- gluster volume info
- statedump of the fuse process when it hangs
Thanks,
Nithya
Post by mabi
Hello,
I recently upgraded my GlusterFS replica 2+1 (aribter) to version 3.12.12 and now I see a weird behaviour on my client (using FUSE mount) where I have processes (PHP 5.6 FPM) trying to access a specific directory and then the process blocks. I can't kill the process either, not even with kill -9. I need to reboot the machine in order to get rid of these blocked processes.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
That does not mean that it is the quota's fault but it might be a hint where to start looking for... And by the way can someone explain me what the soft-limit does? or does it not do anything special?
[Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
[Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 9 14:21:07 2018] ls D ffff88017ef93200 0 2272 2268 0x00000004
[Thu Aug 9 14:21:07 2018] ffff88017653f490 0000000000000286 0000000000013200 ffff880174d7bfd8
[Thu Aug 9 14:21:07 2018] 0000000000013200 ffff88017653f490 ffff8800eeb3d5f0 ffff8800fefac800
[Thu Aug 9 14:21:07 2018] ffff880174d7bbe0 ffff8800eeb3d6d0 ffff8800fefac800 ffff8800ffe1e1c0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dc50d>] ? __fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff810abce0>] ? prepare_to_wait_event+0xf0/0xf0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00e0791>] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811b944e>] ? lookup_fast+0x25e/0x2b0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bacc5>] ? path_lookupat+0x155/0x780
[Thu Aug 9 14:21:07 2018] [<ffffffff81195715>] ? kmem_cache_alloc+0x75/0x480
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dfca9>] ? fuse_getxattr+0xe9/0x150 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811bb316>] ? filename_lookup+0x26/0xc0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf594>] ? user_path_at_empty+0x54/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff81193e08>] ? kmem_cache_free+0xd8/0x210
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf59f>] ? user_path_at_empty+0x5f/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b3d46>] ? vfs_fstatat+0x46/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b421d>] ? SYSC_newlstat+0x1d/0x40
[Thu Aug 9 14:21:07 2018] [<ffffffff811d34b8>] ? SyS_lgetxattr+0x58/0x80
[Thu Aug 9 14:21:07 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15
My 3 gluster nodes are all Debian 9 and my client Debian 8.
Let me know if you need more information.
Best regards,
Mabi
Alvin Starr
2018-08-09 13:30:51 UTC
Permalink
A while back we found an issue with PHP around file locks.

https://bugs.php.net/bug.php?id=53076

This would leave hanging processes that we could only kill by forcibly
remounting the filesystems.

It's not really a PHP bug, but they are not willing to fix the code to deal with Gluster's idiosyncrasies.
Post by Nithya Balachandran
Hi,
1. gluster volume info
2. statedump of the fuse process when it hangs
Thanks,
Nithya
Hello,
I recently upgraded my GlusterFS replica 2+1 (aribter) to version
3.12.12 and now I see a weird behaviour on my client (using FUSE
mount) where I have processes (PHP 5.6 FPM) trying to access a
specific directory and then the process blocks. I can't kill the
process either, not even with kill -9. I need to reboot the
machine in order to get rid of these blocked processes.
This directory has one particularity compared to the other directories: it has reached its quota soft-limit, as you can see here in the output of gluster volume quota list:
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/directory 100.0GB 80%(80.0GB) 90.5GB 9.5GB Yes No
That does not mean that it is the quota's fault, but it might be a hint where to start looking... And by the way, can someone explain to me what the soft-limit does? Or does it not do anything special?
Here is the Linux stack of a blocking process on that directory:
[Thu Aug 9 14:21:07 2018] INFO: task ls:2272 blocked for more than 120 seconds.
[Thu Aug 9 14:21:07 2018] Not tainted 3.16.0-4-amd64 #1
[Thu Aug 9 14:21:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 9 14:21:07 2018] ls D ffff88017ef93200 0 2272 2268 0x00000004
[Thu Aug 9 14:21:07 2018] ffff88017653f490 0000000000000286 0000000000013200 ffff880174d7bfd8
[Thu Aug 9 14:21:07 2018] 0000000000013200 ffff88017653f490 ffff8800eeb3d5f0 ffff8800fefac800
[Thu Aug 9 14:21:07 2018] ffff880174d7bbe0 ffff8800eeb3d6d0 ffff8800fefac800 ffff8800ffe1e1c0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dc50d>] ? __fuse_request_send+0xbd/0x270 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff810abce0>] ? prepare_to_wait_event+0xf0/0xf0
[Thu Aug 9 14:21:07 2018] [<ffffffffa00e0791>] ? fuse_dentry_revalidate+0x181/0x300 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811b944e>] ? lookup_fast+0x25e/0x2b0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bacc5>] ? path_lookupat+0x155/0x780
[Thu Aug 9 14:21:07 2018] [<ffffffff81195715>] ? kmem_cache_alloc+0x75/0x480
[Thu Aug 9 14:21:07 2018] [<ffffffffa00dfca9>] ? fuse_getxattr+0xe9/0x150 [fuse]
[Thu Aug 9 14:21:07 2018] [<ffffffff811bb316>] ? filename_lookup+0x26/0xc0
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf594>] ? user_path_at_empty+0x54/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff81193e08>] ? kmem_cache_free+0xd8/0x210
[Thu Aug 9 14:21:07 2018] [<ffffffff811bf59f>] ? user_path_at_empty+0x5f/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b3d46>] ? vfs_fstatat+0x46/0x90
[Thu Aug 9 14:21:07 2018] [<ffffffff811b421d>] ? SYSC_newlstat+0x1d/0x40
[Thu Aug 9 14:21:07 2018] [<ffffffff811d34b8>] ? SyS_lgetxattr+0x58/0x80
[Thu Aug 9 14:21:07 2018] [<ffffffff81525d0d>] ? system_call_fast_compare_end+0x10/0x15
My 3 gluster nodes are all Debian 9 and my client Debian 8.
Let me know if you need more information.
Best regards,
Mabi
--
Alvin Starr || land: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
***@netvel.net ||