Discussion:
[Gluster-users] gluster volume heal info hang on 3.10.6 & 3.12.11
Ning Adam
2018-09-14 07:45:01 UTC
Permalink
(My last mail seemed to be dropped by the list, reposting in plain-text format)

Hello:

I have recently done a rolling upgrade on some of my 3.7.6 gluster
(AFR + DHT) deployments to 3.10.6, and have had two problems since:
1) Occasional IO errors
2) "gluster volume heal vol0 info" hangs after a decent amount of
files has been written to the cluster (anywhere from hours to days)

All 7 of my upgraded 3.10 clusters reproduce these problems, so I
then tried upgrading them to 3.12.11, but nothing improved.
I have submitted a bug at
https://bugzilla.redhat.com/show_bug.cgi?id=1618932 , but so far no one
seems to have looked into it.

The heal hang is the most annoying problem because it makes it
impossible for me to monitor my clusters. So I looked into the code,
analyzed the DEBUG & TRACE logs, and found the following.

1) I have "network.ping-timeout: 120" in volume option . If I modify
network.ping-timeout to 0,
"gluster volume heal vol0 info" will finish eventually after 10~20
minutes (8x3 bricks setup). And it says no file to heal.
It takes even longer when cluster is larger (36x3 bricks setup).
It does not matter even I stop all my IO or restart all glusterd &
glusterfsd process. heal info still hangs.

"gluster volume status" is all normal. All my brick servers are in
low latency LAN enviroment.

2) Since one of the clusters having this problem is a test setup, I
can do some debugging there.
I looked into heal/src/glfs-heal.c, compiled the code and added some logging;
the initialization call "ret = glfs_init (fs)" hangs.
I traced it to "ret = glfs_chdir (fs, "/");" and then "dht_lookup(
path='/' )"; it never gets to "dht_readdirp_cbk()".

I stopped all client IO on my gluster volume and then ran
"glfsheal vol0".
Looking into /var/log/glusterfs/glfsheal-vol0.log, there is a massive
loop of reconnect attempts on all bricks, e.g.:

[2018-09-14 05:56:09.395869] T [rpc-clnt.c:407:rpc_clnt_reconnect]
0-vol0-client-12: attempting reconnect
[2018-09-14 05:56:09.396160] T [socket.c:3125:socket_connect] (-->
/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f10d4e79778] (-->
/usr/lib/glus
terfs//rpc-transport/socket.so(+0xd442)[0x7f10c647c442] (-->
/lib/libgfrpc.so.0(rpc_transport_connect+0x93)[0x7f10d437de43] (-->
/lib/libgfrpc.so.0(r
pc_clnt_reconnect+0x136)[0x7f10d438121b] (-->
/lib/libglusterfs.so.0(+0x3e4e9)[0x7f10d4e894e9] )))))
0-vol0-client-12: connect () called on transport
already connected
[2018-09-14 05:56:09.396194] T [rpc-clnt.c:407:rpc_clnt_reconnect]
0-vol0-client-10: attempting reconnect
[2018-09-14 05:56:09.396424] T [socket.c:3125:socket_connect] (-->
/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f10d4e79778] (-->
/usr/lib/glus
terfs//rpc-transport/socket.so(+0xd442)[0x7f10c647c442] (-->
/lib/libgfrpc.so.0(rpc_transport_connect+0x93)[0x7f10d437de43] (-->
/lib/libgfrpc.so.0(r
pc_clnt_reconnect+0x136)[0x7f10d438121b] (-->
/lib/libglusterfs.so.0(+0x3e4e9)[0x7f10d4e894e9] )))))
0-vol0-client-10: connect () called on transport
already connected

It is also weird that 0-vol0-client-5 shows a huge RPC ping latency
(29 seconds should not be possible, because a simple ICMP ping to that
server takes less than 0.1 seconds):

***@storage0-cluster0:~# grep ping /var/log/glusterfs/glfsheal-vol0.log | grep laten
[2018-09-14 05:13:16.008866] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency
is 0ms
[2018-09-14 05:13:16.016669] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency
is 0ms
[2018-09-14 05:13:16.021769] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency
is 0ms
[2018-09-14 05:13:16.033761] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency
is 4ms
[2018-09-14 05:13:16.101962] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency
is 62ms
[2018-09-14 05:13:46.105502] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping latency
is 29996ms
[2018-09-14 05:20:13.895285] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency
is 0ms
[2018-09-14 05:20:13.901476] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency
is 0ms
[2018-09-14 05:20:13.908373] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency
is 0ms
[2018-09-14 05:20:13.918782] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency
is 4ms
[2018-09-14 05:20:13.987162] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency
is 62ms
[2018-09-14 05:20:43.992528] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping latency
is 29998ms
[2018-09-14 05:25:11.983259] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency
is 0ms
[2018-09-14 05:25:11.991199] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency
is 0ms
[2018-09-14 05:25:11.996933] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency
is 0ms
[2018-09-14 05:25:12.007867] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency
is 4ms
[2018-09-14 05:25:12.076057] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency
is 62ms

I checked vol0-client-5; nothing looks special compared to the other
bricks. CPU usage is not high, and the disk has no problem.
There are no warning or error logs in /var/log/glusterfs/brick/data-brk01-src.log.
After I turned on TRACE-level logging on the brick,
the output was too much to analyze, and showed no apparent difference
from the other bricks.

I hope someone has some idea about this; even pointing out a
direction to look into would help.

Thanks very much
Ning Adam
2018-09-14 15:44:21 UTC
Permalink
I have made some progress:

Using gluster-3.12.1, "glfsheal vol0" is fine. It tells me some files
need to be healed:

Brick storage0-cluster0:/data/brk02/src
<gfid:88dfb19a-92dd-4abd-bb15-4931e6e47cd9>
<gfid:88dfb19a-92dd-4abd-bb15-4931e6e47cd9>/0001
Status: Connected
Number of entries: 2

The hanging problem was introduced in

commit 94faf8c4a1f5a020593ca65f44d68e1ba2f632eb
Author: karthik-us <***@redhat.com>
Date: Wed Sep 20 18:30:21 2017 +0530

cluster/afr: Sending subvol up/down events when subvol comes
up or goes down
BUG: 1493539
(cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a)

Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec
BUG: 1492066
Signed-off-by: karthik-us <***@redhat.com>

But when I use the HEAD of the release-3.12 branch and do "git revert
94faf8c4", "glfsheal vol0" still hangs,
so this is not a simple problem.

I also found the bug is related to "client.event-threads": because I
previously had the setting "client.event-threads: 6", there were ping
latency log lines only for vol0-client-0 through vol0-client-5, ending with

[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping
latency is 29998ms

and no ping latency for any other brick in the log.

After I do "gluster volume set vol0 client.event-threads: 24" and
install 3.12.12 (since I am debuging on 8x3 cluster),
running "glfsheal vol0" finished in 30seconds (the time is still acceptable).
And there was

[2018-09-14 21:10:17.817764] D
[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-23: Ping latency
is 29876ms

So this leads me to the conclusion that a connection to a brick might be
holding a lock inside its client event-thread, stalling the other
connections served by the same thread.
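
To illustrate what I mean (this is only a toy model of an epoll-style
event-thread pool, not GlusterFS's actual socket code; all names below
are made up): if several client connections are multiplexed onto one
event thread and the handler for one connection blocks on a lock, every
other connection served by that thread stalls too, which would show up
as a huge "ping latency" on an otherwise healthy brick.

/* event_thread_sketch.c: toy model of connections multiplexed onto a
 * fixed pool of event threads (NOT gluster code). Connections are
 * assigned round-robin; if one handler blocks, every other connection
 * on the same thread is starved.
 * Build: gcc event_thread_sketch.c -o event_thread_sketch -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_THREADS      6    /* like client.event-threads: 6    */
#define NUM_CONNECTIONS  24   /* like 24 bricks / client xlators */

static void handle_event (int conn)
{
        if (conn == 5) {
                /* Simulate a handler stuck on a lock for ~30 seconds,
                 * like the suspicious 29996ms ping latency. */
                sleep (30);
        }
        printf ("conn %2d handled by thread %d\n",
                conn, conn % NUM_THREADS);
}

static void *event_thread (void *arg)
{
        int tid = (int)(long)arg;

        /* Each thread only ever services the connections assigned to it. */
        for (int conn = tid; conn < NUM_CONNECTIONS; conn += NUM_THREADS)
                handle_event (conn);
        return NULL;
}

int main (void)
{
        pthread_t threads[NUM_THREADS];

        for (long t = 0; t < NUM_THREADS; t++)
                pthread_create (&threads[t], NULL, event_thread, (void *)t);
        for (int t = 0; t < NUM_THREADS; t++)
                pthread_join (threads[t], NULL);

        /* Output: conns 11, 17 and 23 (same thread as conn 5) appear
         * only after the 30s block; all others finish immediately. */
        return 0;
}

With NUM_THREADS raised to 24 in this toy, every connection gets its own
thread and only the blocked connection itself is delayed, which is
roughly the behaviour I see after raising client.event-threads.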

My gluster volume info:

Options Reconfigured:
cluster.locking-scheme: full
performance.open-behind: on
cluster.lookup-unhashed: auto
transport.address-family: inet
cluster.lookup-optimize: on
diagnostics.brick-log-level: TRACE
performance.readdir-ahead: True
performance.md-cache-timeout: 0
cluster.server-quorum-type: none
performance.client-io-threads: True
performance.read-ahead: False
features.cache-invalidation: False
network.ping-timeout: 120
cluster.quorum-type: auto
nfs.disable: True
performance.io-cache: False
server.event-threads: 4
cluster.rebal-throttle: lazy
transport.keepalive: True
performance.write-behind: True
cluster.self-heal-daemon: enable
client.event-threads: 6
cluster.eager-lock: on
diagnostics.client-log-level: INFO
features.trash: off
features.selinux: off