Dmitry Melekhov
2018-08-28 05:54:52 UTC
Hello!
Yesterday we hit something like this on 4.1.2, CentOS 7.5.
The volume is replicated: two bricks plus one arbiter.
We rebooted the arbiter, waited for the heal to finish, and then tried to
live-migrate a VM to another node (we run the VMs on the gluster nodes
themselves).
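For reference, the sequence was roughly the following (a sketch, not our
exact shell history; the volume name "pool" is taken from the
0-pool-server log prefixes, and the domain/host names are inferred from
the image path and the HOST: fields in the log):

    # after the arbiter came back up, wait until heal reports no entries
    gluster volume heal pool info

    # then start the live migration, e.g. from "father" to "son"
    virsh migrate --live balamak qemu+ssh://son/system

The brick log then shows: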
[2018-08-27 09:56:22.085411] I [MSGID: 115029] [server-handshake.c:763:server_setvolume]
0-pool-server: accepted client from
CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
(version: 4.1.2)
[2018-08-27 09:56:22.107609] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-pool-server: disconnecting connection from
CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:56:22.107747] I [MSGID: 101055] [client_t.c:444:gf_client_unref]
0-pool-server: Shutting down connection
CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905829] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-pool-server: disconnecting connection from
CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc550000}
[2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc550000}
[2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc550000}
[2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc550000}
[2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc550000}
[2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc550000}
[2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc550000}
[2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc550000}
[2018-08-27 09:58:37.906098] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc550000}
[2018-08-27 09:58:37.906121] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc550000}
...
[2018-08-27 09:58:37.907375] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc550000}
[2018-08-27 09:58:37.907393] W [inodelk.c:610:pl_inodelk_log_cleanup]
0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880cdd6bc550000}
[2018-08-27 09:58:37.907476] I [socket.c:3837:socket_submit_reply]
0-tcp.pool-server: not connected (priv->connected = -1)
[2018-08-27 09:58:37.907520] E [rpcsvc.c:1378:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xcb88cb, Program:
GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport
(tcp.pool-server)
[2018-08-27 09:58:37.910727] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.910814] E [rpcsvc.c:1378:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xcb88ce, Program:
GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.910861] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.910904] E [rpcsvc.c:1378:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xcb88cf, Program:
GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.910940] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.910979] E [rpcsvc.c:1378:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xcb88d1, Program:
GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.911012] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.911050] E [rpcsvc.c:1378:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xcb88d8, Program:
GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.911083] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.916217] E [server.c:137:server_submit_reply]
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
-->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] )
0-: Reply submission failed
[2018-08-27 09:58:37.916520] I [MSGID: 115013]
[server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd cleanup on
/balamak.img
After this, all I/O on /balamak.img was blocked.
The only solution we found was to reboot all 3 nodes.
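If it helps, the next time this happens we can capture the brick lock
state before rebooting and attach the resulting dump files (a sketch;
by default the statedumps land under /var/run/gluster on each brick node):

    # dump the inode/entry lock state of all bricks of the volume
    gluster volume statedump pool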
Is there an existing bug report in Bugzilla to which we can attach our logs?
Is it possible to turn off these locks?
Thank you!