Discussion:
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
TomK
2018-04-09 06:02:20 UTC
Hey All,

In a two-node glusterfs setup, with one node down, I can't use the second
node to mount the volume. I understand this is expected behaviour?
Is there any way to let the secondary node keep functioning and then
replicate what changed back to the first (primary) node when it comes back
online? Or should I just go for a third node to allow for this?

Also, how safe is it to set the following to none?

cluster.quorum-type: auto
cluster.server-quorum-type: server


[***@nfs01 /]# gluster volume start gv01
volume start: gv01: failed: Quorum not met. Volume operation not allowed.
[***@nfs01 /]#


[***@nfs01 /]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       25561

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[***@nfs01 /]#


[***@nfs01 /]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
nfs.trusted-sync: on
performance.cache-size: 1GB
performance.io-thread-count: 16
performance.write-behind-window-size: 8MB
performance.readdir-ahead: on
client.event-threads: 8
server.event-threads: 8
cluster.quorum-type: auto
cluster.server-quorum-type: server
[***@nfs01 /]#




==> n.log <==
[2018-04-09 05:08:13.704156] I [MSGID: 100030] [glusterfsd.c:2556:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
3.13.2 (args: /usr/sbin/glusterfs --process-name fuse
--volfile-server=nfs01 --volfile-id=/gv01 /n)
[2018-04-09 05:08:13.711255] W [MSGID: 101002]
[options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is
deprecated, preferred is 'transport.address-family', continuing with
correction
[2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect]
0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-04-09 05:08:13.729025] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2018-04-09 05:08:13.737757] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2018-04-09 05:08:13.738114] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 3
[2018-04-09 05:08:13.738203] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 4
[2018-04-09 05:08:13.738324] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 5
[2018-04-09 05:08:13.738330] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 6
[2018-04-09 05:08:13.738655] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 7
[2018-04-09 05:08:13.738742] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 8
[2018-04-09 05:08:13.739460] W [MSGID: 101174]
[graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: option
'parallel-readdir' is not recognized
[2018-04-09 05:08:13.739787] I [MSGID: 114020] [client.c:2360:notify]
0-gv01-client-0: parent translators are ready, attempting connect on
transport
[2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect]
0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not
available"
[2018-04-09 05:08:13.747372] I [MSGID: 114020] [client.c:2360:notify]
0-gv01-client-1: parent translators are ready, attempting connect on
transport
[2018-04-09 05:08:13.747883] E [MSGID: 114058]
[client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2018-04-09 05:08:13.748026] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from
gv01-client-0. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-04-09 05:08:13.748070] W [MSGID: 108001]
[afr-common.c:5391:afr_notify] 0-gv01-replicate-0: Client-quorum is not met
[2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect]
0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not
available"
Final graph:
+------------------------------------------------------------------------------+
1: volume gv01-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host nfs01
5: option remote-subvolume /bricks/0/gv01
6: option transport-type socket
7: option transport.address-family inet
8: option username 916ccf06-dc1d-467f-bc3d-f00a7449618f
9: option password a44739e0-9587-411f-8e6a-9a6a4e46156c
10: option event-threads 8
11: option transport.tcp-user-timeout 0
12: option transport.socket.keepalive-time 20
13: option transport.socket.keepalive-interval 2
14: option transport.socket.keepalive-count 9
15: option send-gids true
16: end-volume
17:
18: volume gv01-client-1
19: type protocol/client
20: option ping-timeout 42
21: option remote-host nfs02
22: option remote-subvolume /bricks/0/gv01
23: option transport-type socket
24: option transport.address-family inet
25: option username 916ccf06-dc1d-467f-bc3d-f00a7449618f
26: option password a44739e0-9587-411f-8e6a-9a6a4e46156c
27: option event-threads 8
28: option transport.tcp-user-timeout 0
29: option transport.socket.keepalive-time 20
30: option transport.socket.keepalive-interval 2
31: option transport.socket.keepalive-count 9
32: option send-gids true
33: end-volume
34:
35: volume gv01-replicate-0
36: type cluster/replicate
37: option afr-pending-xattr gv01-client-0,gv01-client-1
38: option quorum-type auto
39: option use-compound-fops off
40: subvolumes gv01-client-0 gv01-client-1
41: end-volume
42:
43: volume gv01-dht
44: type cluster/distribute
45: option lock-migration off
46: subvolumes gv01-replicate-0
47: end-volume
48:
49: volume gv01-write-behind
50: type performance/write-behind
51: option cache-size 8MB
52: subvolumes gv01-dht
53: end-volume
54:
55: volume gv01-read-ahead
56: type performance/read-ahead
57: subvolumes gv01-write-behind
58: end-volume
59:
60: volume gv01-readdir-ahead
61: type performance/readdir-ahead
62: option parallel-readdir off
63: option rda-request-size 131072
64: option rda-cache-limit 10MB
65: subvolumes gv01-read-ahead
66: end-volume
67:
68: volume gv01-io-cache
69: type performance/io-cache
70: option cache-size 1GB
71: subvolumes gv01-readdir-ahead
72: end-volume
73:
74: volume gv01-quick-read
75: type performance/quick-read
76: option cache-size 1GB
77: subvolumes gv01-io-cache
78: end-volume
79:
80: volume gv01-open-behind
81: type performance/open-behind
82: subvolumes gv01-quick-read
83: end-volume
84:
85: volume gv01-md-cache
86: type performance/md-cache
87: subvolumes gv01-open-behind
88: end-volume
89:
90: volume gv01
91: type debug/io-stats
92: option log-level INFO
93: option latency-measurement off
94: option count-fop-hits off
95: subvolumes gv01-md-cache
96: end-volume
97:
98: volume meta-autoload
99: type meta
100: subvolumes gv01
101: end-volume
102:
+------------------------------------------------------------------------------+
[2018-04-09 05:08:13.922631] E [socket.c:2374:socket_connect_finish]
0-gv01-client-1: connection to 192.168.0.119:24007 failed (No route to
host); disconnecting socket
[2018-04-09 05:08:13.922690] E [MSGID: 108006]
[afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0:
All subvolumes are down. Going offline until atleast one of them comes
back up.
[2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24
kernel 7.22
[2018-04-09 05:08:13.926245] I [fuse-bridge.c:4835:fuse_graph_sync]
0-fuse: switched to graph 0
[2018-04-09 05:08:13.926518] I [MSGID: 108006]
[afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.926671] E [MSGID: 101046]
[dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.926762] E [fuse-bridge.c:4271:fuse_first_lookup]
0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2018-04-09 05:08:13.927207] I [MSGID: 108006]
[afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.927262] E [MSGID: 101046]
[dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.927301] W
[fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse:
00000000-0000-0000-0000-000000000001: failed to resolve (Transport
endpoint is not connected)
[2018-04-09 05:08:13.927339] E [fuse-bridge.c:900:fuse_getattr_resume]
0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001)
resolution failed
[2018-04-09 05:08:13.931497] I [MSGID: 108006]
[afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.931558] E [MSGID: 101046]
[dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.931599] W
[fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse:
00000000-0000-0000-0000-000000000001: failed to resolve (Transport
endpoint is not connected)
[2018-04-09 05:08:13.931623] E [fuse-bridge.c:900:fuse_getattr_resume]
0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001)
resolution failed
[2018-04-09 05:08:13.937258] I [fuse-bridge.c:5093:fuse_thread_proc]
0-fuse: initating unmount of /n
[2018-04-09 05:08:13.938043] W [glusterfsd.c:1393:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560b52471675]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560b5247149b] ) 0-:
received signum (15), shutting down
[2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] 0-fuse:
Unmounting '/n'.
[2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] 0-fuse: Closing
fuse connection to '/n'.

==> glusterd.log <==
[2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect]
0-management: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"

==> glustershd.log <==
[2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect]
0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not
available"
[2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect]
0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not
available"
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------

Living on earth is expensive, but it includes a free trip around the sun.
Alex K
2018-04-09 06:45:20 UTC
Hi,

You need at least 3 nodes to have quorum enabled. In a 2-node setup you need
to disable quorum so that you can still use the volume when one of the nodes
goes down.
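
For a replica-2 volume such as gv01, a minimal sketch of the commands that
would turn both quorum checks off (run as root on any peer in the trusted
pool) looks roughly like this; the volume and option names are taken from
the output above:

# disable client-side (AFR) quorum for the volume
gluster volume set gv01 cluster.quorum-type none
# disable server-side quorum enforcement by glusterd
gluster volume set gv01 cluster.server-quorum-type none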
TomK
2018-04-11 01:35:09 UTC
On 4/9/2018 2:45 AM, Alex K wrote:
Hey Alex,

With two nodes, the setup works, but both sides go down when one node is
missing. So I set the two params below to none and that solved my issue:

cluster.quorum-type: none
cluster.server-quorum-type: none
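
For reference, the effective values can be double-checked afterwards with
'gluster volume get' (a quick sketch, assuming the volume is still named gv01):

gluster volume get gv01 cluster.quorum-type
gluster volume get gv01 cluster.server-quorum-type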

Thank you for that.

Cheers,
Tom
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------

Living on earth is expensive, but it includes a free trip around the sun.
Alex K
2018-04-11 15:54:16 UTC
Post by TomK
Hey Alex,
With two nodes, the setup works, but both sides go down when one node is
missing. So I set the two params below to none and that solved my issue:
cluster.quorum-type: none
cluster.server-quorum-type: none
Yes, this disables quorum so as to avoid the issue. Glad that this helped.
Bear in mind though that it is easier to run into split-brain issues when
quorum is disabled, which is why at least 3 nodes are recommended. Just to
note, I also have a 2-node cluster which has been running without issues for
a long time.
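
If you do run with quorum disabled, it is worth checking for split-brain from
time to time. A minimal sketch, using the volume name from this thread:

# list any files currently in split-brain on the replicated volume
gluster volume heal gv01 info split-brain
# show the overall pending-heal state
gluster volume heal gv01 info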
TomK
2018-05-22 05:43:42 UTC
Hey All,

Appears I solved this one and NFS mounts now work on all my clients. No
issues since fixing it a few hours back.

RESOLUTION

SELinux was to blame for the trouble; auditd captured the denials. I noticed
the following in the audit logs on 2 of the 3 NFS servers (nfs01, nfs02, nfs03):

type=AVC msg=audit(1526965320.850:4094): avc: denied { write } for
pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689
scontext=system_u:system_r:ganesha_t:s0
tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2
success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1
pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd"
exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094):
proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc: denied { unlink } for
pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689
scontext=system_u:system_r:ganesha_t:s0
tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87
success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5
items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0
fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295
comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd"
subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095):
proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

The fix was to adjust the SELinux rules using audit2allow.
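
For anyone hitting the same AVC denials, the audit2allow step is roughly
along these lines (the module name here is arbitrary; this is a sketch, not
the exact commands I ran):

# pull the raw AVC records for ganesha.nfsd and build a local policy module
ausearch -c 'ganesha.nfsd' --raw | audit2allow -M ganesha-local
# load the generated module
semodule -i ganesha-local.pp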

All the errors below, including the ones in the linked posts below, were due to that.

It turns out that whenever it worked, it hit the only working server in
the system, nfs03. Whenever it didn't work, it was hitting the non-working
servers. So sometimes it worked, and other times it didn't. It looked like
it had to do with HAProxy / Keepalived as well, since I couldn't mount using
the VIP but could using the host, but that wasn't the case either.
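
A simple way to compare the two paths is to mount the export through the VIP
and then directly against a single backend node, e.g. (the VIP hostname and
export path below are placeholders):

# mount through the HAProxy / Keepalived VIP
mount -t nfs4 nfs-vip.nix.my.dom:/n /mnt/test
# mount directly against one backend NFS server
mount -t nfs4 nfs03.nix.my.dom:/n /mnt/test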

I've also added a third brick, on nfs03, to the GlusterFS volume, to see
whether the backend FS was to blame, since GlusterFS recommends a minimum of
3 bricks for replication, but that had no effect.
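
For the record, growing the replica count from 2 to 3 looks roughly like this
(a sketch, assuming the new brick on nfs03 uses the same path as the existing
ones):

# add nfs03 as a third replica, then trigger a full heal to populate it
gluster volume add-brick gv01 replica 3 nfs03:/bricks/0/gv01
gluster volume heal gv01 full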

In case anyone runs into this, I've added notes here as well:

http://microdevsys.com/wp/kernel-nfs-nfs4_discover_server_trunking-unhandled-error-512-exiting-with-error-eio-and-mount-hangs/

http://microdevsys.com/wp/nfs-reply-xid-3844308326-reply-err-20-auth-rejected-credentials-client-should-begin-new-session/

The errors thrown included:

NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client
should begin new session)

kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting
with error EIO and mount hangs

+ the kernel exception below.
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------

Living on earth is expensive, but it includes a free trip around the sun.


May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted:
G L ------------ 3.10.0-693.21.1.el7.x86_64 #1
.
.
.
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti:
ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[<ffffffff816b6545>]
[<ffffffff816b6545>] _raw_spin_unlock_irqrestore+0x15/0x20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60 EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX:
ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI:
0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08:
ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11:
0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14:
ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS: 00007fab48f07880(0000)
GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3:
000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [<ffffffff810b4d86>] finish_wait+0x56/0x70
May 21 23:53:13 psql01 kernel: [<ffffffffc0580361>]
nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff810b4fc0>] ?
wake_up_atomic_t+0x30/0x30
May 21 23:53:13 psql01 kernel: [<ffffffffc0581e9b>]
nfs_get_client+0x22b/0x470 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc05eafd8>]
nfs4_set_client+0x98/0x130 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05ec77e>]
nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e391e>]
nfs4_remote_mount+0x2e/0x60 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ?
__alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>]
vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3846>]
nfs_do_root_mount+0x86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3c44>]
nfs4_try_mount+0x44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05826d7>] ?
get_nfs_version+0x27/0x90 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058ec9b>]
nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058fbe0>] ?
nfs_clone_super+0x140/0x140 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058daa0>] ?
param_set_portnr+0x70/0x70 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ?
__alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>]
vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffff81229263>] do_mount+0x233/0xaf0
May 21 23:53:13 psql01 kernel: [<ffffffff81229ea6>] SyS_mount+0x96/0xf0
May 21 23:53:13 psql01 kernel: [<ffffffff816c0715>]
system_call_fastpath+0x1c/0x21
May 21 23:53:13 psql01 kernel: [<ffffffff816c0661>] ?
system_call_after_swapgs+0xae/0x146
Hey Guys,
cluster.quorum-type: none
cluster.server-quorum-type: none
I've run into a number of gluster errors (see below).
I'm using gluster as the backend for my NFS storage.  I have gluster
running on two nodes, nfs01 and nfs02.  It's mounted on /n on each host.
 The path /n is in turn shared out by NFS Ganesha.  It's a two node
nfs02:/gv01 on /n type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
nfs01:/gv01 on /n type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
Gluster always reports as working no matter when I type the below two commands:
Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
cluster.server-quorum-type: none
cluster.quorum-type: none
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
unrecognized word: status (position 0)
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       1422
Brick nfs02:/bricks/0/gv01                  49152     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       1248
Self-heal Daemon on nfs02.nix.my.dom        N/A       N/A        Y       1251
Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks
glusterfs-3.13.2-2.el7.x86_64
glusterfs-devel-3.13.2-2.el7.x86_64
glusterfs-fuse-3.13.2-2.el7.x86_64
glusterfs-api-devel-3.13.2-2.el7.x86_64
centos-release-gluster313-1.0-1.el7.centos.noarch
python2-gluster-3.13.2-2.el7.x86_64
glusterfs-client-xlators-3.13.2-2.el7.x86_64
glusterfs-server-3.13.2-2.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
glusterfs-cli-3.13.2-2.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-libs-3.13.2-2.el7.x86_64
glusterfs-extra-xlators-3.13.2-2.el7.x86_64
glusterfs-api-3.13.2-2.el7.x86_64
The short of it is that everything works and mounts on guests work as
long as I don't try to write to the NFS share from my clients.  If I try
-sh-4.2$ pwd
/n/my.dom/tom
-sh-4.2$ ls -altri
total 6258
20:15 .bashrc
20:15 .bash_profile
20:15 .bash_logout
10718721668898812166 drwxr-xr-x. 3 root        root           4096 Mar 5
02:46 ..
03:07 .ssh
22:38 opennebula-cores.tar.gz
23:25 meh.txt
01:30 nfs-trace-working.dat.gz
23:38 .bash_history
23:58 .
23:58 .meh.txt.swp
-sh-4.2$ touch test.txt
-sh-4.2$ vi test.txt
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
This is followed by a slew of other errors in apps using the gluster
ganesha.nfsd-5891[svc_12] nfs_rpc_process_request :DISP :INFO :Could not
authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
==> ganesha-gfapi.log <==
[2018-05-03 04:32:18.009245] I [MSGID: 114021] [client.c:2369:notify]
0-gv01-client-0: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009338] I [MSGID: 114021] [client.c:2369:notify]
0-gv01-client-1: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009499] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from
gv01-client-0. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-05-03 04:32:18.009557] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-gv01-client-1: disconnected from
gv01-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-05-03 04:32:18.009610] E [MSGID: 108006]
All subvolumes are down. Going offline until atleast one of them comes
back up.
[2018-05-01 22:43:06.412067] E [MSGID: 114058]
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish]
0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection
refused); disconnecting socket
So I'm wondering if this is due to the two-node gluster, as it seems to be,
and what is it that I really need to do here? Should I go with the
recommended 3-node setup, which would include a proper quorum, to avoid
this? Or is there more to this, and it really doesn't matter that I have a
2-node gluster cluster without a quorum, and this is due to something else
entirely?
Again, any time I check the gluster volumes, everything checks out. The
results of both 'gluster volume info' and 'gluster volume status' are
always as I pasted above, fully working.
I'm also using FreeIPA (the Linux KDC) with this solution as well.
Daniel Gryniewicz
2018-05-22 12:58:46 UTC
Thanks, Tom. Good to know.

Daniel