Discussion:
Replica 3 with arbiter - heal error?
Pavel Szalbot
2017-07-11 13:03:04 UTC
Hello,

I have a Gluster 3.8.13 replica 3 arbiter volume mounted, and I run the
following script on it:

while true; do echo "$(date)" >> a.txt; sleep 2; done

After a few seconds I add a firewall rule on the client that blocks
access to the node specified during mount. E.g., if the volume is
mounted with:

mount -t glusterfs -o backupvolfile-server=10.0.0.2 10.0.0.1:/vol /mnt/vol

I add:

iptables -A OUTPUT -d 10.0.0.1 -j REJECT

This causes the script above to block for approximately 40 seconds
until the gluster client tries the backupvolfile-server (can this
timeout be changed?), and then everything continues as expected.

Heal info shows that this file (a.txt) undergoes healing. About a
minute later, the last line of a.txt is the $(date) written just
before the firewall modification. Each consecutive write, e.g. echo
"STRING" >> a.txt, actually appends not "STRING" itself, but the
corresponding number of bytes of the previously written content.
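The heal state above was observed via the standard heal CLI; as a sketch (assuming the volume name "vol" from the mount example):

```shell
# List files the self-heal daemon still considers pending heal
# ("vol" is the volume name taken from the mount example)
gluster volume heal vol info
```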

If the file content just before firewall rule addition is:
Tue Jul 11 14:19:37 CEST 2017
Tue Jul 11 14:19:39 CEST 2017

It will later become (which is OK):
Tue Jul 11 14:19:37 CEST 2017
Tue Jul 11 14:19:39 CEST 2017
Tue Jul 11 14:20:18 CEST 2017
Tue Jul 11 14:20:20 CEST 2017
Tue Jul 11 14:20:22 CEST 2017

But after some time, the file content is only:
Tue Jul 11 14:19:37 CEST 2017
Tue Jul 11 14:19:39 CEST 2017

And echo "STRING" >> a.txt turns it into this (6 bytes appended, not "STRING"):
Tue Jul 11 14:19:37 CEST 2017
Tue Jul 11 14:19:39 CEST 2017
Tue Ju

Another echo "STRING" >> a.txt changes the content to:
Tue Jul 11 14:19:37 CEST 2017
Tue Jul 11 14:19:39 CEST 2017
Tue Jul 11 1

Removing the firewall rule does not change the content, and a
different client with access to all nodes sees exactly the same
content as this one.
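For completeness, whether gluster itself considers the file split-brained can be checked with the heal CLI (again assuming the volume name "vol" from the mount example):

```shell
# Show entries gluster reports as being in split-brain, if any
gluster volume heal vol info split-brain
```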

Is this normal behavior or a bug, or is there some configuration I
should have changed in order to make replica 3 with arbiter highly
available?

I stumbled upon this while testing how to upgrade Gluster so that the
clients, resp. the VMs running on them, are not affected by the
"transport endpoint" error caused by the primary mount server
undergoing the upgrade and glusterd therefore being unavailable for
several seconds.

Volume config:
server.allow-insecure: on
server.outstanding-rpc-limit: 1024
performance.read-ahead: off
performance.io-thread-count: 64
performance.client-io-threads: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
nfs.disable: on
performance.readdir-ahead: on
features.shard: on
performance.quick-read: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
user.cifs: off

-ps
Pavel Szalbot
2017-07-11 13:59:10 UTC
I tested the same procedure on a volume with the following config and
cannot reproduce the issue. Should I file a bug?

transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off

Btw never mind the 40-second timeout, I found network.ping-timeout ;-)
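For anyone else hitting the ~40 s stall: it is governed by the network.ping-timeout volume option (42 s by default). A sketch of lowering it, assuming the volume name "vol" from the mount example:

```shell
# Reduce the client-side ping timeout from the 42 s default to 10 s
gluster volume set vol network.ping-timeout 10
```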
-ps