Discussion:
Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Mauro Tridici
2018-09-26 10:08:35 UTC
Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I started a rebalance to distribute data across the bricks, but something went wrong.
The rebalance failed on several nodes, and the estimated time needed to complete the procedure is very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
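For scale, the estimated completion time reported above converts to roughly 150 days:

```python
# Convert the rebalance ETA "3601:05:41" (hours:minutes:seconds) into days.
h, m, s = (int(x) for x in "3601:05:41".split(":"))
total_seconds = h * 3600 + m * 60 + s
print(round(total_seconds / 86400, 1))  # ≈ 150.0 days
```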

When the rebalance processes fail, I can see the following kinds of errors in /var/log/glusterfs/tier2-rebalance.log:

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
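The bitmasks in this message describe the state of the 6 bricks of one (4+2) disperse set, one bit per brick. A minimal sketch of how to read them (the bit-to-brick ordering is an assumption here, not taken from the Gluster source):

```python
# Count bricks that were up but reported bad for this operation.
# Each character of the mask strings stands for one brick of the 4+2 set.
def count_bad(up: str, bad: str) -> int:
    return sum(1 for u, b in zip(up, bad) if u == "1" and b == "1")

print(count_bad("111111", "011000"))  # 2 -> matches "failed on 2 of 6"
print(count_bad("111111", "000010"))  # 1 -> matches "failed on 1 of 6"
```

With redundancy 2, a failure on 2 of 6 bricks is the worst case the set can absorb without losing availability.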

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
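The last warning comes from a free-space guard in the DHT rebalance path. A simplified sketch (not the actual Gluster implementation) using the byte counts from the log:

```python
# DHT skips migrating a file if the move would leave the destination
# subvolume with less free space than the source subvolume.
SRC_FREE = 71_373_083_776  # bytes free on tier2-disperse-11 (from the log, ~66 GB)
DST_FREE = 2_440_190_848   # bytes free on tier2-disperse-5 (from the log, ~2.3 GB)

def should_skip(src_free: int, dst_free: int, file_size: int) -> bool:
    return dst_free - file_size < src_free

print(should_skip(SRC_FREE, DST_FREE, 0))  # True -> the file is skipped
```

The large free-space imbalance between subvolumes is itself a symptom of the uneven brick layout discussed later in the thread.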

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

These look like network or timeout problems, but the network usage/traffic is not particularly high.
Do you think I should modify some volume options related to thread and/or network parameters in my volume configuration?
Could you please help me understand the cause of the problems above?

You can find our volume info below
(the volume spans 6 servers; each server has 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, and 12 x 10 TB HDDs):

[***@s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

In case it helps, I paste below the output of the “free -m” command executed on all the cluster nodes.

The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?

[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
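As a quick sanity check on those numbers (assuming standard Linux "free -m" semantics, where "available" estimates memory usable without swapping, unlike the much smaller "free" column):

```python
# Parse the "Mem:" row from the "free -m" output above (values in MiB).
line = "Mem: 64309 10409 464 15 53434 52998"
total, used, free, shared, buff_cache, available = (int(x) for x in line.split()[1:])
print(f"available ≈ {available / 1024:.1f} GiB of {total / 1024:.1f} GiB total")
```

Roughly 52 GiB of the 64 GB is available, so RAM is unlikely to be the bottleneck here.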

Thank you in advance.
Sorry for the long message, but I am trying to give you all the available information.

Regards,
Mauro
Ashish Pandey
2018-09-26 12:13:16 UTC
We don't have enough logs to debug this, so I would suggest that you provide more logs/info.
I have also observed that the configuration and layout of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, so the subvolumes made up of these bricks reside entirely on that one server, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of losing a whole subvolume when the connection to one of these nodes is disrupted is much higher in this case.
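A quick way to see the problem is to group the brick list into its (4+2) sets in listed order, as Gluster does, and count how many bricks of each set a single server hosts. This is a standalone sketch with hostnames reconstructed from the volume info above:

```python
# Bricks 1-36 interleave s01/s02/s03; bricks 37-72 are 12 per server on s04/s05/s06.
first36 = [f"s0{1 + i % 3}-stg" for i in range(36)]
last36 = [f"s0{4 + i // 12}-stg" for i in range(36)]
bricks = first36 + last36
subvols = [bricks[i:i + 6] for i in range(0, len(bricks), 6)]
# For each (4+2) subvolume: bricks lost if its worst single host goes down.
risk = [max(sv.count(h) for h in set(sv)) for sv in subvols]
print(risk)  # [2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, 6]
```

The first 6 subvolumes lose at most 2 of 6 bricks per server outage (within the redundancy of 2); each of the last 6 resides entirely on one server and goes fully offline with it.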

---
Ashish

----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "gluster-users" <gluster-***@gluster.org>
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-26 12:25:02 UTC
Dear Ashish,

thank you for your answer.
I can provide the entire log files for glusterd, glusterfsd and rebalance.
Could you please indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro
Post by Ashish Pandey
I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on same node and the sub volume made up of these bricks will be of same subvolume, which is not good. Same is true for the bricks hosted on s05-stg and s06-stg
I think you have added these bricks after creating vol. The probability of disruption in connection of these bricks will be higher in this case.
---
Ashish
Ashish Pandey
2018-09-26 12:51:10 UTC
Hi Mauro,

The rebalance and brick logs are the first thing we should go through.

There is a procedure to correct the configuration/setup, but in your current situation it would be difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration,
that is, 2 bricks on each node for every subvolume.
The procedure would require a lot of replace-brick operations, which would in turn require healing; in addition, we would have to wait for the rebalance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance, you properly remove these newly added bricks.
After that, re-add these bricks so that each subvolume has 2 bricks on each of the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
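A sketch of what the corrected re-add could look like, with each new (4+2) set taking 2 bricks from each of the 3 new servers. Hostnames and mount paths mirror the volume info in this thread; this only prints a hypothetical command (bricks in add-brick order), it does not run gluster:

```python
# Build the 36-brick list so consecutive groups of 6 (one disperse set each)
# contain exactly 2 bricks from each of the 3 new servers.
hosts = ["s04-stg", "s05-stg", "s06-stg"]
bricks = []
for pair in range(6):                                 # 6 new (4+2) subvolumes
    for host in hosts:
        for mnt in (2 * pair + 1, 2 * pair + 2):      # 2 mount points per host per subvolume
            bricks.append(f"{host}:/gluster/mnt{mnt}/brick")
# First subvolume only, for readability; the real command lists all 36 bricks.
print("gluster volume add-brick tier2 " + " ".join(bricks[:6]) + " ...")
```

With this layout, a single-server outage costs each subvolume at most 2 bricks, which the 4+2 redundancy tolerates.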

---
Ashish



----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro




Il giorno 26 set 2018, alle ore 14:13, Ashish Pandey < ***@redhat.com > ha scritto:


I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on same node and the sub volume made up of these bricks will be of same subvolume, which is not good. Same is true for the bricks hosted on s05-stg and s06-stg
I think you have added these bricks after creating vol. The probability of disruption in connection of these bricks will be higher in this case.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not particularly high.
Do you think that, in my volume configuration, I should modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?
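For reference, the thread and timeout options in question can be inspected and changed with the standard gluster CLI. A minimal sketch follows; the values are illustrative assumptions, not tested recommendations:

```shell
# Sketch: inspect and raise the timeout/thread options referenced above.
# Values are examples only; test on a non-production volume first.
gluster volume get tier2 network.ping-timeout       # currently 60 in this setup
gluster volume set tier2 network.ping-timeout 120   # tolerate longer brick stalls
gluster volume set tier2 client.event-threads 8     # more client-side event threads
gluster volume set tier2 server.event-threads 8     # more server-side event threads
```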

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
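As a side note on the layout above: in "gluster vol info" output, each consecutive run of 6 Brick lines is one (4+2) disperse subvolume, so it is easy to check which hosts a subvolume spans. A small sketch (the sample Brick lines stand in for the live output; pipe "gluster volume info tier2" in instead to check the real volume):

```shell
# Each consecutive group of 6 bricks is one (4+2) subvolume; print the
# hosts each group spans. Replace the sample lines with the live output
# of: gluster volume info tier2
printf '%s\n' \
  'Brick37: s04-stg:/gluster/mnt1/brick' \
  'Brick38: s04-stg:/gluster/mnt2/brick' \
  'Brick39: s04-stg:/gluster/mnt3/brick' \
  'Brick40: s04-stg:/gluster/mnt4/brick' \
  'Brick41: s04-stg:/gluster/mnt5/brick' \
  'Brick42: s04-stg:/gluster/mnt6/brick' |
awk '/^Brick[0-9]+:/ {
    split($2, a, ":")                 # a[1] = host part of host:path
    hosts = hosts " " a[1]
    if (++n % 6 == 0) {               # 6 bricks per (4+2) subvolume
        printf "subvolume %d hosts:%s\n", n / 6, hosts
        hosts = ""
    }
}'
# prints: subvolume 1 hosts: s04-stg s04-stg s04-stg s04-stg s04-stg s04-stg
```

A resilient interleaved layout would show several distinct hostnames per subvolume, which is exactly what the first 36 bricks above do and the last 36 do not.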

If it can help, I paste here the output of the “free -m” command; the result is almost the same on every node.

In your opinion, is the available RAM enough to support the data movement?

[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432

Thank you in advance.
Sorry for my long message, but I’m trying to provide you with all the available information.

Regards,
Mauro

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-26 13:24:19 UTC
Permalink
Hi Ashish,

in attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don’t know if I can ask this of you, but, if possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the “df -h” command executed on one of the first 3 pre-existing nodes (s01, s02, s03) and on one of the last 3 recently added nodes (s04, s05, s06).

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the newly added servers is about 800GB.

Thank you,
Mauro
Post by Ashish Pandey
Hi Mauro,
rebalance and brick logs should be the first thing we go through.
There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-brick operations, which will in turn need healing. In addition to that, we have to wait for the rebalance to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
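For concreteness, the undo Ashish describes maps onto the standard gluster remove-brick/add-brick commands. The sketch below uses an illustrative brick list (one hypothetical subvolume); verify every list against your own layout before running anything:

```shell
# Sketch only -- brick lists are illustrative. remove-brick on a disperse
# volume must name whole (4+2) subvolumes, i.e. multiples of 6 bricks.
gluster volume rebalance tier2 stop

# One newly added subvolume (hypothetical example: bricks 37-42, all on s04-stg)
SUBVOL="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

gluster volume remove-brick tier2 $SUBVOL start    # migrates data off the bricks
gluster volume remove-brick tier2 $SUBVOL status   # repeat until "completed"
gluster volume remove-brick tier2 $SUBVOL commit   # only after migration completes

# Later, re-add the bricks interleaved (2 per node), so each (4+2) subvolume
# spans all 3 new nodes and survives a single-server failure.
gluster volume add-brick tier2 \
    s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
    s05-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt2/brick \
    s06-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt2/brick
```

Note that gluster refuses to reuse a brick path that still carries the old volume's extended attributes, so the brick directories would need to be wiped or cleaned before being re-added.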
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I can provide you with the entire log files for glusterd, glusterfsd and rebalance.
Could you please indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, so the subvolumes made up of these bricks are confined to a single server, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.
---
Ashish
Ashish Pandey
2018-09-26 17:33:18 UTC
Permalink
Hi Mauro,

Yes, I can provide you with a step-by-step procedure to correct it.
Is it fine if I provide the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if exists a safe procedure to correct the problem I would like execute it.

I don’t know if I can ask you it, but, if it is possible, could you please describe me step by step the right procedure to remove the newly added bricks without losing the data that have been already rebalanced?

The following outputs show the result of “df -h” command executed on one of the first 3 nodes (s01, s02, s03) already existing and on one of the last 3 nodes (s04, s05, s06) added recently.

[***@s06 bricks]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, used space value of each brick of the last servers is about 800GB.

Thank you,
Mauro











Il giorno 26 set 2018, alle ore 14:51, Ashish Pandey < ***@redhat.com > ha scritto:

Hi Mauro,

rebalance and brick logs should be the first thing we should go through.

There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg , s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need healing and all. In addition to that we have to wait for re-balance to complete.

I would suggest that if whole data has not been rebalanced and if you can stop the rebalance and remove these newly added bricks properly then you should remove these newly added bricks.
After that, add these bricks so that you have 2 bricks of each volume on 3 newly added nodes.

Yes, it is like undoing whole effort but it is better to do it now then facing issues in future when it will be almost impossible to correct these things if you have lots of data.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

Il giorno 26 set 2018, alle ore 14:13, Ashish Pandey < ***@redhat.com > ha scritto:


I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on same node and the sub volume made up of these bricks will be of same subvolume, which is not good. Same is true for the bricks hosted on s05-stg and s06-stg
I think you have added these bricks after creating vol. The probability of disruption in connection of these bricks will be higher in this case.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

If it can help, I paste here the output of the “free -m” command executed on all the cluster nodes:

The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?

[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432

Thank you in advance.
Sorry for my long message, but I’m trying to provide you with all the available information.

Regards,
Mauro

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-26 17:56:54 UTC
Permalink
Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro
Post by Ashish Pandey
Hi Mauro,
Yes, I can provide you a step-by-step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don’t know if I may ask this, but, if it is possible, could you please describe step by step the right procedure to remove the newly added bricks without losing the data that have already been rebalanced?
The following outputs show the result of the “df -h” command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the last three servers is about 800 GB.
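To double-check that figure, the per-brick usage can be summed directly from a saved `df` dump. This is a minimal sketch: the sample lines below are simplified (GB units) stand-ins for the real output above, and only the `/gluster/mntN` mount-point pattern is taken from this thread.

```shell
# Sum the "used" column for the gluster brick mounts in a df dump.
# Non-brick filesystems (e.g. /var) are skipped by the mount-point match.
df_sample='/dev/mapper/gluster_vgb-gluster_lvb 9216G 807G 8409G 9% /gluster/mnt1
/dev/mapper/gluster_vgc-gluster_lvc 9216G 807G 8409G 9% /gluster/mnt2
/dev/mapper/cl_s06-var 100G 2G 98G 2% /var'

summary=$(echo "$df_sample" | awk '$NF ~ /^\/gluster\/mnt/ {
    gsub("G", "", $3); sum += $3; n++
} END { printf "%d bricks, %dG used, avg %dG/brick", n, sum, sum / n }')
echo "$summary"
```

Running this against a full `df` dump from s04/s05/s06 would confirm whether every brick really carries roughly the same ~800 GB.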
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup, but in your current situation it is difficult to follow it.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure would require a lot of replace-bricks, which would again need healing and so on. In addition to that, we would have to wait for the rebalance to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance, you remove these newly added bricks properly.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
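The remove/re-add cycle suggested above could look roughly like this. This is a dry-run sketch under assumptions, not the thread's actual commands: `run` is a hypothetical wrapper that only echoes, so nothing is executed, and the brick paths are the first 6-brick set from this volume.

```shell
#!/bin/sh
# Dry-run sketch: remove one newly added 6-brick EC set, let rebalance
# drain it, then commit. Review the printed commands before replacing
# `echo` with real execution on the cluster.
VOL=tier2
run() { echo "would run: $*"; }

SET1="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

run gluster volume remove-brick "$VOL" $SET1 start
# ...poll `gluster volume remove-brick <vol> <bricks> status` until "completed"...
run gluster volume remove-brick "$VOL" $SET1 commit
```

After all six sets are drained and removed, the same bricks would be re-added in 6-brick sets that take 2 bricks from each of the 3 new nodes.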
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, and each subvolume made up of these bricks resides entirely on that one node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a connection disruption taking down these bricks together is higher in this case.
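The placement problem can be spotted mechanically: group the brick list (in creation order, as gluster forms EC sets) into sets of 6 and count distinct hosts per set. A sketch with hostnames from this thread; any set served by a single host goes fully offline with that node.

```shell
# Check 4+2 EC subvolume placement: 6 consecutive bricks form one set.
# Sample layout: set 1 spans s01/s02/s03, set 2 sits entirely on s04.
bricks='s01-stg s02-stg s03-stg s01-stg s02-stg s03-stg
s04-stg s04-stg s04-stg s04-stg s04-stg s04-stg'

report=$(echo $bricks | tr ' ' '\n' | awk '{
    hosts[$1] = 1
    if (NR % 6 == 0) {
        n = 0; for (h in hosts) n++
        printf "subvol %d: %d host(s)%s\n", NR / 6, n, (n == 1 ? " <-- single node!" : "")
        split("", hosts)   # reset host set for the next 6-brick group
    }
}')
echo "$report"
```

On the real volume, the host field of each `BrickN:` line from `gluster volume info` would be fed in instead of the sample list.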
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
Rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
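A quick way to see whether one brick connection is unusually flaky is to count disconnect/timeout events per client translator in the rebalance log. A minimal sketch; the sample lines are taken from the errors quoted above, and on the cluster the real input would be /var/log/glusterfs/tier2-rebalance.log.

```shell
# Count events per 0-tier2-client-N translator; a client that dominates
# the list points at one specific brick/server connection.
log='W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected'

counts=$(echo "$log" | grep -oE '0-tier2-client-[0-9]+' | sort | uniq -c)
echo "$counts"
```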
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(the volume is implemented on 6 servers; each server has 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, and 12 × 10 TB HDDs)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
Thank you in advance.
Sorry for my long message, but I’m trying to provide you with all the available information.
Regards,
Mauro
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users>
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it <mailto:***@cmcc.it>
https://it.linkedin.com/in/mauro-tridici-5977238b
Mauro Tridici
2018-09-27 10:33:04 UTC
Permalink
Dear Ashish,

I hope I’m not disturbing you too much, but I would like to ask if you have had some time to dedicate to our problem.
Please, forgive my insistence.

Thank you in advance,
Mauro
Ashish Pandey
2018-09-27 10:38:25 UTC
Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have all 6 bricks of V1 on 6 different nodes.
However, in your case you have added only 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down, you will lose all the data on V1 and V2, as all of their bricks will be down. We want to avoid that and correct it.
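As a quick sanity check of the claim above, here is a minimal Python sketch (node names are illustrative) that tests whether a 4+2 dispersed subvolume stays readable, i.e. keeps at least 4 bricks up, after any single node failure:

```python
# Sketch: a 4+2 dispersed subvolume stays readable while at least
# 4 of its 6 bricks are up. Compare two brick placements when any
# single node fails.

def survives_any_single_node_failure(placement, k=4):
    """placement: list of host names, one entry per brick (6 entries)."""
    hosts = set(placement)
    # For every host that could go down, count the bricks that remain up.
    return all(sum(h != down for h in placement) >= k for down in hosts)

bad = ["s04"] * 6                                    # all 6 bricks on one node
good = ["s04", "s04", "s05", "s05", "s06", "s06"]    # two bricks per node

print(survives_any_single_node_failure(bad))   # False: losing s04 loses everything
print(survives_any_single_node_failure(good))  # True: any one node down still leaves 4 bricks
```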

Now, we have two approaches to correct/modify this setup.

Approach 1
We have to remove the newly added bricks, one set of 6 bricks at a time. This will trigger a rebalance and move the whole data to the other subvolumes.
Repeat the above step; once all the bricks are removed, add them again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.

Approach 2

In this approach we use the heal process. We have to deal with the subvolumes (V1 to V6) one by one. The following are the steps for V1:

Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node, one by one (the heal should be completed after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use the replace-brick command to move the following bricks to the s06-stg node, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After successful completion of steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes;
the only difference is the nodes to which you have to move the bricks.
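As an illustration of the heal check, the following Python sketch sums the "Number of entries:" lines from the output of "gluster v heal <volname> info"; the sample output embedded below is a simplified assumption of the real format, which may vary between Gluster versions:

```python
# Sketch: decide whether heal is complete by summing the
# "Number of entries:" lines from `gluster v heal <volname> info`.
# The sample output is illustrative, not captured from a live cluster.

def pending_heal_entries(heal_info_output: str) -> int:
    """Return the total number of entries still pending heal."""
    total = 0
    for line in heal_info_output.splitlines():
        line = line.strip()
        if line.startswith("Number of entries:"):
            total += int(line.split(":", 1)[1])
    return total

sample = """\
Brick s04-stg:/gluster/mnt3/brick
Status: Connected
Number of entries: 2

Brick s05-stg:/gluster/mnt3/brick
Status: Connected
Number of entries: 0
"""

print(pending_heal_entries(sample))  # 2 entries still pending: keep waiting
```

Only when the total reaches zero for all bricks of the subvolume should the next replace-brick be issued.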




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps involve data movement.
Be careful while performing them: do one replace-brick at a time, and move to the next only after the heal has completed.
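The moves described in Step 1 can be planned mechanically. This Python sketch builds the replace-brick commands for V1 only; the target brick paths are left as placeholders, since the actual free bricks on s05-stg and s06-stg have to be chosen by hand:

```python
# Sketch: plan the replace-brick moves for subvolume V1 so that its six
# bricks end up spread two-per-node across s04, s05 and s06, following
# the steps above. Target paths are placeholders, not real bricks.

v1_bricks = [
    "s04-stg:/gluster/mnt1/brick",  # stays on s04
    "s04-stg:/gluster/mnt2/brick",  # stays on s04
    "s04-stg:/gluster/mnt3/brick",  # move to s05
    "s04-stg:/gluster/mnt4/brick",  # move to s05
    "s04-stg:/gluster/mnt5/brick",  # move to s06
    "s04-stg:/gluster/mnt6/brick",  # move to s06
]

# Keep 2 bricks where they are; move 2 to each of the other two nodes.
moves = list(zip(v1_bricks[2:], ["s05-stg", "s05-stg", "s06-stg", "s06-stg"]))

commands = [
    f"gluster v replace-brick tier2 {src} {dst}:/<free brick> commit force"
    for src, dst in moves
]
for cmd in commands:
    print(cmd)  # run one at a time; wait for heal completion between moves
```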
Let me know if you have any issues.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I’m not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro




On 26 September 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

On 26 September 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you a step-by-step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in the attachment you can find the rebalance log file and the last updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don’t know if I may ask, but, if possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the “df -h” command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.

[***@s06 bricks]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the last servers is about 800GB.

Thank you,
Mauro









<blockquote>

On 26 September 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

The rebalance and brick logs should be the first thing we go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration,
that is, 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-brick operations, which will in turn need healing. In addition to that, we have to wait for the rebalance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for your answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Could you please indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

On 26 September 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume are not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, and each group of them forms a complete subvolume hosted on that single node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you have added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.
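For illustration, the grouping described here can be sketched in Python: in a distributed-disperse volume, each consecutive run of 6 bricks (4+2) in brick order forms one EC subvolume, so 12 bricks added from a single node yield two subvolumes with no node redundancy. A sketch under that assumption:

```python
# Sketch: consecutive groups of 6 bricks (4+2 EC) form one subvolume.
# If all 6 bricks of a group sit on one host, losing that host takes
# the whole subvolume down.

bricks = [f"s04-stg:/gluster/mnt{i}/brick" for i in range(1, 13)]

# Chunk the brick list into EC subvolumes of 6 bricks each.
subvols = [bricks[i:i + 6] for i in range(0, len(bricks), 6)]

for n, sv in enumerate(subvols, start=1):
    hosts = {b.split(":", 1)[0] for b in sv}
    if len(hosts) == 1:
        print(f"subvolume {n}: all bricks on {hosts.pop()} - single point of failure")
```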

---
Ashish



</blockquote>





</blockquote>





</blockquote>





Mauro Tridici
2018-09-27 10:54:12 UTC
Dear Ashish,

I cannot thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach, after setting the network.ping-timeout option to 0 (if I’m not wrong, “0” means “infinite”; I noticed that this value reduced the rebalance errors).
After the fix, I will set the network.ping-timeout option back to its default value.

Could I contact you again if I need further suggestions?

Thank you very much again.
Have a good day,
Mauro
Post by Ashish Pandey
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume and the data on that volume would be accessible.
In current setup if s04-stg goes down, you will loose all the data on V1 and V2 as all the bricks will be down. We want to avoid and correct it.
Now, we can have two approach to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in a set of 6 bricks. This will trigger re- balance and move whole data to other sub volumes.
Repeat the above step and then once all the bricks are removed, add those bricks again in a set of 6 bricks, this time have 2 bricks from each of the 3 newly added Nodes.
While this is a valid and working approach, I personally think that this will take long time and also require lot of movement of data.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belongs to same ec subvolume
Use replace-brick command to move following bricks on s06-stg node one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After, every replace-brick command, you have to wait for heal to be completed.
check "gluster v heal <volname> info " if it shows any entry you have to wait for it to be completed.
After successful step 1 and step 2, setup for sub volume V1 will be fixed. The same steps you have to perform for other volumes. Only thing is that
the nodes would be different on which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps need movement of data.
Be careful while performing these steps and do one replace brick at a time and only after heal completion go to next.
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I don’t disturb you so much, but I would like to ask you if you had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine If i provide you the steps tomorrow as it is quite late over here and I don't want to miss anything in hurry?
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if exists a safe procedure to correct the problem I would like execute it.
I don’t know if I can ask you it, but, if it is possible, could you please describe me step by step the right procedure to remove the newly added bricks without losing the data that have been already rebalanced?
The following outputs show the result of “df -h” command executed on one of the first 3 nodes (s01, s02, s03) already existing and on one of the last 3 nodes (s04, s05, s06) added recently.
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, used space value of each brick of the last servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need healing and all. In addition to that we have to wait for re-balance to complete.
I would suggest that if whole data has not been rebalanced and if you can stop the rebalance and remove these newly added bricks properly then you should remove these newly added bricks.
After that, add these bricks so that you have 2 bricks of each volume on 3 newly added nodes.
Yes, it is like undoing whole effort but it is better to do it now then facing issues in future when it will be almost impossible to correct these things if you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, and any subvolume made up of these bricks will reside entirely on that node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
The rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I should modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?
[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
Thank you in advance.
Sorry for my long message, but I'm trying to give you all the available information.
Regards,
Mauro
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users <https://lists.gluster.org/mailman/listinfo/gluster-users>
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it <http://www.cmcc.it/>
mobile: (+39) 327 5630841
https://it.linkedin.com/in/mauro-tridici-5977238b <https://it.linkedin.com/in/mauro-tridici-5977238b>
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it <http://www.cmcc.it/>
mobile: (+39) 327 5630841
https://it.linkedin.com/in/mauro-tridici-5977238b <https://it.linkedin.com/in/mauro-tridici-5977238b>
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it <mailto:***@cmcc.it>
https://it.linkedin.com/in/mauro-tridici-5977238b
Ashish Pandey
2018-09-27 11:14:30 UTC
Permalink
Yes, you can.
If not me, others may also reply.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I cannot thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach, after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced the rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
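For reference, a sketch of those two settings (0 disables the client ping timeout; 60 is the value this volume had configured before, per the volume options listed earlier in the thread):

```shell
VOL=tier2
RESTORE=60   # value previously configured on this volume (the stock default is 42s)
if command -v gluster >/dev/null 2>&1; then
  gluster volume set "$VOL" network.ping-timeout 0            # during the fix
  # ... run the remove-brick / replace-brick procedure here ...
  gluster volume set "$VOL" network.ping-timeout "$RESTORE"   # restore afterwards
fi
```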

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro





On 27 Sep 2018, at 12:38, Ashish Pandey <***@redhat.com> wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from Brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have the 6 bricks of V1 on 6 different nodes.
However, in your case you have added only 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have the 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down, you will lose access to all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct that.
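The risk described here follows from how disperse sets are formed: consecutive bricks in the brick list make up one (4+2) set, so with 12 bricks added per node each new set lands entirely on one server. A small sketch that prints the resulting V1-V6 grouping (layout taken from the volume info in this thread):

```shell
LAYOUT=$(
  v=1
  for node in s04-stg s05-stg s06-stg; do
    for start in 1 7; do
      bricks=""
      for m in $(seq "$start" $((start + 5))); do
        bricks="$bricks $node:/gluster/mnt$m/brick"
      done
      echo "V$v:$bricks"   # every brick of this set sits on the same node
      v=$((v + 1))
    done
  done
)
echo "$LAYOUT"
```

Each printed line is one erasure-coded set living on a single server, which is why losing s04-stg would take both V1 and V2 offline.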

Now, we can take two approaches to correct/modify this setup.

Approach 1
We remove the newly added bricks, one set of 6 bricks at a time. Each removal triggers a rebalance that moves the data to the other subvolumes.
Repeat the step above until all the new bricks are removed, then add the bricks again in sets of 6, this time taking 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.

Approach 2

In this approach we use the heal process. We have to deal with all the subvolumes (V1 to V6) one by one. The following are the steps for V1:

Step 1 -
Use the replace-brick command to move the following bricks to node s05-stg, one by one (the heal should be completed after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use the replace-brick command to move the following bricks to node s06-stg, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After step 1 and step 2 complete successfully, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is the nodes to which you have to move the bricks.
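A small helper for step 2, sketched below: it sums the "Number of entries:" counters printed by heal info and waits until they reach zero (the parsing assumes the gluster 3.x `heal info` output format; the 60-second poll interval is an arbitrary choice):

```shell
VOL=tier2
# Sum the per-brick pending-entry counts from heal-info output read on stdin.
sum_pending() { awk '/^Number of entries:/ {s += $NF} END {print s + 0}'; }

if command -v gluster >/dev/null 2>&1; then
  while [ "$(gluster volume heal "$VOL" info | sum_pending)" -gt 0 ]; do
    echo "heal still in progress on $VOL, waiting..."
    sleep 60
  done
  echo "heal complete: safe to issue the next replace-brick"
fi
```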




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps involve data movement.
Be careful while performing these steps: do one replace-brick at a time, and move to the next only after the heal has completed.
Let me know if you have any issues.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I'm not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro



On 26 Sep 2018, at 19:56, Mauro Tridici <***@cmcc.it> wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro




On 26 Sep 2018, at 19:33, Ashish Pandey <***@redhat.com> wrote:

Hi Mauro,

Yes, I can provide you with a step-by-step procedure to correct it.
Is it fine if I provide the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I may ask this of you but, if it is possible, could you please describe, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the "df -h" command executed on one of the first 3 pre-existing nodes (s01, s02, s03) and on one of the last 3 recently added nodes (s04, s05, s06).

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the last 3 servers is about 800 GB.

Thank you,
Mauro









On 26 Sep 2018, at 14:51, Ashish Pandey <***@redhat.com> wrote:

On 26 Sep 2018, at 14:13, Ashish Pandey <***@redhat.com> wrote:


Mauro Tridici
2018-09-28 10:51:03 UTC
Permalink
Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used, as required by the second approach.
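For reference, the commands involved would look something like the following (a sketch only; "tier2" is the volume name from this thread, and 42 seconds is the gluster default that "reset" restores):

```shell
# Sketch of the ping-timeout change described above.
gluster volume set tier2 network.ping-timeout 0     # disable the timeout for the duration of the fix
# ... perform the remove-brick / replace-brick work ...
gluster volume reset tier2 network.ping-timeout     # back to the default (42s)
```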

So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving data to the other bricks, but after about 3TB had been moved, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log on server s04.
At this point, since the remaining 1.8TB still needs to be moved to complete this step, we decided to stop the remove-brick operation and start it again (I hope it doesn’t stall again before the rebalance completes).

Right now the rebalance is not moving data; it is only scanning files (please take a look at the following output):

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce the errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?

Thank you in advance,
Mauro
Post by Ashish Pandey
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I’m not wrong, “0" means “infinite”; I noticed that this value reduced the rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct that.
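As a sanity check of this reasoning, here is a small sketch (node names are the ones from this thread; the threshold of 4 is the data-brick count of the 4+2 configuration):

```shell
# Does a 4+2 disperse subvolume stay available if any single node fails?
# A subvolume needs at least 4 of its 6 bricks up.
survives_single_node_failure() {
    placement="$1"   # node name of each brick, space separated
    for failed in $placement; do
        up=0
        for node in $placement; do
            [ "$node" != "$failed" ] && up=$((up + 1))
        done
        [ "$up" -lt 4 ] && { echo "no"; return; }
    done
    echo "yes"
}

# V1 today: all six bricks on s04-stg
survives_single_node_failure "s04-stg s04-stg s04-stg s04-stg s04-stg s04-stg"   # no
# Suggested layout: two bricks on each newly added node
survives_single_node_failure "s04-stg s04-stg s05-stg s05-stg s06-stg s06-stg"   # yes
```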
Now, we can take two approaches to correct/modify this setup.
Approach 1
We remove the newly added bricks in sets of 6. This will trigger a rebalance and move all the data to the other subvolumes.
Repeat the step above until all the bricks are removed, then add the bricks back in sets of 6, this time taking 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.
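A sketch of the command sequence for one set of 6 bricks (using V1's bricks; commit must wait until status reports completion on all nodes):

```shell
# Approach 1 sketch for one set of 6 bricks (V1). Data is migrated off
# by "start"; run "commit" only once "status" shows completed.
V1="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

gluster volume remove-brick tier2 $V1 start
gluster volume remove-brick tier2 $V1 status
# ...when every node reports "completed":
gluster volume remove-brick tier2 $V1 commit
```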
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node, one by one (the heal should complete after every replace-brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume
Use the replace-brick command to move the following bricks to the s06-stg node, one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait until they are healed.
After step 1 and step 2 complete successfully, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes;
the only difference is the nodes to which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve data movement.
Be careful while performing them: do one replace-brick at a time, and move to the next only after the heal completes.
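The replace-brick / heal-wait loop described above could be scripted roughly as follows (the volume name and brick paths are from this thread; the helper name and the exact heal-info parsing are my own assumptions):

```shell
# Sketch of Approach 2: replace one brick, then wait for the heal to
# finish before moving on to the next brick.
VOL=tier2

replace_and_heal() {
    src=$1; dst=$2
    gluster volume replace-brick "$VOL" "$src" "$dst" commit force
    # Step 2: poll heal info until no subvolume reports pending entries
    while gluster volume heal "$VOL" info | grep -q 'Number of entries: [1-9]'; do
        echo "heal still in progress after replacing $src, waiting..."
        sleep 60
    done
}

# Example for V1 (run one at a time, as advised above):
# replace_and_heal s04-stg:/gluster/mnt3/brick "s05-stg:/<brick which is free>"
# replace_and_heal s04-stg:/gluster/mnt4/brick "s05-stg:/<other brick which is free>"
```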
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I’m not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don’t know if I may ask this of you, but, if it is possible, could you please describe step by step the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the “df -h” command executed on one of the first 3 existing nodes (s01, s02, s03) and on one of the last 3 recently added nodes (s04, s05, s06).
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the newly added servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we go through.
There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration,
that is, 2 bricks on each node for each subvolume.
The procedure will require a lot of replace-brick operations, which will in turn need healing; in addition, we would have to wait for the rebalance to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove the newly added bricks properly, then you should remove them.
After that, add the bricks back so that you have 2 bricks of each subvolume on each of the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I can provide the entire log files for glusterd, glusterfsd and rebalance.
Could you please indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this mistake? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest that you provide more logs/info.
I have also observed that the configuration and setup of your volume are not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, and the subvolumes made up of these bricks therefore reside entirely on that node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a connection disruption affecting these bricks is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
The rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
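For what it's worth, the bitmaps in these messages can be read per brick. A small sketch (my assumption: each bit position corresponds to one brick of the 6-brick disperse subvolume, with `up` the bricks that are online and `bad` the bricks where the operation failed):

```shell
# Count failed bricks from the up/bad bitmaps of the first log line.
up=111111
bad=011000
failed=0
for i in 0 1 2 3 4 5; do
    if [ "${up:$i:1}" = "1" ] && [ "${bad:$i:1}" = "1" ]; then
        failed=$((failed + 1))
    fi
done
echo "operation failed on $failed of 6 bricks"
```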
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
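The numbers in that message already explain the skip. A simplified sketch of the decision (the exact rule in dht-rebalance.c is an assumption on my part; the real check also involves options such as cluster.min-free-disk):

```shell
# Free bytes reported in the log line above
src_free=71373083776     # tier2-disperse-11 (source)
dst_free=2440190848      # tier2-disperse-5 (destination)
file_blocks=0            # blocks:0 for this file

# Refuse the migration if it would leave the destination with less
# free space than the source already has.
if [ $((dst_free - file_blocks)) -lt "$src_free" ]; then
    echo "skip: destination would end up with less free space than source"
else
    echo "migrate"
fi
```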
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, with my volume configuration, I need to modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
Thank you in advance.
Sorry for the long message, but I’m trying to provide all the available information.
Regards,
Mauro
Nithya Balachandran
2018-09-28 11:01:34 UTC
Permalink
Hi Mauro,


Please send the rebalance logs from s04-stg. I will take a look and get
back.


Regards,
Nithya
Post by Mauro Tridici
Hi Ashish,
as I said in my previous message, we adopted the first approach you
suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty brick to be used as indicated
in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks
1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after
about 3TB of moved data, rebalance speed slowed down and some transfer
errors appeared in the rebalance.log of server s04.
At this point, since remaining 1,8TB need to be moved in order to complete
the step, we decided to stop the remove-brick execution and start it again
(I hope it doesn’t stop again before complete the rebalance)
Now rebalance is not moving data, it’s only scanning files (please, take a
look to the following output)
s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size
scanned failures skipped status run time in
h:m:s
--------- ----------- -----------
----------- ----------- ----------- ------------
--------------
s04-stg 0 0Bytes
182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances entire cluster each time it start.
Is there a way to speed up this procedure? Do you have some other
suggestion that, in this particular case, could be useful to reduce errors
(I know that they are related to the current volume configuration) and
improve rebalance performance avoiding to rebalance the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
------------------------------
*Sent: *Thursday, September 27, 2018 4:24:12 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think to follow the first approach after setting network.ping-timeout
option to 0 (If I’m not wrong “0" means “infinite”...I noticed that this
value reduced rebalance errors).
After the fix I will set network.ping-timeout option to default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should
have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will
have 4 other bricks of that volume and the data on that volume would be
accessible.
In current setup if s04-stg goes down, you will loose all the data on V1
and V2 as all the bricks will be down. We want to avoid and correct it.
Now, we can have two approach to correct/modify this setup.
*Approach 1*
We have to remove all the newly added bricks in a set of 6 bricks. This
will trigger re- balance and move whole data to other sub volumes.
Repeat the above step and then once all the bricks are removed, add those
bricks again in a set of 6 bricks, this time have 2 bricks from each of the
3 newly added Nodes.
While this is a valid and working approach, I personally think that this
will take long time and also require lot of movement of data.
*Approach 2*
In this approach we can use the heal process. We have to deal with all the
volumes (V1 to V6) one by one. Following are the steps for V1-
*Step 1 - *
Use replace-brick command to move following bricks on *s05-stg* node *one
by one (heal should be completed after every replace brick command)*
*Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>*
*Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is
free>*
gluster v replace-brick <volname> *s04-stg:/gluster/mnt3/brick* *s05-stg:/<brick
which is free>* commit force
Try to give names to the bricks so that you can identify which 6 bricks
belongs to same ec subvolume
Use replace-brick command to move following bricks on *s06-stg* node one
by one
Brick41: s04-stg:/gluster/mnt5/brick to *s06-stg/<brick which is free>*
Brick42: s04-stg:/gluster/mnt6/brick to *s06-stg/<other brick which is
free>*
*Step 2* - After, every replace-brick command, you have to wait for heal
to be completed.
check *"gluster v heal <volname> info "* if it shows any entry you have
to wait for it to be completed.
After successful step 1 and step 2, setup for sub volume V1 will be fixed.
The same steps you have to perform for other volumes. Only thing is that
the nodes would be different on which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps need movement of data.
Be careful while performing these steps and do one replace brick at a time
and only after heal completion go to next.
Let me know if you have any issues.
---
Ashish
------------------------------
*Sent: *Thursday, September 27, 2018 4:03:04 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
I hope I don’t disturb you so much, but I would like to ask you if you had
some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Il giorno 26 set 2018, alle ore 19:56, Mauro Tridici <
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine If i provide you the steps tomorrow as it is quite late over
here and I don't want to miss anything in hurry?
---
Ashish
------------------------------
*Sent: *Wednesday, September 26, 2018 6:54:19 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the last updated
brick log file (the other files in /var/log/glusterfs/bricks directory seem
to be too old).
I just stopped the running rebalance (as you can see at the bottom of the
rebalance log file).
So, if exists a safe procedure to correct the problem I would like execute it.
I don’t know if I can ask you it, but, if it is possible, could you please
describe me step by step the right procedure to remove the newly added
bricks without losing the data that have been already rebalanced?
The following outputs show the result of “df -h” command executed on one
of the first 3 nodes (s01, s02, s03) already existing and on one of the
last 3 nodes (s04, s05, s06) added recently.
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the recently added servers is about 800 GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup, but in the situation
you are in it is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg
the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-brick operations, which will again
need healing and so on. In addition to that, we have to wait for the rebalance
to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you
can stop the rebalance and remove these newly added bricks properly, then you
should remove these newly added bricks.
After that, add the bricks back so that you have 2 bricks of each subvolume on the 3
newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to
face issues in the future, when it will be almost impossible to correct these
things once you have lots of data.
---
Ashish
------------------------------
*Sent: *Wednesday, September 26, 2018 5:55:02 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I can provide you with the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure
to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you
provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, and the subvolumes made up of these
bricks will have all their bricks on that single node, which is not good. The
same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of
a disruption in the connection to these bricks is higher in this case.
---
Ashish
------------------------------
*Sent: *Wednesday, September 26, 2018 3:38:35 PM
*Subject: *[Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from version 3.10.5 to 3.12.14, I tried to start a
rebalance process to distribute data across the bricks, but something went
wrong.
Rebalance failed on different nodes, and the estimated time needed to complete
the procedure seems to be very high.
Node       Rebalanced-files       size    scanned   failures   skipped       status   run time in h:m:s
---------  ----------------  ---------  ---------  ---------  --------  -----------  -----------------
localhost                19    161.6GB        537          2         2  in progress            0:32:23
s02-stg                  25    212.7GB        526          5         2  in progress            0:32:25
s03-stg                   4     69.1GB        511          0         0  in progress            0:32:25
s04-stg                   4   484Bytes      12283          0         3  in progress            0:32:25
s05-stg                  23   484Bytes      11049          0        10  in progress            0:32:25
s06-stg                   3      1.2GB       8032         11         3       failed            0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in
/var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
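The "would result in dst node having lower disk space" message in Error type 3 comes from a free-space comparison DHT performs before migrating each file. A simplified, hypothetical sketch of that check (not the actual glusterfs source), using the byte counts from the log line:

```shell
# Simplified sketch: DHT skips migrating a file when the destination
# subvolume would be left with less free space than the source has.
# The byte counts are the ones reported in the rebalance log above.
dst_free=2440190848      # free bytes on tier2-disperse-5 (destination)
src_free=71373083776     # free bytes on tier2-disperse-11 (source)

if [ "$dst_free" -lt "$src_free" ]; then
    echo "Skipping file."
fi
```

In this thread the destination subvolume reports roughly 2.4 GB free against 71 GB on the source, which is why the file is skipped rather than migrated.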
It seems that there are some network or timeout problems, but the network
usage/traffic values are not very high.
Do you think that, in my volume configuration, I should modify some
volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu
10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
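As a quick sanity check on this layout (a sketch using only numbers quoted in this thread; the ~9 TB brick size comes from the df output above): only the 4 data bricks of each 4+2 disperse subvolume contribute to usable capacity, which lands close to the 420T that df reports for the mounted volume (the remainder is filesystem overhead and TB/TiB rounding).

```shell
# "Number of Bricks: 12 x (4 + 2) = 72", ~9 TB per brick.
# In a disperse volume, the 2 redundancy bricks per subvolume
# do not add usable space.
SUBVOLS=12
DATA_BRICKS=4
BRICK_TB=9
usable=$((SUBVOLS * DATA_BRICKS * BRICK_TB))
echo "${usable}T usable"    # close to the 420T shown by df
```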
If it can help, I paste here the output of the “free -m” command executed on
one of the nodes; the result is almost the same on every node. In your
opinion, is the available RAM enough to support the data movement?
              total        used        free      shared  buff/cache   available
Mem:          64309       10409         464          15       53434       52998
Swap:         65535         103       65432
Thank you in advance.
Sorry for my long message, but I’m trying to provide you with all the available information.
Regards,
Mauro
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
https://it.linkedin.com/in/mauro-tridici-5977238b
Mauro Tridici
2018-09-28 11:11:01 UTC
Hi Nithya,

the gzip file containing the logs is about 14 MB.
How can I send it to you? Is there an FTP server or something similar?

Thank you for your support,
Mauro
Post by Ashish Pandey
Hi Mauro,
Please send the rebalance logs from s04-stg. I will take a look and get back.
Regards,
Nithya
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks but, after about 3 TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB still needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn’t stop again before the rebalance completes).
Now rebalance is not moving data; it’s only scanning files (please take a look at the following output):
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestion that, in this particular case, could be useful to reduce errors (I know they are related to the current volume configuration) and improve rebalance performance, avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Post by Ashish Pandey
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach, after setting the network.ping-timeout option to 0 (if I’m not wrong, “0” means “infinite”; I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from Brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes, so at the very least we should have 2 bricks of each subvolume on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose access to all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct this.
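The availability arithmetic behind this advice can be sketched in a few lines of shell (the 4+2 parameters come from the volume info in this thread; the script is purely illustrative):

```shell
# In a 4+2 disperse subvolume, data stays accessible while no more than
# 2 bricks (the redundancy count) are down at the same time.
DATA=4
REDUNDANCY=2

# Current layout: all 6 bricks of V1 live on s04-stg, so losing that one
# node takes down 6 bricks -- more than the redundancy allows.
lost=6
[ "$lost" -gt "$REDUNDANCY" ] && echo "s04-stg down: V1 unavailable"

# Proposed layout: 2 bricks of V1 on each of 3 nodes, so losing any one
# node takes down only 2 bricks -- exactly the tolerated maximum.
lost=2
[ "$lost" -le "$REDUNDANCY" ] && echo "one node down: V1 still available"
```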
Now, we can have two approach to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks, one set of 6 bricks at a time. This will trigger a rebalance and move the whole data set to the other subvolumes.
Repeat the above step and, once all the bricks are removed, add them back in sets of 6 bricks, this time taking 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use the replace-brick command to move the following bricks to node s05-stg, one by one (heal should be completed after every replace-brick command):
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.
Use the replace-brick command to move the following bricks to node s06-stg, one by one:
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for the heal to finish.
After successful completion of steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is the nodes to which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve movement of data.
Be careful while performing them: do one replace-brick at a time, and only after heal completion move on to the next.
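The wait-for-heal check between replace-brick operations can be sketched roughly as follows. This is an assumption-laden helper, not an official tool: it assumes the "Number of entries: N" lines that "gluster v heal <volname> info" prints, and the volume name tier2 is the one from this thread.

```shell
# Sum the pending heal entries from "gluster v heal <volname> info"
# output read on stdin; prints 0 when nothing is left to heal.
heal_pending() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Intended use between replace-brick steps (do NOT run the next
# replace-brick until this loop exits):
#   while [ "$(gluster v heal tier2 info | heal_pending)" -gt 0 ]; do
#       sleep 60
#   done
```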
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I don’t disturb you so much, but I would like to ask you if you had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine If i provide you the steps tomorrow as it is quite late over here and I don't want to miss anything in hurry?
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if exists a safe procedure to correct the problem I would like execute it.
I don’t know if I can ask you it, but, if it is possible, could you please describe me step by step the right procedure to remove the newly added bricks without losing the data that have been already rebalanced?
The following outputs show the result of “df -h” command executed on one of the first 3 nodes (s01, s02, s03) already existing and on one of the last 3 nodes (s04, s05, s06) added recently.
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, used space value of each brick of the last servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need healing and all. In addition to that we have to wait for re-balance to complete.
I would suggest that if whole data has not been rebalanced and if you can stop the rebalance and remove these newly added bricks properly then you should remove these newly added bricks.
After that, add these bricks so that you have 2 bricks of each volume on 3 newly added nodes.
Yes, it is like undoing whole effort but it is better to do it now then facing issues in future when it will be almost impossible to correct these things if you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on same node and the sub volume made up of these bricks will be of same subvolume, which is not good. Same is true for the bricks hosted on s05-stg and s06-stg
I think you have added these bricks after creating vol. The probability of disruption in connection of these bricks will be higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 <http://192.168.0.55:49153/> failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 <http://192.168.0.52:49153/> has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
Thank you in advance.
Sorry for my long message, but I'm trying to provide you with all the available information.
Regards,
Mauro
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
https://it.linkedin.com/in/mauro-tridici-5977238b
Mauro Tridici
2018-09-28 11:31:21 UTC
Permalink
Hi Nithya,

I just shared the log file using google drive.
Please, click on the following link:

https://drive.google.com/file/d/1k5kebn19obGRp_ThOeH4l9AXzXyqJG4V/view?usp=sharing

Many thanks,
Mauro
Post by Mauro Tridici
Hi Nithya,
the gzip file containing logs is about 14MB.
How can I send it to you? Is there a FTP server or something similar?
Thank you for your support,
Mauro
Post by Ashish Pandey
Hi Mauro,
Please send the rebalance logs from s04-stg. I will take a look and get back.
Regards,
Nithya
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3 TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB still needs to be moved to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).
Now rebalance is not moving data, it's only scanning files (please, take a look at the following output)
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other suggestion that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Post by Ashish Pandey
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"... I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
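For reference, the temporary change and the later rollback are plain volume-option operations; this is only a sketch, and the `reset` form falls back to the built-in default (42 seconds, the value mentioned in the ping-timeout log earlier in this thread):

```shell
# Disable the client ping timeout for the duration of the maintenance.
gluster volume set tier2 network.ping-timeout 0
# ... perform the replace-brick / rebalance work ...
# Afterwards, restore the built-in default (42 seconds).
gluster volume reset tier2 network.ping-timeout
```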
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose access to all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct this.
Now, we can have two approach to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in sets of 6 bricks. This will trigger a rebalance and move the whole data to other subvolumes.
Repeat the above step and then, once all the bricks are removed, add those bricks again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.
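Approach 1 maps onto the standard remove-brick start/status/commit flow. A minimal sketch for one 6-brick EC set (brick paths taken from this thread; the `drain_done` helper is a hypothetical convenience that just greps the status table for a "completed" state):

```shell
#!/bin/sh
# Sketch of approach 1 for one 6-brick EC set; adjust names to your layout.
VOL=tier2
BRICKS="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

drain_done() {
    # Reads "gluster volume remove-brick ... status" output on stdin and
    # succeeds once a node reports the "completed" status.
    grep -q 'completed'
}

gluster volume remove-brick "$VOL" $BRICKS start
# Wait for the data to drain off the bricks before committing.
until gluster volume remove-brick "$VOL" $BRICKS status | drain_done; do
    sleep 300
done
# Commit only after the drain has completed; committing earlier would
# discard data still sitting on the removed bricks.
gluster volume remove-brick "$VOL" $BRICKS commit
```

The same loop is repeated for each remaining set of 6 bricks before adding them back in the corrected placement.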
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.
Use replace-brick command to move following bricks on s06-stg node one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for heal to be completed.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for the heal to be completed.
After successful step 1 and step 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is that
the nodes onto which you have to move the bricks will be different.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve movement of data.
Be careful while performing these steps: do one replace-brick at a time, and go to the next only after heal completion.
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I'm not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don't know if I can ask you this, but, if it is possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the "df -h" command executed on one of the 3 pre-existing nodes (s01, s02, s03) and on one of the 3 recently added nodes (s04, s05, s06).
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, used space value of each brick of the last servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-bricks, which will again need healing and so on. In addition to that, we have to wait for the re-balance to complete.
I would suggest that, if the whole data has not been rebalanced yet and if you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, so the subvolumes made up of these bricks will reside entirely on one server, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you have added these bricks after creating the volume. The probability of disruption in the connection of these bricks will be higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Ashish Pandey
2018-09-28 12:36:30 UTC
Permalink
We could have taken approach 2 even if you did not have free disks. You should have told me why you
were opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time depending upon the data size.

Anyway, I hope the whole setup is stable, I mean it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with that and follow approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish
----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.

So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3 TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB still needs to be moved to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).

Now rebalance is not moving data, it's only scanning files (please, take a look at the following output)
[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?

Thank you in advance,
Mauro




On 27 Sep 2018, at 13:14, Ashish Pandey < ***@redhat.com > wrote:


Yes, you can.
If not me, others may also reply.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I cannot thank you enough!
Your procedure and description are very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I’m not wrong, “0" means “infinite”... I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro




On 27 Sep 2018, at 12:38, Ashish Pandey < ***@redhat.com > wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from Brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have the 6 bricks of V1 on 6 different nodes.
However, in your case you added 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you still have 4 other bricks of that subvolume, and the data on it remains accessible.
In the current setup, if s04-stg goes down, you will lose access to all the data on V1 and V2, since all their bricks will be down. We want to avoid and correct that.

Now, we have two approaches to correct/modify this setup.

Approach 1
We have to remove all the newly added bricks in sets of 6. This will trigger a rebalance and move the whole data to other subvolumes.
Repeat the above step and then, once all the bricks are removed, add them back in sets of 6, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think it will take a long time and also requires a lot of data movement.

Approach 2

In this approach we can use the heal process. We have to deal with the subvolumes (V1 to V6) one by one. Following are the steps for V1:

Step 1 -
Use the replace-brick command to move the following bricks to node s05-stg, one by one (heal should be completed after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use the replace-brick command to move the following bricks to node s06-stg, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After a successful Step 1 and Step 2, the setup for subvolume V1 will be fixed. You have to perform the same steps
for the other subvolumes; the only difference is the nodes to which you have to move the bricks.
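The heal-completion check in Step 2 can be scripted by parsing the "Number of entries:" lines that "gluster v heal <volname> info" prints per brick. A minimal sketch of that check, demonstrated here on made-up sample text rather than live CLI output (the output format is assumed from the 3.x heal info command):

```shell
# heal_pending: reads heal-info text on stdin and succeeds (exit 0) if any
# brick still reports a non-zero entry count, i.e. heal is still running.
heal_pending() {
    grep -q 'Number of entries: [1-9]'
}

# Sample inputs standing in for real "gluster v heal tier2 info" output.
sample_busy='Brick s04-stg:/gluster/mnt3/brick
Number of entries: 12'
sample_clean='Brick s04-stg:/gluster/mnt3/brick
Number of entries: 0'

printf '%s\n' "$sample_busy"  | heal_pending && echo "still healing"
printf '%s\n' "$sample_clean" | heal_pending || echo "heal complete"
```

In a real maintenance loop you would pipe the live command through `heal_pending` and sleep between polls, moving to the next replace-brick only when it reports no pending entries.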




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps involve data movement.
Be careful while performing them: do one replace-brick at a time, and move on to the next only after heal completion.
Let me know if you have any issues.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I'm not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro



On 26 Sep 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro




On 26 Sep 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you with a step-by-step procedure to correct it.
Is it fine if I provide the steps tomorrow? It is quite late over here, and I don't want to miss anything in a hurry.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in the attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I may ask this, but, if possible, could you please describe step by step the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the “df -h” command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.

[***@s06 bricks]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the last servers is about 800 GB.

Thank you,
Mauro










On 26 Sep 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

the rebalance and brick logs should be the first thing we go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration,
that is, 2 bricks on each node per subvolume.
The procedure will require a lot of replace-bricks, which will again need healing and so on. In addition, we have to wait for the rebalance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, you should remove them.
After that, add the bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face these issues in the future, when it will be almost impossible to correct them once you have lots of data.
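The stop-and-remove sequence described above can be sketched as follows, using the first newly added subvolume's bricks (Brick37-42 on s04-stg, from the volume info later in this thread) as an example. This is a sketch, not a definitive runbook: verify the command forms against your 3.12 CLI, and commit only after status reports "completed" on all nodes.

```shell
# Stop the in-flight rebalance first.
gluster volume rebalance tier2 stop

# Start removing one 6-brick disperse subvolume; this migrates its data
# to the remaining subvolumes.
gluster volume remove-brick tier2 \
  s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
  s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
  s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick start

# Watch the migration; repeat until status shows "completed" everywhere.
gluster volume remove-brick tier2 \
  s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
  s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
  s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status

# Only then make the removal permanent.
gluster volume remove-brick tier2 \
  s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
  s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
  s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick commit
```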

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for your answer.
I can provide you with the entire log files for glusterd, glusterfsd and the rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro



On 26 Sep 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest that you provide more logs/info.
I have also observed that the configuration and setup of your volume are not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, and each subvolume made up of these bricks is confined to that single node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
Rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not that high.
Do you think that, with my volume configuration, I need to modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?

You can find below our volume info:
(the volume spans 6 servers; each server has 2 CPUs with 10 cores each, 64 GB of RAM, 1 SSD dedicated to the OS, and 12 x 10 TB HDDs)

[***@s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

If it can help, I paste here the output of the “free -m” command executed on the cluster nodes.

The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?

[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432

Thank you in advance.
Sorry for my long message, but I'm trying to give you all the available information.

Regards,
Mauro









_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users




































Mauro Tridici
2018-09-28 13:38:41 UTC
Dear Ashish,

please excuse me, I am very sorry for the misunderstanding.
Before contacting you during the last few days, we checked all the network devices (10GbE switches, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be OK. The first 3 servers (and the volume) operated without problems for one year; after we added the 3 new servers, we noticed something wrong.
Fortunately, yesterday you helped me understand where the problem is (or could be).

At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning files.
It may be that some errors will appear during the upcoming data movement.

For this reason, it would be useful to know how to proceed in case of a new failure: insist on Approach 1 or change strategy?
We are thinking of trying to complete the running remove-brick procedure and making a decision based on the outcome.

Question: could we also start Approach 2 after having successfully removed the V1 subvolume?

If it is still possible, could you please illustrate Approach 2 even if I don't have free disks?
I would like to start thinking about it and test it in a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro
Post by Ashish Pandey
We could have taken approach -2 even if you did not have free disks. You should have told me why are you
opting Approach-1 or perhaps I should have asked.
I was wondering for approach 1 because sometimes re-balance takes time depending upon the data size.
Anyway, I hope whole setup is stable, I mean it is not in the middle of something which we can not stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty brick to be used as indicated in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since remaining 1,8TB need to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn’t stop again before complete the rebalance)
Now rebalance is not moving data, it’s only scanning files (please, take a look to the following output)
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances entire cluster each time it start.
Is there a way to speed up this procedure? Do you have some other suggestion that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance avoiding to rebalance the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think to follow the first approach after setting network.ping-timeout option to 0 (If I’m not wrong “0" means “infinite”...I noticed that this value reduced rebalance errors).
After the fix I will set network.ping-timeout option to default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume and the data on that volume would be accessible.
In current setup if s04-stg goes down, you will loose all the data on V1 and V2 as all the bricks will be down. We want to avoid and correct it.
Now, we can have two approach to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in a set of 6 bricks. This will trigger re- balance and move whole data to other sub volumes.
Repeat the above step and then once all the bricks are removed, add those bricks again in a set of 6 bricks, this time have 2 bricks from each of the 3 newly added Nodes.
While this is a valid and working approach, I personally think that this will take long time and also require lot of movement of data.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belongs to same ec subvolume
Use replace-brick command to move following bricks on s06-stg node one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After, every replace-brick command, you have to wait for heal to be completed.
check "gluster v heal <volname> info " if it shows any entry you have to wait for it to be completed.
After successful step 1 and step 2, setup for sub volume V1 will be fixed. The same steps you have to perform for other volumes. Only thing is that
the nodes would be different on which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve movement of data.
Be careful while performing them: do one replace-brick at a time, and move on to the next only after the heal has completed.
Let me know if you have any issues.
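The replace-brick/heal cycle described above could be scripted roughly as follows. This is only a hedged sketch: the volume name, the destination brick paths, and the heal-output parsing are assumptions based on this thread, and it should be adapted and tested before any real use.

```shell
#!/bin/sh
# Sketch of the replace-brick + wait-for-heal cycle (assumptions: volume name
# "tier2", brick paths from this thread, "heal info" output format).
VOLNAME=tier2

heal_pending() {
    # Sum pending heal entries across all bricks; 0 means heal is done.
    gluster volume heal "$VOLNAME" info 2>/dev/null \
        | awk '/^Number of entries:/ {sum += $4} END {print sum + 0}'
}

replace_and_wait() {
    src=$1; dst=$2
    gluster volume replace-brick "$VOLNAME" "$src" "$dst" commit force
    # Do not start the next replace-brick until heal has completed.
    while [ "$(heal_pending)" -gt 0 ]; do
        sleep 60
    done
}

# Guard so the sketch is inert on machines without the gluster CLI.
if command -v gluster >/dev/null 2>&1; then
    # One brick at a time, as recommended above; destination paths are
    # hypothetical free bricks on s06-stg.
    replace_and_wait s04-stg:/gluster/mnt5/brick s06-stg:/gluster/sub1-5/brick
    replace_and_wait s04-stg:/gluster/mnt6/brick s06-stg:/gluster/sub1-6/brick
fi
```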
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I am not disturbing you too much, but I would like to ask if you have had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure to correct the problem exists, I would like to execute it.
I don't know if I may ask this, but, if possible, could you please describe step by step the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the "df -h" command executed on one of the 3 pre-existing nodes (s01, s02, s03) and on one of the 3 recently added nodes (s04, s05, s06).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the recently added servers is about 800 GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as the previous configuration,
that is, 2 bricks on each node for each subvolume.
The procedure will require a lot of replace-bricks, which will in turn need healing and so on. In addition, we would have to wait for the rebalance to complete.
I would suggest that, if the whole data set has not yet been rebalanced and you can stop the rebalance, you remove these newly added bricks properly.
After that, add the bricks back so that you have 2 bricks of each subvolume on each of the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
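As a rough illustration of the suggested sequence (stop the rebalance, drain and remove one newly added brick set, then re-add it spread across the new nodes), the commands might look like the sketch below. The volume name, brick paths, and the 2-bricks-per-node layout are assumptions from this thread; "start" must finish migrating data before "commit", and the bricks must be wiped before re-adding.

```shell
#!/bin/sh
# Hypothetical sketch; adapt the volume name and paths before any real use.
VOLNAME=tier2
# One (4+2) disperse set currently hosted entirely on s04-stg.
SET1="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"

if command -v gluster >/dev/null 2>&1; then
    gluster volume rebalance "$VOLNAME" stop

    # Drain the data off this set, watch progress, then commit the removal.
    gluster volume remove-brick "$VOLNAME" $SET1 start
    gluster volume remove-brick "$VOLNAME" $SET1 status
    gluster volume remove-brick "$VOLNAME" $SET1 commit   # only after status shows "completed"

    # Re-add six (wiped) bricks as one set with 2 bricks per new node,
    # mirroring the layout of the original 36 bricks.
    gluster volume add-brick "$VOLNAME" \
        s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
        s05-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt2/brick \
        s06-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt2/brick
fi
```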
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for your answer.
I can provide you the entire log files for glusterd, glusterfsd and the rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, so the subvolumes made up of these bricks have all their bricks on a single node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of losing the connection to all bricks of a subvolume at once is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
The rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems there are some network or timeout problems, but the network usage/traffic values are not that high.
Do you think that, with my volume configuration, I should modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?
(The volume is deployed on 6 servers; each server has 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, and 12 x 10 TB HDDs.)
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
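For reference, individual options in the list above are changed per volume with `gluster volume set`. The sketch below shows the network.ping-timeout adjustment that comes up later in this thread; treat it as a hedged example (the value 0 and the maintenance framing are assumptions), not a recommendation.

```shell
#!/bin/sh
# Hypothetical examples; "tier2" is the volume name from this thread.
VOL=tier2
if command -v gluster >/dev/null 2>&1; then
    gluster volume get "$VOL" network.ping-timeout      # show the current value
    gluster volume set "$VOL" network.ping-timeout 0    # e.g. relax it during maintenance
    gluster volume reset "$VOL" network.ping-timeout    # restore the default afterwards
fi
```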
The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432
Thank you in advance.
Sorry for my long message, but I'm trying to give you all the available information.
Regards,
Mauro
Ashish Pandey
2018-09-28 14:39:36 UTC
Permalink
----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,

please excuse me, I'm very sorry for the misunderstanding.
Before contacting you these last few days, we checked all the network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, Gluster package versions, tuning profiles, etc., but everything seemed to be OK. The first 3 servers (and the volume) operated without problems for one year; we noticed something wrong only after we added the 3 new servers.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).

At this moment, after re-launching the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning files.
It may be that some errors will appear during the upcoming data movement.

For this reason, it would be useful to know how to proceed in case of a new failure: insist on approach no. 1, or change strategy?
We are thinking of trying to complete the running remove-brick procedure and making a decision based on the outcome.

Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use the replace-brick command.
We will kill ONLY one brick process on s06 and format that brick. Then we use the replace-brick command to replace a brick of a subvolume on s05 with this freshly formatted brick.
Heal will be triggered and the data of the respective subvolume will be placed on this brick.

Now we can format the brick that was freed up on s05 and use it to replace the brick we killed on s06.
During this process, we have to make sure the heal has completed before trying any other replace/kill brick.

It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
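A hedged sketch of this kill/format/replace-brick cycle might look like the following. All device names, mount points, and brick paths are assumptions for illustration; the destructive steps are left as comments, and note that with cluster.brick-multiplex enabled a single brick process may host several bricks, so killing one needs extra care.

```shell
#!/bin/sh
# Hypothetical sketch of "approach 2"; do NOT run as-is.
VOLNAME=tier2

if command -v gluster >/dev/null 2>&1; then
    # 1. Find the PID of the target brick process on s06 (run on s06-stg).
    gluster volume status "$VOLNAME" | grep 's06-stg:/gluster/mnt1/brick'
    # kill <pid-of-that-brick-process>        # stop ONLY this brick

    # 2. Reformat the freed brick filesystem (destroys its data; heal rebuilds it).
    # mkfs.xfs -f /dev/mapper/gluster_vgb-gluster_lvb && mount /gluster/mnt1

    # 3. Replace a brick of an s05-hosted subvolume with a fresh path on the
    #    freed s06 disk (a path not already registered as a brick).
    gluster volume replace-brick "$VOLNAME" \
        s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick-new commit force

    # 4. Wait for the heal to finish before the next kill/replace.
    gluster volume heal "$VOLNAME" info
fi
```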
-------

If it is still possible, could you please illustrate approach no. 2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro
Il giorno 28 set 2018, alle ore 14:36, Ashish Pandey < ***@redhat.com > ha scritto:


We could have taken approach 2 even though you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was worried about approach 1 because rebalance can take a long time, depending on the data size.

Anyway, I hope the whole setup is stable, i.e. not in the middle of something that we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with that so you can follow approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.

So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
The rebalance started moving the data across the other bricks but, after about 3 TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB still needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).

Now the rebalance is not moving data; it is only scanning files (please take a look at the following output).

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?

Thank you in advance,
Mauro


<blockquote>

Il giorno 27 set 2018, alle ore 13:14, Ashish Pandey < ***@redhat.com > ha scritto:


Yes, you can.
If not me others may also reply.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I cannot thank you enough!
Your procedure and description are very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced the rebalance errors).
After the fix, I will set network.ping-timeout back to its default value.

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro
<blockquote>

Il giorno 27 set 2018, alle ore 12:38, Ashish Pandey < ***@redhat.com > ha scritto:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from Brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have the 6 bricks of V1 on 6 different nodes.
However, in your case you added only 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose access to all the data on V1 and V2, as all their bricks will be down. We want to avoid and correct that.

Now, we can have two approach to correct/modify this setup.

Approach 1
We have to remove all the newly added bricks in sets of 6. This will trigger a rebalance and move the whole data set to the other subvolumes.
Repeat the step above, and once all the bricks are removed, add them again in sets of 6, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.

Approach 2

In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-

Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>

Command :
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use replace-brick command to move following bricks on s06-stg node one by one

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait until they are healed.

After steps 1 and 2 complete successfully, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is
the nodes onto which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps need movement of data.
Be careful while performing these steps and do one replace brick at a time and only after heal completion go to next.
Let me know if you have any issues.

---
Ashish
----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I don’t disturb you so much, but I would like to ask you if you had some time to dedicate to our problem.
Please, forgive my insistence.

Thank you in advance,
Mauro


<blockquote>

Il giorno 26 set 2018, alle ore 19:56, Mauro Tridici < ***@cmcc.it > ha scritto:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

Il giorno 26 set 2018, alle ore 19:33, Ashish Pandey < ***@redhat.com > ha scritto:

Hi Mauro,

Yes, I can provide you step by step procedure to correct it.
Is it fine If i provide you the steps tomorrow as it is quite late over here and I don't want to miss anything in hurry?

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if exists a safe procedure to correct the problem I would like execute it.

I don’t know if I can ask you it, but, if it is possible, could you please describe me step by step the right procedure to remove the newly added bricks without losing the data that have been already rebalanced?

The following outputs show the result of “df -h” command executed on one of the first 3 nodes (s01, s02, s03) already existing and on one of the last 3 nodes (s04, s05, s06) added recently.

[***@s06 bricks]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, used space value of each brick of the last servers is about 800GB.

Thank you,
Mauro









<blockquote>

Il giorno 26 set 2018, alle ore 14:51, Ashish Pandey < ***@redhat.com > ha scritto:

Hi Mauro,

rebalance and brick logs should be the first thing we should go through.

There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg , s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need healing and all. In addition to that we have to wait for re-balance to complete.

I would suggest that if whole data has not been rebalanced and if you can stop the rebalance and remove these newly added bricks properly then you should remove these newly added bricks.
After that, add these bricks so that you have 2 bricks of each volume on 3 newly added nodes.

Yes, it is like undoing whole effort but it is better to do it now then facing issues in future when it will be almost impossible to correct these things if you have lots of data.

---
Ashish



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

Il giorno 26 set 2018, alle ore 14:13, Ashish Pandey < ***@redhat.com > ha scritto:


I think we don't have enough logs to debug this so I would suggest you to provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, and each subvolume made up of these bricks will reside entirely on that node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption affecting the connection to all bricks of a subvolume is higher in this case.

---
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
Rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 60
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.server-quorum-type: server
features.default-soft-limit: 90
features.quota-deem-statfs: on
performance.io-thread-count: 16
disperse.cpu-extensions: auto
performance.io-cache: off
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.min-free-disk: 10
performance.client-io-threads: on
features.quota: on
features.inode-quota: on
features.bitrot: on
features.scrub: Active
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

If it can help, I paste here the output of “free -m” command executed on all the cluster nodes:

The result is almost the same on every node. In your opinion, is the available RAM enough to support the data movement?

[***@s06 ~]# free -m
total used free shared buff/cache available
Mem: 64309 10409 464 15 53434 52998
Swap: 65535 103 65432

Thank you in advance.
Sorry for my long message, but I’m trying to provide you with all the available information.

Regards,
Mauro













</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>



_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-28 15:38:52 UTC
Thank you, Ashish.

I will study and try your solution in my virtual environment.
How can I identify the process of a brick on a gluster server?

Many Thanks,
Mauro
Post by Mauro Tridici
------------------------------
*Sent: *Friday, September 28, 2018 7:08:41 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for misunderstanding.
Before contacting you during last days, we checked all network devices
(switch 10GbE, cables, NICs, servers ports, and so on), operating systems
version and settings, network bonding configuration, gluster packages
versions, tuning profiles, etc. but everything seems to be ok. The first 3
servers (and volume) operated without problem for one year. After we added
the new 3 servers we noticed something wrong.
Fortunately, yesterday you gave me a hand to understand where the problem
is (or could be).
At this moment, after we re-launched the remove-brick command, it seems
that the rebalance is going ahead without errors, but it is only scanning
the files.
It may be that some errors will appear during the future data movement.
For this reason, it could be useful to know how to proceed in case of a
new failure: insist with approach n.1 or change the strategy?
We are thinking of trying to complete the running remove-brick procedure
and making a decision based on the outcome.
Question: could we start approach n.2 also after having successfully
removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick.
Then use replace-brick command to replace brick of a volume on s05 with
this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.
Now, we can format the brick which got freed up on s05 and replace the
brick which we killed on s06 to s05.
During this process, we have to make sure heal completed before trying any
other replace/kill brick.
It is tricky but looks doable. Think about it and try to perform it on
your virtual environment first before trying on production.
-------
If it is still possible, could you please illustrate approach n. 2 even if
I don’t have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You
should have told me why you were opting for Approach 1, or perhaps I should
have asked.
I was wondering about approach 1 because sometimes rebalance takes time
depending upon the data size.
Anyway, I hope whole setup is stable, I mean it is not in the middle of
something which we can not stop.
If free disks are the only concern I will give you some more steps to deal
with it and follow the approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
------------------------------
*Sent: *Friday, September 28, 2018 4:21:03 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you
suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated
in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks
1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after
about 3TB of moved data, rebalance speed slowed down and some transfer
errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to
complete the step, we decided to stop the remove-brick execution and start
it again (I hope it doesn’t stop again before completing the rebalance).
Now rebalance is not moving data, it’s only scanning files (please, take a
look at the following output):
[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other
suggestion that, in this particular case, could be useful to reduce errors
(I know that they are related to the current volume configuration) and
improve rebalance performance, avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
------------------------------
*Sent: *Thursday, September 27, 2018 4:24:12 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach after setting the
network.ping-timeout option to 0 (if I’m not wrong, “0” means “infinite”; I
noticed that this value reduced rebalance errors).
After the fix, I will set the network.ping-timeout option back to its default value.
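As a sketch (the wrapper function is my own, and the volume name and the 60-second default are taken from the volume options shown elsewhere in this thread), the temporary change would be:

```shell
# Sketch: temporarily disable the client ping timeout for the maintenance
# window, then restore it. "tier2" and the 60s default come from the
# volume info in this thread; verify the current value before changing it.
set_ping_timeout() {
  gluster volume set tier2 network.ping-timeout "$1"
}

# set_ping_timeout 0    # before starting the remove-brick
# set_ping_timeout 60   # restore the previous default afterwards
```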
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should
have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will
have 4 other bricks of that volume and the data on that volume would be
accessible.
In the current setup, if s04-stg goes down you will lose access to all the
data on V1 and V2, as all of their bricks will be down. We want to avoid and
correct that.
Now, we can have two approach to correct/modify this setup.
*Approach 1*
We have to remove all the newly added bricks, one set of 6 bricks at a time.
This will trigger a rebalance and move the whole data to other subvolumes.
Repeat the above step and then once all the bricks are removed, add those
bricks again in a set of 6 bricks, this time have 2 bricks from each of the
3 newly added Nodes.
While this is a valid and working approach, I personally think that it
will take a long time and also require a lot of data movement.
*Approach 2*
In this approach we can use the heal process. We have to deal with all the
volumes (V1 to V6) one by one. Following are the steps for V1-
*Step 1 - *
Use replace-brick command to move following bricks on *s05-stg* node *one
by one (heal should be completed after every replace brick command)*
*Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>*
*Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is
free>*
gluster v replace-brick <volname> *s04-stg:/gluster/mnt3/brick* *s05-stg:/<brick
which is free>* commit force
Try to give names to the bricks so that you can identify which 6 bricks
belong to the same EC subvolume.
Use replace-brick command to move following bricks on *s06-stg* node one
by one
Brick41: s04-stg:/gluster/mnt5/brick to *s06-stg/<brick which is free>*
Brick42: s04-stg:/gluster/mnt6/brick to *s06-stg/<other brick which is
free>*
*Step 2* - After every replace-brick command, you have to wait for heal
to be completed.
Check *"gluster v heal <volname> info"*; if it shows any entry, you have
to wait for it to be completed.
After successful step 1 and step 2, setup for sub volume V1 will be fixed.
You have to perform the same steps for the other subvolumes. The only
difference is the nodes onto which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps need movement of data.
Be careful while performing these steps: do one replace-brick at a time,
and only after heal completion go to the next.
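The replace-brick plus wait-for-heal cycle from Steps 1 and 2 could be sketched as below. This is an untested outline; the helper names are mine, and the heal check simply sums the "Number of entries:" lines printed by "gluster v heal <volname> info".

```shell
# Untested sketch of the Step 1 / Step 2 cycle: replace one brick, then
# poll heal info until no entries remain before touching the next brick.
heal_pending() {
  # Sum every "Number of entries: N" line printed by heal info.
  gluster v heal "$1" info | awk '/Number of entries:/ {s += $NF} END {print s + 0}'
}

replace_and_wait() {
  vol=$1; old=$2; new=$3
  gluster v replace-brick "$vol" "$old" "$new" commit force
  # Do NOT start the next replace-brick until self-heal has finished.
  while [ "$(heal_pending "$vol")" -gt 0 ]; do
    sleep 60
  done
}

# Example (the target path is a placeholder for a free brick on s05-stg):
# replace_and_wait tier2 s04-stg:/gluster/mnt3/brick s05-stg:/gluster/free1/brick
```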
Let me know if you have any issues.
---
Ashish
------------------------------
*Sent: *Thursday, September 27, 2018 4:03:04 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
I hope I’m not disturbing you too much, but I would like to ask if you have
had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
On 26 Sep 2018, at 19:56, Mauro Tridici <
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here
and I don't want to miss anything in a hurry.
---
Ashish
------------------------------
*Sent: *Wednesday, September 26, 2018 6:54:19 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Hi Ashish,
attached you can find the rebalance log file and the most recently updated
brick log file (the other files in the /var/log/glusterfs/bricks directory
seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the
rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don’t know if I may ask this, but, if it is possible, could you please
describe step by step the right procedure to remove the newly added bricks
without losing the data that have already been rebalanced?
The following outputs show the result of the “df -h” command executed on
one of the first 3 already existing nodes (s01, s02, s03) and on one of the
3 recently added nodes (s04, s05, s06).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space of each brick on the new servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup but the situation
you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg
the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need
healing and all. In addition to that we have to wait for re-balance to
complete.
I would suggest that if whole data has not been rebalanced and if you can
stop the rebalance and remove these newly added bricks properly then you
should remove these newly added bricks.
After that, add these bricks so that you have 2 bricks of each volume on 3
newly added nodes.
Yes, it is like undoing whole effort but it is better to do it now then
facing issues in future when it will be almost impossible to correct these
things if you have lots of data.
---
Ashish
Ashish Pandey
2018-09-28 15:47:46 UTC
----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "Gluster Users" <gluster-***@gluster.org>
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Thank you, Ashish.

I will study and try your solution in my virtual environment.
How can I identify the process of a brick on a gluster server?

Many Thanks,
Mauro


gluster v status <volname> will give you the list of bricks and the respective process IDs.
Also, you can use "ps aux | grep glusterfs" to see all the gluster processes on a node, but the above step gives you the same information.
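For example, a hypothetical helper of mine, assuming the brick PID is the last column of the matching "gluster v status" line (as in the 3.12 output format):

```shell
# Hypothetical helper: extract the PID of a single brick process from
# "gluster v status" output (the PID is the last column of the brick line).
brick_pid() {
  # $1 = volume name, $2 = brick as shown in status output (host:/path)
  gluster v status "$1" | awk -v b="$2" '$0 ~ b { print $NF; exit }'
}

# Stop only that brick process (e.g. before formatting the brick):
# kill -15 "$(brick_pid tier2 s06-stg:/gluster/mnt1/brick)"
# Caution: this volume has cluster.brick-multiplex on, so several bricks
# may share one process; check what the PID hosts before killing it.
```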

---
Ashish



On Fri, 28 Sep 2018 at 16:39, Ashish Pandey < ***@redhat.com > wrote:






From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

please excuse me, I'm very sorry for misunderstanding.
Before contacting you during the last days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be ok. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand to understand where the problem is (or could be).

At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors will appear during the future data movement.

For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking of trying to complete the running remove-brick procedure and making a decision based on the outcome.

Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.

Now, we can format the brick which got freed up on s05 and replace the brick which we killed on s06 to s05.
During this process, we have to make sure heal completed before trying any other replace/kill brick.

It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
-------

If it is still possible, could you please illustrate approach n. 2 even if I don’t have free disks?
I would like to start thinking about it and test it on a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro





On 28 Sep 2018, at 14:36, Ashish Pandey < ***@redhat.com > wrote:


We could have taken approach 2 even if you did not have free disks. You should have told me why you were opting for Approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time depending upon the data size.

Anyway, I hope whole setup is stable, I mean it is not in the middle of something which we can not stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of an empty brick to be used as indicated in the second approach.

So, we launched the remove-brick command on the first subvolume (V1, bricks 1-6 on server s04).
Rebalance started moving data across the other bricks, but after about 3 TB had been moved, the rebalance slowed down and some transfer errors appeared in the rebalance.log on server s04.
At this point, since the remaining 1.8 TB still needs to be moved to complete this step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).

Now the rebalance is not moving data; it is only scanning files (please take a look at the following output).

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?

Thank you in advance,
Mauro


<blockquote>

On 27 Sep 2018, at 13:14, Ashish Pandey < ***@redhat.com > wrote:


Yes, you can.
If not me, others may also reply.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I cannot thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced the rebalance errors).
After the fix I will set network.ping-timeout back to its default value.
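For reference, toggling this option and restoring it afterwards is a pair of one-line commands. This is a sketch using the `tier2` volume name from this thread; verify the shipped default timeout (usually 42 seconds) on your own gluster version before relying on it.

```shell
# Disable the client ping timeout for the duration of the maintenance.
gluster volume set tier2 network.ping-timeout 0

# ... perform the remove-brick / rebalance work ...

# Restore the option to its shipped default afterwards.
gluster volume reset tier2 network.ping-timeout
```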

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro



<blockquote>

On 27 Sep 2018, at 12:38, Ashish Pandey < ***@redhat.com > wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have all 6 bricks of V1 on 6 different nodes.
However, in your case you have added only 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct that.

Now, we have two approaches to correct/modify this setup.

Approach 1
We remove all the newly added bricks, one set of 6 bricks at a time. This will trigger a rebalance and move the whole data set to the other subvolumes.
Repeat the above step, and once all the bricks are removed, add them back in sets of 6 bricks, this time taking 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.
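A sketch of what the re-add step of approach 1 could look like once a set has been removed and committed. The mount numbers and brick paths below are purely illustrative (in reality the freed mount points would be reused); the small helper only builds the interleaved list of 2 bricks per new node.

```shell
#!/bin/bash
# Build a 6-brick list interleaved across the 3 new nodes (2 bricks each).
# Mount numbers and paths are illustrative placeholders.
interleaved_set() {
    local m1=$1 m2=$2   # two mount numbers, e.g. 1 and 2
    for node in s04-stg s05-stg s06-stg; do
        printf '%s:/gluster/mnt%s/brick %s:/gluster/mnt%s/brick ' \
               "$node" "$m1" "$node" "$m2"
    done
    echo
}

# After "remove-brick ... commit" of a set, re-add it spread over the nodes:
echo "gluster v add-brick tier2 $(interleaved_set 1 2)"
```

Adding the set in this order makes each 6-brick EC subvolume span all 3 new nodes, which is exactly the 2-bricks-per-node layout described above.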

Approach 2

In this approach we use the heal process. We have to deal with the subvolumes (V1 to V6) one by one. Following are the steps for V1:

Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node, one by one (heal must complete after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>

Command :
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use the replace-brick command to move the following bricks to the s06-stg node, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.
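The wait in step 2 can be scripted. A minimal sketch that polls the heal info output, assuming the standard "Number of entries: N" lines in the output, the `tier2` volume name from this thread, and an arbitrary 60-second poll interval:

```shell
#!/bin/bash
# Poll "gluster v heal <vol> info" until no entries remain to be healed.
VOLNAME=${VOLNAME:-tier2}

pending_heals() {
    # Sum every "Number of entries: N" line across all bricks.
    gluster v heal "$VOLNAME" info 2>/dev/null \
        | awk '/^Number of entries:/ {sum += $4} END {print sum + 0}'
}

tries=0
until [ "$(pending_heals)" -eq 0 ]; do
    tries=$((tries + 1))
    # Arbitrary safety cap: give up after roughly 24 hours of waiting.
    [ "$tries" -gt 1440 ] && { echo "giving up; heal still pending"; break; }
    echo "heal in progress: $(pending_heals) entries pending"
    sleep 60
done
echo "no pending heal entries on $VOLNAME"
```

Running this between replace-brick commands removes the temptation to start the next move too early.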

After step 1 and step 2 succeed, the setup for subvolume V1 will be fixed. You then have to perform the same steps for the other subvolumes; the only difference is the nodes to which you move the bricks.




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps involve data movement.
Be careful while performing them: do one replace-brick at a time, and only move on to the next after heal has completed.
Let me know if you have any issues.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I am not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro


<blockquote>

On 26 Sep 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

On 26 Sep 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you a step-by-step procedure to correct it.
Is it fine if I provide the steps tomorrow? It is quite late over here, and I don't want to miss anything in a hurry.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I may ask this of you, but, if possible, could you please describe step by step the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the "df -h" command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the newest servers is about 800 GB.

Thank you,
Mauro









<blockquote>

On 26 Sep 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

the rebalance and brick logs should be the first thing we go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration,
that is, 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-bricks, which will in turn need healing. In addition, we have to wait for the re-balance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove the newly added bricks properly, then you should remove them.
After that, add the bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for your answer.
I can provide you the entire log files for glusterd, glusterfsd and the rebalance.
Could you please indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this mistake? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

On 26 Sep 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume are not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node and together they form a single subvolume, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
Rebalance failed on different nodes, and the estimated time needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not that high.
Do you think that, with my volume configuration, I should modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?

You can find below our volume info:
(the volume is implemented on 6 servers; each server configuration: 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, 12 x 10 TB HDDs)

[***@s04 ~]# gluster vol info





</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-28 15:55:53 UTC
Permalink
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual environment).
If I kill one of them, I risk killing some other brick as well. Is that normal?

[***@s01 ~]# gluster vol status
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288

Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
Removed bricks:
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress

[***@s01 ~]# ps -ef|grep glusterfs
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Post by Ashish Pandey
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and their respective process IDs.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the above step does the same.
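When several bricks share one glusterfsd PID, it helps to pick the PID out of the status output by exact brick path. A small sketch follows; the parsing assumes the column layout shown in the status output earlier in this thread, and note that killing the PID you find will stop every brick multiplexed into that process.

```shell
#!/bin/bash
# Extract the PID column for one specific brick from "gluster v status" output.
brick_pid() {
    local brick=$1
    # Reads status output on stdin; prints the last column (Pid) of the
    # "Brick <path> ..." line that matches the requested brick exactly.
    awk -v b="$brick" '$1 == "Brick" && $2 == b { print $NF }'
}

# Example against a captured line from the thread's status output:
status_line='Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644'
echo "$status_line" | brick_pid s06-stg:/gluster/mnt1/brick   # prints 3644
```

In practice you would pipe "gluster v status tier2" into brick_pid; if multiple bricks report that same PID, they are served by one multiplexed process and cannot be stopped individually by killing it.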
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you these last days, we checked all the network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be OK. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers, we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
May be that during the future data movement some errors could appear.
For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.
Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.
Now, we can format the brick which got freed up on s05 and replace the brick which we killed on s06 to s05.
During this process, we have to make sure heal completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
-------
If it is still possible, could you please illustrate the approach n.2 even if I dont have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach -2 even if you did not have free disks. You should have told me why are you
opting Approach-1 or perhaps I should have asked.
I was wondering for approach 1 because sometimes re-balance takes time depending upon the data size.
Anyway, I hope whole setup is stable, I mean it is not in the middle of something which we can not stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty brick to be used as indicated in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since remaining 1,8TB need to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn’t stop again before complete the rebalance)
Now rebalance is not moving data, it’s only scanning files (please, take a look to the following output)
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances entire cluster each time it start.
Is there a way to speed up this procedure? Do you have some other suggestion that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance avoiding to rebalance the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think to follow the first approach after setting network.ping-timeout option to 0 (If I’m not wrong “0" means “infinite”...I noticed that this value reduced rebalance errors).
After the fix I will set network.ping-timeout option to default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume and the data on that volume would be accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, as all their bricks will be down. We want to avoid that and correct it.
Now, we have two approaches to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in sets of 6 bricks. This will trigger a rebalance and move the whole data to other subvolumes.
Repeat the above step, and once all the bricks are removed, add those bricks again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node one by one (heal should be completed after every replace-brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume
Use the replace-brick command to move the following bricks to the s06-stg node one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for the heal to be completed.
Check "gluster v heal <volname> info"; if it shows any entry you have to wait for it to be completed.
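The "wait for heal" step can be scripted; below is a minimal sketch (not from the thread) that assumes the heal-info output format of this gluster release and the volume name tier2, summing the "Number of entries:" counters until they reach zero:

```shell
#!/bin/sh
# Sum the "Number of entries:" counters from `gluster volume heal <vol> info`
# output read on stdin; prints 0 when nothing is pending.
pending_entries() {
    awk '/^Number of entries:/ { total += $NF } END { print total + 0 }'
}

# Poll until no entries are pending; only then is it safe to issue the
# next replace-brick. Guarded so the sketch is a no-op without gluster.
if command -v gluster >/dev/null 2>&1; then
    while :; do
        n=$(gluster volume heal tier2 info | pending_entries)
        [ "$n" -eq 0 ] && break
        echo "still healing: $n entries pending"
        sleep 60
    done
    echo "heal complete, safe to run the next replace-brick"
fi
```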
After successful steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other volumes;
the only difference is the nodes to which you have to move the bricks.
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve movement of data.
Be careful while performing these steps: do one replace-brick at a time, and go to the next only after heal completion.
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I’m not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine If i provide you the steps tomorrow as it is quite late over here and I don't want to miss anything in hurry?
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don’t know if I can ask you this, but, if it is possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the “df -h” command executed on one of the first 3 pre-existing nodes (s01, s02, s03) and on one of the last 3 recently added nodes (s04, s05, s06).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the last servers is about 800 GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup, but in the situation you are in it is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace bricks which will again need healing and all. In addition to that we have to wait for re-balance to complete.
I would suggest that if the whole data has not been rebalanced yet, and if you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for you answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, and the subvolumes made up of these bricks will have all their bricks on that same node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of disruption of the connection to these bricks is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from the 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
The rebalance failed on different nodes, and the time needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(The volume is implemented on 6 servers; each server: 2 CPUs with 10 cores each, 64 GB RAM, 1 SSD dedicated to the OS, 12 x 10 TB HDs.)
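As a starting point for the thread/network question above, these are the options commonly tuned in this area; the values shown are illustrative assumptions, not recommendations verified for this cluster:

```shell
# Raise the number of client/server epoll threads (default is 2 in 3.x):
gluster volume set tier2 client.event-threads 4
gluster volume set tier2 server.event-threads 4
# Keep the ping timeout at its 42-second default unless debugging disconnects:
gluster volume set tier2 network.ping-timeout 42
```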
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it <mailto:***@cmcc.it>
https://it.linkedin.com/in/mauro-tridici-5977238b
Ashish Pandey
2018-10-01 07:17:39 UTC
Permalink
Ohh!! It is because brick-multiplexing is "ON" on your setup. Not sure if it is by default ON for 3.12.14 or not.

See "cluster.brick-multiplex: on" in the output of gluster v <volname> info.
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.

So we have to do the following steps to kill any single brick on a node.

Steps to kill a brick when multiplex is on -

Step - 1
Find the unix domain socket of the process on the node.
Run "ps -aef | grep glusterfsd" on the node. Example:

This is on my machine, where I have all the bricks on the same machine:

[***@apandey glusterfs]# ps -aef | grep glusterfsd | grep -v mnt
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158

Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
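The socket path can also be extracted mechanically from the process table; a sketch assuming one multiplexed glusterfsd process per node (the `socket_of` helper name is mine, not a gluster tool):

```shell
#!/bin/sh
# Print the value following "-S" (the unix domain socket path) from a
# glusterfsd command line read on stdin.
socket_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-S") print $(i + 1) }'
}

# On a real node (brick multiplexing on => one glusterfsd per node);
# guarded so the sketch is a no-op on machines without gluster.
if command -v glusterfsd >/dev/null 2>&1; then
    ps -eo args | grep '[g]lusterfsd' | socket_of
fi
```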

Step - 2
Run following command to kill a brick on the same node -

gf_attach -d <unix domain_socket> brick_path_on_that_node

Example:

gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6

Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

[***@apandey glusterfs]#
[***@apandey glusterfs]#
[***@apandey glusterfs]# gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
OK
[***@apandey glusterfs]# gluster v status
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks


To start a brick again, we just need to start the volume using "force"

gluster v start <volname> force

----
Ashish






----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "Gluster Users" <gluster-***@gluster.org>
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other bricks. Is that normal?

[***@s01 ~]# gluster vol status
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288



Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
Removed bricks:
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress

[***@s01 ~]# ps -ef|grep glusterfs
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs




On 28 Sep 2018, at 17:47, Ashish Pandey <***@redhat.com> wrote:



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Thank you, Ashish.

I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?

Many Thanks,
Mauro


gluster v status <volname> will give you the list of bricks and the respective process id.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node but I think the above step also do the same.

---
Ashish



On Fri, 28 Sep 2018 at 16:39, Ashish Pandey <***@redhat.com> wrote:





From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be ok. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).

At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors will appear during the future data movement.

For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.

Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.

Now, we can format the brick which got freed up on s05 and replace the brick which we killed on s06 to s05.
During this process, we have to make sure heal completed before trying any other replace/kill brick.

It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
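Summarising the rotation above as a sketch for one brick; the brick paths are illustrative placeholders, and every step must wait for heal info to be empty before the next one:

```shell
# 1. Kill exactly one brick process on s06 (with multiplexing on, use
#    "gf_attach -d <socket> <brick-path>" rather than killing the PID).
# 2. Re-create the filesystem on the freed s06 brick.
# 3. Move one brick of the target subvolume from s05 onto the freed s06 brick:
gluster volume replace-brick tier2 \
    s05-stg:/gluster/<old-brick> s06-stg:/gluster/<freed-brick> commit force
# 4. Wait until "gluster volume heal tier2 info" shows no entries.
# 5. Format the brick freed on s05 and repeat the rotation for the next brick.
```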
-------

If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro





On 28 Sep 2018, at 14:36, Ashish Pandey <***@redhat.com> wrote:


We could have taken approach 2 even if you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time depending upon the data size.

Anyway, I hope whole setup is stable, I mean it is not in the middle of something which we can not stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty brick to be used as indicated in the second approach.

So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8 TB needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn’t stop again before completing the rebalance).

Now rebalance is not moving data, it’s only scanning files (please take a look at the following output)

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other suggestions that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance while avoiding rebalancing the entire cluster?
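One knob that may help with the rebalance speed question, assuming it is available in this release (it is not mentioned elsewhere in the thread, so verify before use): the rebalance throttle option, which controls how aggressively files are migrated in parallel:

```shell
# Default is "normal"; "aggressive" uses more parallel migration, at the
# cost of more load on the bricks (a suggestion, not tested on this cluster):
gluster volume set tier2 cluster.rebal-throttle aggressive
```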

Thank you in advance,
Mauro



On 27 Sep 2018, at 13:14, Ashish Pandey <***@redhat.com> wrote:


Yes, you can.
If not me others may also reply.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I can not thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I’m not wrong, “0" means “infinite”; I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro




On 27 Sep 2018, at 12:38, Ashish Pandey <***@redhat.com> wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume and the data on that volume would be accessible.
In current setup if s04-stg goes down, you will loose all the data on V1 and V2 as all the bricks will be down. We want to avoid and correct it.

Now, we can have two approach to correct/modify this setup.

Approach 1
We have to remove all the newly added bricks, one set of 6 bricks at a time. This will trigger a rebalance and move the whole data set to the other subvolumes.
Repeat the above step until all the bricks are removed, then add those bricks again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.
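Approach 1 can be scripted defensively. The sketch below only prints the remove-brick sequence for the V1 set (brick paths taken from the V1 listing in this thread), so the commands can be reviewed before anything is actually run:

```shell
# Print (do not run) the remove-brick sequence for the V1 set.
# Brick paths are the ones listed for V1 in this thread; review
# the output before copy-pasting any command.
v1_remove_cmds() {
  vol=tier2
  v1="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick"
  for action in start status commit; do
    echo "gluster volume remove-brick $vol $v1 $action"
  done
}
v1_remove_cmds
```

Run "status" repeatedly and issue "commit" only once it reports completed.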

Approach 2

In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-

Step 1 -
Use the replace-brick command to move the following bricks onto node s05-stg, one by one (the heal should complete after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.


Use the replace-brick command to move the following bricks onto node s06-stg, one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After successful completion of steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is the nodes onto which you have to move the bricks.
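The "wait for heal" part of Step 2 can be scripted as a polling loop. In this sketch the heal-info command is passed in as arguments (so the loop can be exercised without gluster installed); in production it would be invoked as `wait_for_heal gluster volume heal tier2 info`:

```shell
# Poll a heal-info command until no brick reports pending entries.
# The command to run is given as arguments, e.g.:
#   wait_for_heal gluster volume heal tier2 info
wait_for_heal() {
  # "Number of entries: N" with N > 0 means healing is still pending.
  while "$@" | grep -q 'Number of entries: [1-9]'; do
    sleep 30
  done
}
```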




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps involve data movement.
Be careful while performing them: do one replace-brick at a time, and go to the next only after the heal has completed.
Let me know if you have any issues.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I'm not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro


<blockquote>

On 26 Sep 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

On 26 Sep 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you with a step-by-step procedure to correct it.
Is it fine if I provide the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I may ask, but, if possible, could you please describe, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the "df -h" command executed on one of the first 3 existing nodes (s01, s02, s03) and on one of the 3 recently added nodes (s04, s05, s06).

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the new servers is about 800 GB.

Thank you,
Mauro









<blockquote>

On 26 Sep 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

The rebalance and brick logs should be the first thing we go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration.
That means 2 bricks on each node per subvolume.
The procedure will require a lot of replace-bricks, which will again need healing and so on. In addition to that, we have to wait for the rebalance to complete.

I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove the newly added bricks properly, then you should remove them.
After that, add those bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for your answer.
I can provide you with the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

On 26 Sep 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, and the subvolumes made up of these bricks will have all their bricks on that one node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a connection disruption affecting these bricks is higher in this case.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
The rebalance failed on different nodes, and the time needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not that high.
Do you think that, with my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you please help me understand the cause of the problems above?

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info





</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


</blockquote>



-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-10-01 10:05:26 UTC
Permalink
Good morning Ashish,

your explanations are always very useful, thank you very much: I will keep these suggestions in mind for any future needs.
Anyway, during the weekend the remove-brick procedures ended successfully, and we were able to free up all the bricks defined on servers s04 and s05, plus 6 of the 12 bricks on server s06.
So we can say that, thanks to your suggestions, we are about to complete this first phase (removal of all the bricks defined on the s04, s05 and s06 servers).

I really appreciate your support.
Now I have one last question (I hope): after the remove-brick commit I noticed that some data remains on each brick (about 1.2 GB).
Please take a look at the attached "df-h_on_s04_s05_s06.txt".
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files are still on the brick, but their sizes are 0.

Examples:

a lot of empty directories on /gluster/mnt*/brick/.glusterfs

8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273

some empty files in directories in /gluster/mnt*/brick/*

[***@s04 ~]# cd /gluster/mnt1/brick/
[***@s04 brick]# ls -l
total 32
drwxr-xr-x 7 root root 100 Sep 11 22:14 archive_calypso

[***@s04 brick]# cd archive_calypso/
[***@s04 archive_calypso]# ll
total 0
drwxr-x--- 3 root 5200 29 Sep 11 22:13 ans002
drwxr-x--- 3 5104 5100 32 Sep 11 22:14 ans004
drwxr-x--- 3 4506 4500 31 Sep 11 22:14 ans006
drwxr-x--- 3 4515 4500 28 Sep 11 22:14 ans015
drwxr-x--- 4 4321 4300 54 Sep 11 22:14 ans021
[***@s04 archive_calypso]# du -a *
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5

What should we do with this data? Should I back up these "empty" dirs and files to a different storage system before deleting them?
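Before deleting anything, a quick way to confirm that nothing with actual data remains on a removed brick is to scan it while skipping gluster's internal .glusterfs directory. A sketch (the helper name is mine, not a gluster tool):

```shell
# Hypothetical helper: list regular files that still contain data
# under a removed brick, skipping the internal .glusterfs directory.
# Usage: leftover_data /gluster/mnt1/brick
leftover_data() {
  find "$1" -name .glusterfs -prune -o -type f -size +0c -print
}
```

If it prints nothing, only empty files and directory skeletons are left on the brick.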

As soon as all the bricks are empty, I plan to re-add the new bricks using the following commands:

gluster peer detach s04
gluster peer detach s05
gluster peer detach s06

gluster peer probe s04
gluster peer probe s05
gluster peer probe s06

gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force

gluster volume rebalance tier2 fix-layout start

gluster volume rebalance tier2 start
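To avoid typos in the long add-brick line above, the interleaved brick list (mnt1..mnt12 spread across the three new nodes) can be generated and printed for review first; a sketch that only prints the command:

```shell
# Build the interleaved 36-brick list (mnt1..mnt12 across the three
# new nodes, same ordering as the command above) and print the
# add-brick command for review; nothing is executed.
addbrick_cmd() {
  bricks=""
  for m in $(seq 1 12); do
    for n in s04-stg s05-stg s06-stg; do
      bricks="$bricks $n:/gluster/mnt$m/brick"
    done
  done
  echo "gluster volume add-brick tier2$bricks force"
}
addbrick_cmd
```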

From your point of view, are these the right commands to complete this repair task?

Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick multiplexing is "ON" in your setup. I am not sure whether it is ON by default in 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info.
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.
So we have to do the following steps to kill any one brick on a node.
Steps to kill a brick when multiplexing is on -
Step - 1
Find the unix domain socket of the brick process on the node.
This is from my machine, where all the bricks are on the same node:
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
Step - 2
Run the following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
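The unix domain socket from step 1 can be pulled out of the glusterfsd command line with a small parser; a sketch that reads a process-listing line (e.g. from `ps -ef | grep glusterfsd`) on stdin:

```shell
# Extract the argument of -S (the brick process's unix domain socket)
# from a glusterfsd command line read on stdin.
socket_from_cmdline() {
  sed -n 's/.* -S \([^ ]*\.socket\).*/\1/p'
}
```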
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start the brick again, we just need to start the volume using "force":
gluster v start <volname> force
----
Ashish
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution in my virtual env.
How can I detect the process of a brick on a gluster server?
Many thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process IDs.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the above step does the same.
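The brick-to-PID mapping can be extracted from the status output with a one-line filter; a sketch that reads `gluster v status` text on stdin (column layout assumed from the outputs in this thread):

```shell
# Map each "Brick" line of `gluster v status` output (read on stdin)
# to "brick-path pid", using the last column as the PID.
brick_pids() {
  awk '/^Brick / { print $2, $NF }'
}
```

With brick multiplexing on, many bricks will map to the same PID, which is the behaviour described later in this thread.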
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last days, we checked all the network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be ok. The first 3 servers (and the volume) operated without problems for one year; after we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand to understand where the problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors appear during the upcoming data movement.
For this reason, it would be useful to know how to proceed in case of a new failure: insist with approach no. 1, or change strategy?
We are thinking of trying to complete the running remove-brick procedure and making a decision based on the outcome.
Question: could we switch to approach no. 2 after having successfully removed the V1 subvolume?
Yes, we can do that. My idea is to use the replace-brick command.
We will kill ONLY one brick process on s06 and format that brick. Then we will use the replace-brick command to replace a brick of a subvolume on s05 with this formatted brick.
The heal will be triggered, and the data of the respective subvolume will be placed on this brick.
Now we can format the brick that got freed up on s05 and use it to replace the brick we killed on s06.
During this process, we have to make sure the heal has completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it, and try to perform it in your virtual environment first before trying it in production.
-------
If it is still possible, could you please illustrate approach no. 2 even though I don't have free disks?
I would like to start thinking about it and test it in a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You should have told me why you were
opting for Approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes a long time, depending upon the data size.
Anyway, I hope the whole setup is stable; I mean, it is not in the middle of something which we cannot stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to complete this step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).
Now rebalance is not moving data, it's only scanning files (please, take a look at the following output).
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"... I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume available, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, as all their bricks will be down. We want to avoid and correct that.
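The placement problem described here can be checked mechanically. The following helper is a sketch, not a Gluster tool: it reads the Brick lines as printed by "gluster v info", groups every 6 consecutive bricks into one disperse subvolume, and reports how many distinct nodes each subvolume spans. A count of 1 is exactly the single-point-of-failure layout described above.

```shell
# Count distinct hosts per 6-brick EC subvolume from "gluster v info" output.
check_placement() {
  awk -F'[: ]+' '
    /^Brick[0-9]+:/ {
      hosts[$2] = 1; n++
      if (n == 6) {
        c = 0
        for (h in hosts) { c++; delete hosts[h] }
        printf "subvol %d: %d node(s)%s\n", ++sv, c, (c == 1 ? "  <-- single point of failure" : "")
        n = 0
      }
    }'
}
# Usage sketch on a live node: gluster v info tier2 | check_placement
```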
Now, we can take two approaches to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in sets of 6. This will trigger rebalance and move the whole data to other subvolumes.
Repeat the above step; once all the bricks are removed, add them again in sets of 6, this time with 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think it will take a long time and also require a lot of data movement.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-
Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same EC subvolume.
Use replace-brick command to move following bricks on s06-stg node one by one
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for the heal to be completed.
After successful steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is that
the nodes onto which you have to move the bricks will be different.
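The heal check in step 2 can be wrapped in a small helper. The sketch below only parses "gluster volume heal <vol> info" output (summing the "Number of entries:" lines); the polling loop is shown as a comment because it requires a live cluster.

```shell
# Sum pending heal entries from "gluster volume heal <vol> info" output.
heal_pending() {
  awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Example polling loop (requires a live cluster, shown only as a comment):
#   while [ "$(gluster volume heal tier2 info | heal_pending)" -gt 0 ]; do
#       sleep 60
#   done
```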
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps involve movement of data.
Be careful while performing these steps: do one replace-brick at a time, and only after heal completion go to the next.
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I'm not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please, forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don't know if I may ask this, but, if possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the "df -h" command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the last servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as the previous configuration,
that is, 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-bricks, which will again need healing and so on. In addition to that, we have to wait for rebalance to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for you answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest that you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, so each subvolume made up of these bricks resides entirely on that node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of disruption in the connection to these bricks is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776). Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
Ashish Pandey
2018-10-01 10:35:45 UTC
Permalink
Hi Mauro,

My comments are inline.
--
Ashish
----- Original Message -----

From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "Gluster Users" <gluster-***@gluster.org>
Sent: Monday, October 1, 2018 3:35:26 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Good morning Ashish,

your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the week-end the remove-brick procedures ended successfully, and we were able to free up all bricks defined on servers s04 and s05, plus 6 of the 12 bricks on server s06.
So we can say that, thanks to your suggestions, we are about to complete this first phase (removal of all bricks defined on the s04, s05 and s06 servers).

I really appreciated your support.
Now I have a last question (I hope): after remove-brick commit I noticed that some data remain on each brick (about 1.2GB of data).
Please, take a look at the "df-h_on_s04_s05_s06.txt" file.
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the brick, but their size is 0.

Examples:

a lot of empty directories on /gluster/mnt*/brick/.glusterfs

8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273

some empty files in directories in /gluster/mnt*/brick/*

[***@s04 ~]# cd /gluster/mnt1/brick/
[***@s04 brick]# ls -l
total 32
drwxr-xr-x 7 root root 100 11 Sep 22.14 archive_calypso

[***@s04 brick]# cd archive_calypso/
[***@s04 archive_calypso]# ll
total 0
drwxr-x--- 3 root 5200 29 11 Sep 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 Sep 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 Sep 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 Sep 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 Sep 22.14 ans021
[***@s04 archive_calypso]# du -a *
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5

What do we have to do with this data? Should I back up these "empty" dirs and files on different storage before deleting them?
As per my understanding, if the remove-brick was successful then this data can be deleted. However, I would like to have thoughts from a DHT team member on this.
@Nithya,
Could you please check if this can be deleted?

-------------------
As soon as all the bricks will be empty, I plan to re-add the new bricks using the following commands:

gluster peer detach s04
gluster peer detach s05
gluster peer detach s06

gluster peer probe s04
gluster peer probe s05
gluster peer probe s06

gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
This command looks good to me. You have given the brick paths from s04 to s06 one by one, which is the right thing to do.
However, I would like you to give more meaningful names to the bricks associated with one EC subvolume. This will help you in future to find out which brick belongs to which EC subvolume.

For example:
gluster volume add-brick tier2
s04-stg:/gluster/mnt1/brick-13
s05-stg:/gluster/mnt1/brick-13
s06-stg:/gluster/mnt1/brick-13
s04-stg:/gluster/mnt2/brick-13
s05-stg:/gluster/mnt2/brick-13
s06-stg:/gluster/mnt2/brick-13
"-13" will indicate that these bricks belong to the 13th EC subvolume. You can give any other name as well.

s04-stg:/gluster/mnt3/brick-14
s05-stg:/gluster/mnt3/brick-14
s06-stg:/gluster/mnt3/brick-14
s04-stg:/gluster/mnt4/brick-14
s05-stg:/gluster/mnt4/brick-14
s06-stg:/gluster/mnt4/brick-14
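A loop can generate this naming scheme instead of typing 36 paths by hand. The sketch below assumes the mount points (mnt1 to mnt12) and the starting index (13) from the examples in this thread; verify the ordering before feeding it to add-brick.

```shell
# Print the 36 brick arguments with the EC-subvolume index in the name.
gen_bricks() {
  sv=13
  for mnt in $(seq 1 12); do
    # one EC subvolume = two consecutive mounts across the 3 new nodes
    for host in s04-stg s05-stg s06-stg; do
      echo "${host}:/gluster/mnt${mnt}/brick-${sv}"
    done
    [ $((mnt % 2)) -eq 0 ] && sv=$((sv + 1))
  done
  return 0
}
# Usage sketch (not executed here):
#   gluster volume add-brick tier2 $(gen_bricks | tr '\n' ' ') force
```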

gluster volume rebalance tier2 fix-layout start

gluster volume rebalance tier2 start
This looks fine.
From your point of view, are these the right commands to close this repair task?

Thank you very much for your help.
Regards,
Mauro
On 01 Oct 2018, at 09:17, Ashish Pandey < ***@redhat.com > wrote:


Ohh!! It is because brick-multiplexing is "ON" in your setup. I am not sure whether it is ON by default in 3.12.14.

See " cluster.brick-multiplex: on " in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all the bricks on a Node.

So we have to do the following steps to kill any one brick on a node.

Steps to kill a brick when multiplex is on -

Step - 1
Find the unix domain socket of the process on the node.
Run "ps -aef | grep glusterfsd" on the node. Example:

This is on my machine when I have all the bricks on same machine

[***@apandey glusterfs]# ps -aef | grep glusterfsd | grep -v mnt
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158

Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket

Step - 2
Run following command to kill a brick on the same node -

gf_attach -d <unix domain_socket> brick_path_on_that_node

Example:

gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6

Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

[***@apandey glusterfs]#
[***@apandey glusterfs]#
[***@apandey glusterfs]# gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
OK
[***@apandey glusterfs]# gluster v status
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks


To start the brick again, we just need to start the volume using "force":

gluster v start <volname> force
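Step 1 (finding the unix domain socket) can also be scripted as a small parser. The helper below is a sketch: it extracts the -S argument from a glusterfsd command line such as the ps output shown above, so the result can be passed to gf_attach.

```shell
# Print the -S (unix domain socket) argument from a glusterfsd command line.
brick_socket() {
  tr ' ' '\n' | awk '/^-S$/ { getline; print; exit }'
}
# Usage sketch on a live node:
#   sock=$(ps -aef | grep '[g]lusterfsd' | brick_socket)
#   gf_attach -d "$sock" /gluster/mnt1/brick
```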

----
Ashish

----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?

[***@s01 ~]# gluster vol status
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288

Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
Removed bricks:
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress

[***@s01 ~]# ps -ef|grep glusterfs
root 3956 1 79 Sep25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs


<blockquote>

On 28 Sep 2018, at 17:47, Ashish Pandey < ***@redhat.com > wrote:



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Thank you, Ashish.

I will study and try your solution on my virtual env.
How I can detect the process of a brick on gluster server?

Many Thanks,
Mauro


gluster v status <volname> will give you the list of bricks and the respective process id.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the step above does the same thing.
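For reference, mapping a given brick path to its PID from the status output can be scripted; a minimal sketch, assuming the usual `Brick <host:path> <port> <rdma-port> <online> <pid>` column layout (the sample output below is illustrative, not from the real cluster):

```shell
# Sample of "gluster v status <volname>" brick lines (illustrative):
status_sample='Brick s04-stg:/gluster/mnt3/brick          49153     0          Y       106700
Brick s04-stg:/gluster/mnt4/brick          49153     0          Y       106700'

brick='s04-stg:/gluster/mnt3/brick'
# Field 2 is the host:path, the last field is the PID.
pid=$(printf '%s\n' "$status_sample" | awk -v b="$brick" '$1 == "Brick" && $2 == b {print $NF}')
echo "brick $brick -> pid $pid"
```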

---
Ashish



On Fri, 28 Sep 2018 at 16:39, Ashish Pandey < ***@redhat.com > wrote:

<blockquote>




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

please excuse me, I'm very sorry for misunderstanding.
Before contacting you over the last few days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seemed to be ok. The first 3 servers (and volume) operated without problems for one year. After we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).

At this moment, after re-launching the remove-brick command, the rebalance seems to be going ahead without errors, but it is only scanning the files.
It may be that some errors appear during the upcoming data movement.

For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.

Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.

Now, we can format the brick which got freed up on s05 and move the brick which we killed on s06 over to s05.
During this process, we have to make sure heal completed before trying any other replace/kill brick.

It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
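The rotation described above can be sketched as a dry run, with `gluster` stubbed out so the plan prints instead of executing; the volume name and brick paths here are illustrative assumptions, not the real cluster layout:

```shell
# Stub so the sequence can be reviewed safely without touching a cluster.
gluster() { echo "WOULD RUN: gluster $*"; }

VOL=tier2
# 1. After killing the brick process on s06 and reformatting its filesystem,
#    move the s05 brick onto the freed, formatted s06 brick:
gluster v replace-brick "$VOL" s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick commit force
# 2. Check heal status; proceed to the next replace/kill only when
#    "heal info" reports no pending entries:
gluster v heal "$VOL" info
```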
-------

If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro




<blockquote>

On 28 Sep 2018, at 14:36, Ashish Pandey < ***@redhat.com > wrote:


We could have taken approach 2 even if you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time depending upon the data size.

Anyway, I hope the whole setup is stable; I mean, it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it and follow approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.

So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before completing the rebalance).

Now rebalance is not moving data, it's only scanning files (please take a look at the following output)

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestion that, in this particular case, could help reduce errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding rebalancing the entire cluster?

Thank you in advance,
Mauro


<blockquote>

On 27 Sep 2018, at 13:14, Ashish Pandey < ***@redhat.com > wrote:


Yes, you can.
If not me others may also reply.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced rebalance errors).
After the fix I will set network.ping-timeout option to default value.
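The option change and its later revert can be sketched as a dry run (`gluster` is stubbed here; `volume reset` restores the option's default, which for network.ping-timeout is 42 seconds):

```shell
# Stub so the commands print instead of executing.
gluster() { echo "WOULD RUN: gluster $*"; }

# Disable the ping timeout for the duration of the repair work:
gluster volume set tier2 network.ping-timeout 0
# Restore the default (42s) once the fix is complete:
gluster volume reset tier2 network.ping-timeout
```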

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro



<blockquote>

On 27 Sep 2018, at 12:38, Ashish Pandey < ***@redhat.com > wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume and the data on that volume would be accessible.
In the current setup, if s04-stg goes down you will lose all the data on V1 and V2, since all their bricks will be down. We want to avoid that and correct it.

Now, we can have two approach to correct/modify this setup.

Approach 1
We have to remove all the newly added bricks in sets of 6 bricks. This will trigger a rebalance and move the whole data to other subvolumes.
Repeat the above step, and once all the bricks are removed, add those bricks again in sets of 6, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.

Approach 2

In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1-

Step 1 -
Use replace-brick command to move following bricks on s05-stg node one by one (heal should be completed after every replace brick command)

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>

Command :
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same ec subvolume.


Use replace-brick command to move following bricks on s06-stg node one by one

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After successful steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes; the only difference is
the nodes onto which you have to move the bricks.
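Checking heal completion before the next replace-brick can be scripted; a minimal sketch that sums the "Number of entries:" lines of `gluster v heal <volname> info` output (the sample output below is illustrative, and the exact format is an assumption):

```shell
# Sample "gluster v heal <volname> info" output (illustrative):
heal_sample='Brick s05-stg:/gluster/mnt1/brick
Number of entries: 0

Brick s06-stg:/gluster/mnt1/brick
Number of entries: 2'

# Sum the per-brick pending-entry counts; 0 means heal is complete.
pending=$(printf '%s\n' "$heal_sample" | awk -F': ' '/^Number of entries:/ {s += $2} END {print s + 0}')
echo "pending heal entries: $pending"
# Only start the next replace-brick/kill when this reaches 0.
```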




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps need movement of data.
Be careful while performing these steps and do one replace brick at a time and only after heal completion go to next.
Let me know if you have any issues.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I don’t disturb you so much, but I would like to ask you if you had some time to dedicate to our problem.
Please, forgive my insistence.

Thank you in advance,
Mauro


<blockquote>

On 26 Sep 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

On 26 Sep 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you a step-by-step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in attachment you can find the rebalance log file and the last updated brick log file (the other files in /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don't know if I can ask you this, but, if possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of “df -h” command executed on one of the first 3 nodes (s01, s02, s03) already existing and on one of the last 3 nodes (s04, s05, s06) added recently.

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, used space value of each brick of the last servers is about 800GB.

Thank you,
Mauro









<blockquote>

On 26 Sep 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

rebalance and brick logs should be the first thing we should go through.

There is a procedure to correct the configuration/setup but the situation you are in is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg , s05-stg and s06-stg the same way you had the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-brick operations, which will again need healing. In addition to that, we have to wait for the rebalance to complete.

I would suggest that if the whole data has not been rebalanced, and if you can stop the rebalance and remove these newly added bricks properly, then you should do so.
After that, add these bricks so that you have 2 bricks of each volume on 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for you answer.
I could provide you the entire log file related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating vol. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

On 26 Sep 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, so the subvolumes made up of these bricks reside entirely on one node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a connection disruption affecting all bricks of a subvolume is higher in this case.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=
000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=
000000, good=111101, bad=000010)

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info





</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


</blockquote>


</blockquote>



_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-10-01 10:56:03 UTC
Permalink
Dear Ashish,

thank you very much.
I will wait for Nithya and DHT team opinion.

I think I will follow your suggestion changing the names of the new bricks.
Unfortunately, I think I can't change the names of the old ones, can I?!

Mauro
Post by Ashish Pandey
Hi Mauro,
My comments are inline.
--
Ashish
Sent: Monday, October 1, 2018 3:35:26 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Good morning Ashish,
your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the weekend the remove-brick procedures ended successfully and we were able to free up all bricks defined on servers s04 and s05, and 6 of the 12 bricks on server s06.
So, we can say that, thanks to your suggestions, we are about to complete this first phase (removing of all bricks defined on s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit, I noticed that some data remains on each brick (about 1.2GB).
Please, take a look to the “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the brick, but their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
total 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
total 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What do we have to do with this data? Should I back up these "empty" dirs and files on different storage before deleting them?
As per my understanding, if remove-bricks was successful then this could be deleted. However, I would like to have thoughts from dht team member on this.
@Nithya,
Could you please check if this can be deleted?
-------------------
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
This command looks good to me. You have given brick path from s04 to s06 one by one which is right thing to do.
However, I would like you to give more meaningful names to the bricks associated with one ec subvolume. This will help you in the future to find out which brick belongs to which ec subvolume.
gluster volume add-brick tier2
s04-stg:/gluster/mnt1/brick-13
s05-stg:/gluster/mnt1/brick-13
s06-stg:/gluster/mnt1/brick-13
s04-stg:/gluster/mnt2/brick-13
s05-stg:/gluster/mnt2/brick-13
s06-stg:/gluster/mnt2/brick-13
-13 will indicate that these bricks belong to the 13th ec subvolume. You can give any other name also.
s04-stg:/gluster/mnt3/brick-14
s05-stg:/gluster/mnt3/brick-14
s06-stg:/gluster/mnt3/brick-14
s04-stg:/gluster/mnt4/brick-14
s05-stg:/gluster/mnt4/brick-14
s06-stg:/gluster/mnt4/brick-14
gluster volume rebalance tier2 fix-layout start
gluster volume rebalance tier2 start
This looks fine.
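The ordering and naming advice above can be sketched programmatically; a hypothetical generator for the interleaved (s04/s05/s06 per mount), subvolume-suffixed add-brick argument list (only mounts 1-2 are shown here; the real cluster has 12):

```shell
# Build the add-brick argument list: for each mount, one brick per host,
# suffixed with its ec-subvolume index (2 mounts x 3 hosts = one 4+2 subvol).
args=""
subvol=13
for mnt in 1 2; do                    # mounts 1..12 on the real cluster
  for host in s04-stg s05-stg s06-stg; do
    args="$args $host:/gluster/mnt$mnt/brick-$subvol"
  done
  # after every 2 mounts (6 bricks) move on to the next ec subvolume
  [ $((mnt % 2)) -eq 0 ] && subvol=$((subvol + 1))
done
echo "gluster volume add-brick tier2$args force"
```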
From your point of view, are they the right commands to close this repairing task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick-multiplexing is "ON" on your setup. Not sure if it is by default ON for 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.
So we have to do the following steps to kill any single brick on a node.
Steps to kill a brick when multiplex is on -
Step - 1
Find unix domain_socket of the process on a node.
This is on my machine when I have all the bricks on same machine
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
Step - 2
Run following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start a brick we just need to start volume using "force"
gluster v start <volname> force
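Step 1 above (finding the unix domain socket) can be scripted by extracting the `-S` argument from the brick process's ps line; a minimal sketch using a shortened sample line, not real output:

```shell
# Shortened sample of a glusterfsd ps line (illustrative):
ps_line='/usr/local/sbin/glusterfsd -s apandey -p /var/run/x.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1'

# Pull out the socket path that follows "-S".
sock=$(printf '%s\n' "$ps_line" | sed -n 's/.* -S \([^ ]*\).*/\1/p')
echo "gf_attach -d $sock /home/apandey/bricks/gluster/vol-6"
```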
----
Ashish
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other bricks. Is this normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process IDs.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the above step does the same.
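To pair the ps approach with brick paths, the `--brick-name` argument can be parsed out alongside the PID. A sketch (the sample line stands in for real `ps -eo pid,args` output; note that with brick multiplexing only the brick each process was *started* with appears in its command line, so `gluster v status` remains the authoritative mapping):

```shell
# Print "<pid> <brick-path>" pairs from glusterfsd command lines.
# Sample input stands in for: ps -eo pid,args | grep '[g]lusterfsd'
ps_out='3956 /usr/sbin/glusterfsd -s s01-stg --brick-name /gluster/mnt1/brick --brick-port 49153'
echo "$ps_out" | awk '{pid = $1; for (i = 2; i < NF; i++) if ($i == "--brick-name") print pid, $(i + 1)}'
# -> 3956 /gluster/mnt1/brick
```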
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last few days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be OK. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers, we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors will appear during the upcoming data movement.
For this reason, it would be useful to know how to proceed in case of a new failure: persist with approach n.1 or change strategy?
We are thinking of trying to complete the running remove-brick procedure and making a decision based on the outcome.
Question: could we start approach n.2 even after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use the replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then we will use the replace-brick command to replace a brick of a volume on s05 with this formatted brick.
Heal will be triggered, and the data of the respective volume will be placed on this brick.
Now, we can format the brick which got freed up on s05 and use it to replace the brick which we killed on s06.
During this process, we have to make sure heal has completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it, and try to perform it on your virtual environment first before trying it in production.
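As a sanity check of the layout this rotation aims for, here is a toy count of the end state (node names are from this thread; nothing is queried from gluster, it is only illustrating "two bricks per node" for one subvolume):

```shell
# Target placement for one 6-brick ec subvolume after the rotation:
# two bricks on each of the three nodes instead of all six on one node.
target="s04-stg s04-stg s05-stg s05-stg s06-stg s06-stg"
for node in s04-stg s05-stg s06-stg; do
  count=$(printf '%s\n' $target | grep -cx "$node")
  echo "$node hosts $count bricks"   # each node hosts 2
done
```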
-------
If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You should have told me why you were opting for Approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes a long time, depending upon the data size.
Anyway, I hope the whole setup is stable, I mean it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it, and you can follow approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB needs to be moved in order to complete this step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before the rebalance completes).
Now rebalance is not moving data, it's only scanning files (please take a look at the following output)
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance, avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I cannot thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced rebalance errors).
After the fix, I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 ec subvolumes, and we have to deal with one subvolume at a time.
I have named them V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at a minimum, we should have 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose access to all the data on V1 and V2, as all their bricks will be down. We want to avoid this and correct it.
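The arithmetic behind that statement can be sketched in plain shell (the numbers come from this thread's 4+2 disperse configuration; nothing is queried from gluster):

```shell
# A 4+2 disperse subvolume has 6 bricks and needs at least 4 up to serve data.
data=4
check() {   # $1 = number of bricks lost when one node dies
  [ $(( 6 - $1 )) -ge "$data" ] && echo available || echo unavailable
}
check 2   # 2 bricks per node (recommended layout) -> available
check 6   # all 6 bricks on one node (current layout) -> unavailable
```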
Now, we can take two approaches to correct/modify this setup.
Approach 1
We have to remove all the newly added bricks in sets of 6 bricks. This will trigger rebalance and move the whole data set to other subvolumes.
Repeat the above step; then, once all the bricks are removed, add those bricks again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.
While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.
Approach 2
In this approach we can use the heal process. We have to deal with all the volumes (V1 to V6) one by one. Following are the steps for V1 -
Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node one by one (heal should complete after every replace-brick command):
Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is free>
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same ec subvolume.
Use the replace-brick command to move the following bricks to the s06-stg node one by one:
Brick41: s04-stg:/gluster/mnt5/brick to s06-stg/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg/<other brick which is free>
Step 2 - After every replace-brick command, you have to wait for heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.
After a successful step 1 and step 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other volumes; the only difference is the nodes onto which you have to move the bricks.
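The "wait for heal" step can be scripted rather than checked by hand. A minimal sketch: the command in HEAL_CMD is the real one from this thread, but the parsing (summing "Number of entries:" lines from heal-info output) is an assumption about the output format, so verify it against your gluster version before relying on it:

```shell
# Poll heal-info until no entries remain, then report completion.
# ASSUMPTION: heal-info prints one "Number of entries: N" line per brick.
HEAL_CMD='gluster v heal tier2 info'
wait_for_heal() {
  while :; do
    pending=$($HEAL_CMD | awk '/Number of entries:/ { n += $NF } END { print n + 0 }')
    [ "$pending" -eq 0 ] && break
    echo "still $pending entries to heal, waiting..."
    sleep 60
  done
  echo "heal complete"
}
```

Call `wait_for_heal` between each replace-brick and the next, as the steps above require.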
V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick
Just a note that these steps require data movement.
Be careful while performing them: do one replace-brick at a time, and only move on to the next after heal completes.
Let me know if you have any issues.
---
Ashish
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I hope I'm not disturbing you too much, but I would like to ask if you have had some time to dedicate to our problem.
Please forgive my insistence.
Thank you in advance,
Mauro
Hi Ashish,
sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.
Regards,
Mauro
Hi Mauro,
Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow? It is quite late over here and I don't want to miss anything in a hurry.
---
Ashish
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
attached you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.
I don't know if I can ask this of you, but, if it is possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?
The following outputs show the result of the "df -h" command executed on one of the 3 pre-existing nodes (s01, s02, s03) and on one of the 3 recently added nodes (s04, s05, s06).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2
As you can see, the used space on each brick of the new servers is about 800GB.
Thank you,
Mauro
Hi Mauro,
rebalance and brick logs should be the first thing we should go through.
There is a procedure to correct the configuration/setup, but in the situation you are in it is difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in the previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-brick operations, which will in turn need healing. In addition to that, we have to wait for the rebalance to complete.
I would suggest that, if the whole data set has not been rebalanced yet and you can stop the rebalance, you should remove these newly added bricks properly.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.
Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in the future, when it will be almost impossible to correct these things once you have lots of data.
---
Ashish
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you for you answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?
Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?
Many thanks,
Mauro
I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
These 12 bricks are on the same node, so the ec subvolumes made up of these bricks have all their bricks on a single node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.
---
Ashish
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear All, Dear Nithya,
after upgrading from 3.10.5 version to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something goes wrong.
Rebalance failed on different nodes and the time value needed to complete the procedure seems to be very high.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success
When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log
Error type 1)
[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
Error type 2)
[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
Error type 3)
[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776).Skipping file.
Error type 4)
W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
Error type 5)
[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
Error type 6)
[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b
Ashish Pandey
2018-10-01 11:32:13 UTC
Permalink
----- Original Message -----
From: "Mauro Tridici" <***@cmcc.it>
To: "Ashish Pandey" <***@redhat.com>
Cc: "Gluster Users" <gluster-***@gluster.org>
Sent: Monday, October 1, 2018 4:26:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
thank you very much.
I will wait for Nithya and the DHT team's opinion.

I think I will follow your suggestion, changing the names of the new bricks.
Unfortunately, I think that I can't change the names of the old ones, can I?!
Yeah, I don't think you can change the names.
However, once you are done with this exercise, go through https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/ and see if it will work. This is the reset-brick command, which is mainly used when you want to reset the same brick with the same name and path, though the hostname could be different.

However, I think you should not get into this now.
---
Ashish
Mauro
On 01 Oct 2018, at 12:35, Ashish Pandey <***@redhat.com> wrote:
Hi Mauro,
My comments are inline.
--
Ashish
----- Original Message -----
From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Monday, October 1, 2018 3:35:26 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Good morning Ashish,
your explanations are always very useful, thank you very much: I will keep these suggestions in mind for any future needs.
Anyway, during the weekend the remove-brick procedures ended successfully, and we were able to free up all the bricks defined on servers s04 and s05, plus 6 of the 12 bricks on server s06.
So we can say that, thanks to your suggestions, we are about to complete this first phase (removal of all bricks defined on the s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit, I noticed that some data remains on each brick (about 1.2GB).
Please, take a look at “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the bricks, but whose size is 0.
Examples:
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories under /gluster/mnt*/brick/*
[***@s04 ~]# cd /gluster/mnt1/brick/
[***@s04 brick]# ls -l
totale 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
[***@s04 brick]# cd archive_calypso/
[***@s04 archive_calypso]# ll
totale 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
[***@s04 archive_calypso]# du -a *
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What do we have to do with this data? Should I back up these “empty” dirs and files to different storage before deleting them?
As per my understanding, if the remove-brick was successful then this data can be deleted. However, I would like to have thoughts from a DHT team member on this.
@Nithya,
Could you please check if this can be deleted?
-------------------
As soon as all the bricks are empty, I plan to re-add the new bricks using the following commands:
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
This command looks good to me. You have given the brick paths from s04 to s06 one by one, which is the right thing to do.
However, I would suggest giving more meaningful names to the bricks associated with each ec subvolume. This will help you in future to find out which brick belongs to which ec subvolume.

For example:
gluster volume add-brick tier2
s04-stg:/gluster/mnt1/brick-13
s05-stg:/gluster/mnt1/brick-13
s06-stg:/gluster/mnt1/brick-13
s04-stg:/gluster/mnt2/brick-13
s05-stg:/gluster/mnt2/brick-13
s06-stg:/gluster/mnt2/brick-13
The -13 suffix indicates that these bricks belong to the 13th ec subvolume. You can use any other naming scheme as well.

s04-stg:/gluster/mnt3/brick-14
s05-stg:/gluster/mnt3/brick-14
s06-stg:/gluster/mnt3/brick-14
s04-stg:/gluster/mnt4/brick-14
s05-stg:/gluster/mnt4/brick-14
s06-stg:/gluster/mnt4/brick-14
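As a side note, the suffixed brick lists above can be generated mechanically rather than typed by hand. A minimal sketch (the helper name and layout are assumptions: 3 nodes, with 2 consecutive mounts per node in each 6-brick ec subvolume, matching the examples above):

```shell
#!/bin/sh
# Print suffixed brick paths for one ec subvolume.
# Args: subvolume number, first mount number, then the node names.
# Layout assumption: each of the 3 nodes contributes 2 consecutive
# mounts (mntN and mntN+1) to the same 6-brick ec subvolume.
ec_bricks() {
    subvol=$1; mnt=$2; shift 2
    for m in "$mnt" "$((mnt + 1))"; do
        for node in "$@"; do
            printf '%s:/gluster/mnt%s/brick-%s\n' "$node" "$m" "$subvol"
        done
    done
}

ec_bricks 13 1 s04-stg s05-stg s06-stg
```

The printed order (mnt1 across all three nodes, then mnt2) matches the brick ordering in the add-brick command approved above.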

gluster volume rebalance tier2 fix-layout start

gluster volume rebalance tier2 start
This looks fine.
From your point of view, are these the right commands to close this repair task?

Thank you very much for your help.
Regards,
Mauro

<blockquote>

On 1 Oct 2018, at 09:17, Ashish Pandey < ***@redhat.com > wrote:


Ohh!! It is because brick multiplexing is "ON" on your setup. I am not sure whether it is ON by default in 3.12.14.

See " cluster.brick-multiplex: on " in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all the bricks on a Node.

So we have to do following step to kill any one brick on a node.

Steps to kill a brick when multiplex is on -

Step - 1
Find the unix domain socket of the process on the node.
Run "ps -aef | grep glusterfsd" on the node. Example:

This is from my machine, where all the bricks are on the same machine:

[***@apandey glusterfs]# ps -aef | grep glusterfsd | grep -v mnt
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158

Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
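That socket path can also be extracted from the ps line mechanically. A hedged sketch (pure text processing, assuming the `-S <path>` form shown above):

```shell
# Pull the unix domain socket (the -S argument) out of a glusterfsd
# command line, e.g. one row of "ps -aef | grep glusterfsd" output.
socket_of() {
    printf '%s\n' "$1" | sed -n 's/.* -S \([^ ]*\).*/\1/p'
}

line='/usr/local/sbin/glusterfsd -s apandey -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1'
socket_of "$line"   # prints /var/run/gluster/1259033d2ff4f4e5.socket
```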

Step - 2
Run the following command to kill a brick on the same node -

gf_attach -d <unix domain_socket> brick_path_on_that_node

Example:

gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6

Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

[***@apandey glusterfs]#
[***@apandey glusterfs]#
[***@apandey glusterfs]# gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
OK
[***@apandey glusterfs]# gluster v status
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks


To start the brick again, we just need to start the volume with "force":

gluster v start <volname> force

----
Ashish






----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other bricks. Is that normal?

[***@s01 ~]# gluster vol status
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288

Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
Removed bricks:
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress

[***@s01 ~]# ps -ef|grep glusterfs
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs


<blockquote>

On 28 Sep 2018, at 17:47, Ashish Pandey < ***@redhat.com > wrote:



----- Original Message -----

From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "Gluster Users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Thank you, Ashish.

I will study and try your solution in my virtual env.
How can I detect the process of a brick on a gluster server?

Many Thanks,
Mauro


"gluster v status <volname>" will give you the list of bricks and the respective process ids.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the above step does the same.
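With multiplexing on, many bricks legitimately share one PID, so a brick-to-PID map is more useful than grepping ps. A sketch (assuming the single-line `Brick <host>:<path> ... <pid>` layout of `gluster v status`; gluster sometimes wraps long brick paths onto two lines, which this does not handle):

```shell
# Print "brick pid" pairs from "gluster v status" output on stdin.
brick_pids() {
    awk '/^Brick / {print $2, $NF}'
}

# Example on a captured fragment of status output:
brick_pids <<'EOF'
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
EOF
```

On a live node you would pipe the real output instead: `gluster v status tier2 | brick_pids`.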

---
Ashish



On Fri, 28 Sep 2018 at 16:39, Ashish Pandey < ***@redhat.com > wrote:

<blockquote>




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

please excuse me, I'm very sorry for the misunderstanding.
Before contacting you over the last few days, we checked all network devices (10GbE switches, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be ok. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).

At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors appear during the coming data movement.

For this reason, it would be useful to know how to proceed in case of a new failure: insist with approach n.1, or change strategy?
We are thinking of completing the running remove-brick procedure and making a decision based on the outcome.

Question: could we start approach n.2 also after having successfully removed the V1 subvolume?
Yes, we can do that. My idea is to use the replace-brick command.
We will kill ONLY one brick process on s06 and format that brick. Then we will use the replace-brick command to replace a brick of a subvolume on s05 with this formatted brick.
Heal will be triggered and the data of the respective subvolume will be placed on this brick.

Now, we can format the brick which got freed up on s05 and use replace-brick again to move the brick we killed on s06 onto it.
During this process, we have to make sure heal has completed before trying any other replace/kill brick.

It is tricky but looks doable. Think about it, and try it on your virtual environment first before attempting it in production.
-------

If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.

Thank you in advance for your help and patience.
Regards,
Mauro




<blockquote>

On 28 Sep 2018, at 14:36, Ashish Pandey < ***@redhat.com > wrote:


We could have taken approach 2 even if you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes re-balance takes time, depending upon the data size.

Anyway, I hope the whole setup is stable, I mean it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it and follow approach 2.

Let me know once you think everything is fine with the system and there is nothing to heal.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.

So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks but, after about 3TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn’t stop again before completing the rebalance).

Now rebalance is not moving data, it’s only scanning files (please take a look at the following output):

[***@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06

If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance while avoiding rebalancing the entire cluster?

Thank you in advance,
Mauro


<blockquote>

On 27 Sep 2018, at 13:14, Ashish Pandey < ***@redhat.com > wrote:


Yes, you can.
If not me others may also reply.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I can not thank you enough!
Your procedure and description is very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I’m not wrong, “0” means “infinite”... I noticed that this value reduced rebalance errors).
After the fix, I will set the network.ping-timeout option back to its default value.

Could I contact you again if I need some kind of suggestion?

Thank you very much again.
Have a good day,
Mauro



<blockquote>

On 27 Sep 2018, at 12:38, Ashish Pandey < ***@redhat.com > wrote:


Hi Mauro,

We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 ec subvolumes and we have to deal with one subvolume at a time.
I have named them V1 to V6.

Problem:
Take the case of V1.
The best configuration/setup would be to have the 6 bricks of V1 on 6 different nodes.
However, in your case you have added only 3 new nodes, so at most we should have 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will still have 4 other bricks of that subvolume, and the data on it will remain accessible.
In the current setup, if s04-stg goes down you will lose access to all the data on V1 and V2, as all of their bricks will be down. We want to avoid and correct that.

Now, we can have two approach to correct/modify this setup.

Approach 1
We have to remove all the newly added bricks in sets of 6 bricks. This will trigger rebalance and move the whole data to other subvolumes.
Repeat the above step, and once all the bricks are removed, add them again in sets of 6 bricks, this time with 2 bricks from each of the 3 newly added nodes.

While this is a valid and working approach, I personally think that it will take a long time and also require a lot of data movement.

Approach 2

In this approach we can use the heal process. We have to deal with all the subvolumes (V1 to V6) one by one. Following are the steps for V1 -

Step 1 -
Use the replace-brick command to move the following bricks to the s05-stg node one by one (heal should be completed after every replace-brick command):

Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>

Command:
gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
Try to give names to the bricks so that you can identify which 6 bricks belong to the same ec subvolume.


Use the replace-brick command to move the following bricks to the s06-stg node one by one:

Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>


Step 2 - After every replace-brick command, you have to wait for the heal to complete.
Check "gluster v heal <volname> info"; if it shows any entries, you have to wait for them to be healed.

After successful steps 1 and 2, the setup for subvolume V1 will be fixed. You have to perform the same steps for the other subvolumes. The only difference is that
the nodes to which you have to move the bricks will be different.




V1
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
V2
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick
V3
Brick49: s05-stg:/gluster/mnt1/brick
Brick50: s05-stg:/gluster/mnt2/brick
Brick51: s05-stg:/gluster/mnt3/brick
Brick52: s05-stg:/gluster/mnt4/brick
Brick53: s05-stg:/gluster/mnt5/brick
Brick54: s05-stg:/gluster/mnt6/brick
V4
Brick55: s05-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt8/brick
Brick57: s05-stg:/gluster/mnt9/brick
Brick58: s05-stg:/gluster/mnt10/brick
Brick59: s05-stg:/gluster/mnt11/brick
Brick60: s05-stg:/gluster/mnt12/brick
V5
Brick61: s06-stg:/gluster/mnt1/brick
Brick62: s06-stg:/gluster/mnt2/brick
Brick63: s06-stg:/gluster/mnt3/brick
Brick64: s06-stg:/gluster/mnt4/brick
Brick65: s06-stg:/gluster/mnt5/brick
Brick66: s06-stg:/gluster/mnt6/brick
V6
Brick67: s06-stg:/gluster/mnt7/brick
Brick68: s06-stg:/gluster/mnt8/brick
Brick69: s06-stg:/gluster/mnt9/brick
Brick70: s06-stg:/gluster/mnt10/brick
Brick71: s06-stg:/gluster/mnt11/brick
Brick72: s06-stg:/gluster/mnt12/brick

Just a note that these steps require movement of data.
Be careful while performing them: do one replace-brick at a time, and only go to the next after heal completion.
Let me know if you have any issues.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Thursday, September 27, 2018 4:03:04 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

I hope I’m not disturbing you too much, but I would like to ask whether you have had some time to dedicate to our problem.
Please forgive my insistence.

Thank you in advance,
Mauro


<blockquote>

On 26 Sep 2018, at 19:56, Mauro Tridici < ***@cmcc.it > wrote:

Hi Ashish,

sure, no problem! We are a little bit worried, but we can wait :-)
Thank you very much for your support and your availability.

Regards,
Mauro



<blockquote>

On 26 Sep 2018, at 19:33, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

Yes, I can provide you step by step procedure to correct it.
Is it fine if I provide you the steps tomorrow, as it is quite late over here and I don't want to miss anything in a hurry?

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 6:54:19 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Hi Ashish,

in attachment you can find the rebalance log file and the most recently updated brick log file (the other files in the /var/log/glusterfs/bricks directory seem to be too old).
I just stopped the running rebalance (as you can see at the bottom of the rebalance log file).
So, if a safe procedure exists to correct the problem, I would like to execute it.

I don’t know if I can ask you this but, if it is possible, could you please describe to me, step by step, the right procedure to remove the newly added bricks without losing the data that has already been rebalanced?

The following outputs show the result of the “df -h” command executed on one of the first 3 nodes (s01, s02, s03), which already existed, and on one of the last 3 nodes (s04, s05, s06), which were added recently.

[***@s06 bricks]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12
tmpfs 6,3G 0 6,3G 0% /run/user/0

[***@s01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s01-root 100G 5,3G 95G 6% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 39M 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s01-var 100G 11G 90G 11% /var
/dev/md127 1015M 151M 865M 15% /boot
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
tmpfs 6,3G 0 6,3G 0% /run/user/0
s01-stg:tier2 420T 159T 262T 38% /tier2

As you can see, the used space on each brick of the new servers is about 800GB.

Thank you,
Mauro









<blockquote>

On 26 Sep 2018, at 14:51, Ashish Pandey < ***@redhat.com > wrote:

Hi Mauro,

rebalance and brick logs should be the first thing we should go through.

There is a procedure to correct the configuration/setup, but the situation you are in makes it difficult to follow that procedure.
You should have added the bricks hosted on s04-stg, s05-stg and s06-stg the same way as in your previous configuration.
That means 2 bricks on each node for one subvolume.
The procedure will require a lot of replace-bricks, which will again need healing and so on. In addition to that, we have to wait for re-balance to complete.

I would suggest that, if the whole data has not been rebalanced yet and you can stop the rebalance and remove these newly added bricks properly, then you should remove them.
After that, add these bricks back so that you have 2 bricks of each subvolume on the 3 newly added nodes.

Yes, it is like undoing the whole effort, but it is better to do it now than to face issues in future, when it will be almost impossible to correct these things once you have lots of data.

---
Ashish




From: "Mauro Tridici" < ***@cmcc.it >
To: "Ashish Pandey" < ***@redhat.com >
Cc: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 5:55:02 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version


Dear Ashish,

thank you for your answer.
I can provide you the entire log files related to glusterd, glusterfsd and rebalance.
Please, could you indicate which one you need first?

Yes, we added the last 36 bricks after creating the volume. Is there a procedure to correct this error? Is it still possible to do it?

Many thanks,
Mauro


<blockquote>

On 26 Sep 2018, at 14:13, Ashish Pandey < ***@redhat.com > wrote:


I think we don't have enough logs to debug this, so I would suggest you provide more logs/info.
I have also observed that the configuration and setup of your volume is not very efficient.

For example:
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s04-stg:/gluster/mnt2/brick
Brick39: s04-stg:/gluster/mnt3/brick
Brick40: s04-stg:/gluster/mnt4/brick
Brick41: s04-stg:/gluster/mnt5/brick
Brick42: s04-stg:/gluster/mnt6/brick
Brick43: s04-stg:/gluster/mnt7/brick
Brick44: s04-stg:/gluster/mnt8/brick
Brick45: s04-stg:/gluster/mnt9/brick
Brick46: s04-stg:/gluster/mnt10/brick
Brick47: s04-stg:/gluster/mnt11/brick
Brick48: s04-stg:/gluster/mnt12/brick

These 12 bricks are on the same node, so the subvolumes made up of these bricks will have all their bricks on one node, which is not good. The same is true for the bricks hosted on s05-stg and s06-stg.
I think you added these bricks after creating the volume. The probability of a disruption in the connection to these bricks is higher in this case.

---
Ashish


From: "Mauro Tridici" < ***@cmcc.it >
To: "gluster-users" < gluster-***@gluster.org >
Sent: Wednesday, September 26, 2018 3:38:35 PM
Subject: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Dear All, Dear Nithya,

after upgrading from version 3.10.5 to 3.12.14, I tried to start a rebalance process to distribute data across the bricks, but something went wrong.
Rebalance failed on different nodes, and the time needed to complete the procedure seems to be very high.

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 19 161.6GB 537 2 2 in progress 0:32:23
s02-stg 25 212.7GB 526 5 2 in progress 0:32:25
s03-stg 4 69.1GB 511 0 0 in progress 0:32:25
s04-stg 4 484Bytes 12283 0 3 in progress 0:32:25
s05-stg 23 484Bytes 11049 0 10 in progress 0:32:25
s06-stg 3 1.2GB 8032 11 3 failed 0:17:57
Estimated time left for rebalance to complete : 3601:05:41
volume rebalance: tier2: success

When rebalance processes fail, I can see the following kind of errors in /var/log/glusterfs/tier2-rebalance.log

Error type 1)

[2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
[2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
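As an aside, the bitmaps in these messages can be decoded: each bit stands for one brick of the 6-brick disperse set, and "bad" marks the bricks on which the operation failed. The sketch below just prints the set-bit positions (whether position 0 is the first or the last brick of the subvolume should be checked against the ec translator sources, so treat the numbering as illustrative):

```shell
# List set-bit positions in an ec status bitmap such as bad=011000.
bad_positions() {
    bits=$1
    len=${#bits}
    i=0
    while [ "$i" -lt "$len" ]; do
        c=$(printf '%s' "$bits" | cut -c $((i + 1)))
        [ "$c" = 1 ] && echo "bit $((len - 1 - i)) set"
        i=$((i + 1))
    done
}

bad_positions 011000   # bits 4 and 3 set -> two bricks marked bad
```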

Error type 2)

[2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)

Error type 3)

[2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
[2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MRE
A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-dispers
e-11:71373083776).Skipping file.

Error type 4)

W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected

Error type 5)

[2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55
90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down

Error type 6)

[2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.

It seems that there are some network or timeout problems, but the network usage/traffic values are not so high.
Do you think that, in my volume configuration, I have to modify some volume options related to thread and/or network parameters?
Could you, please, help me to understand the cause of the problems above?

You can find below our volume info:
(volume is implemented on 6 servers; each server configuration: 2 cpu 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD)

[***@s04 ~]# gluster vol info


_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users




-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b


Nithya Balachandran
2018-10-03 14:49:13 UTC
Permalink
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will
remember these suggestions for any future needs.
Anyway, during the week-end, the remove-brick procedures ended
successfully and we were able to free up all bricks defined on servers s04
and s05, and 6 of the 12 bricks on server s06.
So, we can say that, thanks to your suggestions, we are about to complete
this first phase (removing of all bricks defined on s04, s05 and s06
servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed
that some data remains on each brick (about 1.2GB of data).
Please, take a look at the “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long
list of directory names and some files that are still on the bricks, but
their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-
4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-
43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
totale 32
drwxr-xr-x 7 root root 100 11 set 22.14 *archive_calypso*
totale 0
drwxr-x--- 3 root 5200 29 11 set 22.13 *ans002*
drwxr-x--- 3 5104 5100 32 11 set 22.14 *ans004*
drwxr-x--- 3 4506 4500 31 11 set 22.14 *ans006*
drwxr-x--- 3 4515 4500 28 11 set 22.14 *ans015*
drwxr-x--- 4 4321 4300 54 11 set 22.14 *ans021*
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/
19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/
19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/
19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What should we do with this data? Should I back up these “empty” dirs and
files on a different storage before deleting them?
Hi Mauro,

Are you sure these files and directories are empty? Please provide the ls
-l output for the files. If they are 'T' files , they can be ignored.

Regards,
Nithya
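For reference, a DHT link-to file — the 'T' file Nithya mentions — shows up in ls -l as a zero-byte file whose mode is ---------T (only the sticky bit set). A minimal sketch of how to spot them; it creates a fake one in a temp dir since this needs no live cluster, and the brick path in the final comment is only an example:

```shell
# Create a fake DHT link-to file: zero bytes, mode 1000 (shown as ---------T).
tmp=$(mktemp -d)
: > "$tmp/example_linkfile"
chmod 1000 "$tmp/example_linkfile"
# The same test you could run on a brick: zero-size files with only the
# sticky bit set are rebalance pointers and can be ignored.
linkfiles=$(find "$tmp" -type f -perm 1000 -size 0)
echo "$linkfiles"
rm -rf "$tmp"
# On a real brick (example path): find /gluster/mnt1/brick -type f -perm 1000 -size 0
```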
Post by Mauro Tridici
As soon as all the bricks are empty, I plan to re-add the new bricks
*gluster peer detach s04*
*gluster peer detach s05*
*gluster peer detach s06*
*gluster peer probe s04*
*gluster peer probe s05*
*gluster peer probe s06*
*gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick
s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick
s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick
s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick
s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick
s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick
s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick
s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick
s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick
s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick
s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick
s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick
s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick
s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick
s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick
s06-stg:/gluster/mnt12/brick force*
*gluster volume rebalance tier2 fix-layout start*
*gluster volume rebalance tier2 start*
From your point of view, are they the right commands to close this repairing task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick-multiplexing is "ON" on your setup. Not sure if
it is by default ON for 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all
the bricks on a Node.
So we have to do the following steps to kill any one brick on a node.
*Steps to kill a brick when multiplex is on -*
*Step - 1 *
Find *unix domain_socket* of the process on a node.
This is on my machine when I have all the bricks on same machine
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd
-s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p
/var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid
-S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name
/home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/
home-apandey-bricks-gluster-vol-1.log --xlator-option
*-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name
brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
*Step - 2*
Run following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
*gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket
/home/apandey/bricks/gluster/vol-6*
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6                                       49158     0          Y       28311
Self-heal Daemon on localhost               N/A       N/A        Y       29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
/home/apandey/bricks/gluster/vol-6
OK
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5                                       49158     0          Y       28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6                                       N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start a brick we just need to start volume using "force"
gluster v start <volname> force
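Putting the two steps together, a small sketch: extract the -S socket argument from a brick process command line and feed it to gf_attach. The command line below is the example from this thread; on a real node you would take it from `pgrep -af glusterfsd`, and the gf_attach call is left commented out because it needs a live cluster:

```shell
# Example brick command line (from this thread); get the real one with:
#   pgrep -af glusterfsd
brick_cmdline='/usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1'
# Step 1: pull out the unix domain socket passed with -S.
socket=$(printf '%s\n' "$brick_cmdline" | sed -n 's/.*-S \([^ ]*\.socket\).*/\1/p')
echo "socket: $socket"
# Step 2 (needs a live gluster node, so commented out):
# gf_attach -d "$socket" /home/apandey/bricks/gluster/vol-6
```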
----
Ashish
------------------------------
*Sent: *Friday, September 28, 2018 9:25:53 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that
more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?
Status of volume: tier2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt1/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt1/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt2/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt2/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt2/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt3/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt3/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt3/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt4/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt4/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt4/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt5/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt5/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt5/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt6/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt6/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt6/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt7/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt7/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt7/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt8/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt8/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt8/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt9/brick           49153     0          Y       3956
Brick s02-stg:/gluster/mnt9/brick           49153     0          Y       3956
Brick s03-stg:/gluster/mnt9/brick           49153     0          Y       3953
Brick s01-stg:/gluster/mnt10/brick          49153     0          Y       3956
Brick s02-stg:/gluster/mnt10/brick          49153     0          Y       3956
Brick s03-stg:/gluster/mnt10/brick          49153     0          Y       3953
Brick s01-stg:/gluster/mnt11/brick          49153     0          Y       3956
Brick s02-stg:/gluster/mnt11/brick          49153     0          Y       3956
Brick s03-stg:/gluster/mnt11/brick          49153     0          Y       3953
Brick s01-stg:/gluster/mnt12/brick          49153     0          Y       3956
Brick s02-stg:/gluster/mnt12/brick          49153     0          Y       3956
Brick s03-stg:/gluster/mnt12/brick          49153     0          Y       3953
Brick s04-stg:/gluster/mnt1/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt2/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt3/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt4/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt5/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt6/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt7/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt8/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt9/brick           49153     0          Y       3433
Brick s04-stg:/gluster/mnt10/brick          49153     0          Y       3433
Brick s04-stg:/gluster/mnt11/brick          49153     0          Y       3433
Brick s04-stg:/gluster/mnt12/brick          49153     0          Y       3433
Brick s05-stg:/gluster/mnt1/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt2/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt3/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt4/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt5/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt6/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt7/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt8/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt9/brick           49153     0          Y       3709
Brick s05-stg:/gluster/mnt10/brick          49153     0          Y       3709
Brick s05-stg:/gluster/mnt11/brick          49153     0          Y       3709
Brick s05-stg:/gluster/mnt12/brick          49153     0          Y       3709
Brick s06-stg:/gluster/mnt1/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt2/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt3/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt4/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt5/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt6/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt7/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt8/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt9/brick           49153     0          Y       3644
Brick s06-stg:/gluster/mnt10/brick          49153     0          Y       3644
Brick s06-stg:/gluster/mnt11/brick          49153     0          Y       3644
Brick s06-stg:/gluster/mnt12/brick          49153     0          Y       3644
Self-heal Daemon on localhost               N/A       N/A        Y       79376
Quota Daemon on localhost                   N/A       N/A        Y       79472
Bitrot Daemon on localhost                  N/A       N/A        Y       79485
Scrubber Daemon on localhost                N/A       N/A        Y       79505
Self-heal Daemon on s03-stg                 N/A       N/A        Y       77073
Quota Daemon on s03-stg                     N/A       N/A        Y       77148
Bitrot Daemon on s03-stg                    N/A       N/A        Y       77160
Scrubber Daemon on s03-stg                  N/A       N/A        Y       77191
Self-heal Daemon on s02-stg                 N/A       N/A        Y       80150
Quota Daemon on s02-stg                     N/A       N/A        Y       80226
Bitrot Daemon on s02-stg                    N/A       N/A        Y       80238
Scrubber Daemon on s02-stg                  N/A       N/A        Y       80269
Self-heal Daemon on s04-stg                 N/A       N/A        Y       106815
Quota Daemon on s04-stg                     N/A       N/A        Y       106866
Bitrot Daemon on s04-stg                    N/A       N/A        Y       106878
Scrubber Daemon on s04-stg                  N/A       N/A        Y       106897
Self-heal Daemon on s05-stg                 N/A       N/A        Y       130807
Quota Daemon on s05-stg                     N/A       N/A        Y       130884
Bitrot Daemon on s05-stg                    N/A       N/A        Y       130896
Scrubber Daemon on s05-stg                  N/A       N/A        Y       130927
Self-heal Daemon on s06-stg                 N/A       N/A        Y       157146
Quota Daemon on s06-stg                     N/A       N/A        Y       157239
Bitrot Daemon on s06-stg                    N/A       N/A        Y       157252
Scrubber Daemon on s06-stg                  N/A       N/A        Y       157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/*glusterfs*d
-s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p
/var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S
/var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name
/gluster/mnt1/brick -l /var/log/*glusterfs*/bricks/gluster-mnt1-brick.log
--xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee
--brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/*glusterfs*
-s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid
-l /var/log/*glusterfs*/glustershd.log -S /var/run/gluster/
4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option
*replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/*glusterfs*
-s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid
-l /var/log/*glusterfs*/quotad.log -S /var/run/gluster/
958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option
*replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off
--xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/*glusterfs*
-s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l
/var/log/*glusterfs*/bitd.log -S /var/run/gluster/
b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/*glusterfs*
-s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid
-l /var/log*glusterfs*/scrub.log -S /var/run/gluster/
ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto
*glusterfs*
------------------------------
*Sent: *Friday, September 28, 2018 9:08:52 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process ids.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a
node, but I think the above step does the same.
---
Ashish
Post by Mauro Tridici
------------------------------
*Sent: *Friday, September 28, 2018 7:08:41 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last days, we checked all network devices
(10GbE switch, cables, NICs, server ports, and so on), operating system
versions and settings, network bonding configuration, gluster package
versions, tuning profiles, etc., but everything seems to be ok. The first 3
servers (and the volume) operated without problems for one year. After we added
the 3 new servers, we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the
problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems
that the rebalance is going ahead without errors, but it is only scanning
the files.
It may be that some errors will appear during the upcoming data movement.
For this reason, it could be useful to know how to proceed in case of a
new failure: insist with approach n.1 or change the strategy?
We are thinking of trying to complete the running remove-brick procedure and
making a decision based on the outcome.
Question: could we start approach n.2 also after having successfully
removed the V1 subvolume?
Yes, we can do that. My idea is to use the replace-brick command.
We will kill ONLY one brick process on s06 and format that brick.
Then we use the replace-brick command to replace a brick of a volume on s05 with
this formatted brick.
Heal will be triggered and the data of the respective volume will be placed on this brick.
Now we can format the brick which got freed up on s05 and use it to replace
the next brick we kill on s06.
During this process, we have to make sure heal has completed before trying
any other replace/kill brick.
It is tricky but looks doable. Think about it and try to perform it on
your virtual environment first before trying on production.
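As a concrete sketch of the cycle described above — the brick paths are examples only, and each replace-brick must wait for heal to complete before the next swap. The commands are printed as a dry run since they need a live cluster:

```shell
# Dry-run plan for one swap iteration (example brick paths).
plan='
# 1. kill one brick process on s06 (gf_attach -d <socket> <brick>) and reformat it
# 2. replace the matching s05 brick with the freed, formatted s06 brick:
gluster volume replace-brick tier2 s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick commit force
# 3. make sure heal has finished before killing/replacing the next brick:
gluster volume heal tier2 info
'
printf '%s\n' "$plan"
```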
-------
If it is still possible, could you please illustrate the approach n.2
even if I dont have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You
should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time
depending upon the data size.
Anyway, I hope the whole setup is stable, I mean it is not in the middle of
something which we cannot stop.
If free disks are the only concern, I will give you some more steps to
deal with it and follow approach 2.
Let me know once you think everything is fine with the system and there
is nothing to heal.
---
Ashish
------------------------------
*Sent: *Friday, September 28, 2018 4:21:03 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you
suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of an empty brick to be used as indicated
in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks
1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks but, after
about 3TB of moved data, the rebalance speed slowed down and some transfer
errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to
complete the step, we decided to stop the remove-brick execution and start
it again (I hope it doesn't stop again before completing the rebalance).
Now rebalance is not moving data, it's only scanning files (please, take
a look at the following output).
s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
Node      Rebalanced-files         size      scanned     failures      skipped       status   run time in h:m:s
---------      -----------  -----------  -----------  -----------  -----------  ------------      --------------
s04-stg                  0       0Bytes       182008            0            0  in progress             3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other
suggestion that, in this particular case, could be useful to reduce errors
(I know that they are related to the current volume configuration) and
improve rebalance performance, avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
------------------------------
*Sent: *Thursday, September 27, 2018 4:24:12 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach, after setting the network.ping-timeout
option to 0 (if I’m not wrong, “0" means “infinite”; I noticed that this
value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
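If useful, the two settings discussed here as a dry-run sketch (the volume name is taken from this thread; `reset` restores the default timeout). Disabling the ping timeout should only be a temporary maintenance measure:

```shell
# Dry run: toggle network.ping-timeout around the maintenance window.
cmds='
gluster volume set tier2 network.ping-timeout 0
# ... run the remove-brick / rebalance ...
gluster volume reset tier2 network.ping-timeout
'
printf '%s\n' "$cmds"
```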
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each,
starting from brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6
different nodes.
However, in your case you have added 3 new nodes. So, at least we should
have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will
have 4 other bricks of that volume
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
...
[Message clipped]
Mauro Tridici
2018-10-03 15:48:02 UTC
Permalink
Hi Nithya,

in order to answer your question as soon as possible, I considered only the content of one brick of server s06 (in the attachment you can find the content of /gluster/mnt1/brick).

[***@s06 ~]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 106M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 3,0G 97G 3% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 12G 9,0T 1% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 12G 9,0T 1% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 12G 9,0T 1% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 12G 9,0T 1% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 1,4T 7,7T 16% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 12G 9,0T 1% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 1,4T 7,7T 16% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 12G 9,0T 1% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 1,4T 7,7T 16% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 1,4T 7,7T 16% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 1,4T 7,7T 16% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 1,4T 7,7T 16% /gluster/mnt12

The scenario is almost the same for all the bricks removed from servers s04, s05 and s06.
In the next few hours, I will check every file on each removed brick.

So, if I understand correctly, I can proceed with deleting the directories and files left on the bricks only if each file has the 'T' tag, right?

Thank you in advance,
Mauro
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the week-end, the remove-brick procedures ended successfully and we were able to free up all bricks defined on servers s04 and s05, and 6 of the 12 bricks on server s06.
So, we can say that, thanks to your suggestions, we are about to complete this first phase (removing of all bricks defined on s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed that some data remains on each brick (about 1.2GB of data).
Please, take a look at the “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the bricks, but their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
totale 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
totale 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What should we do with this data? Should I back up these “empty” dirs and files on a different storage before deleting them?
Hi Mauro,
Are you sure these files and directories are empty? Please provide the ls -l output for the files. If they are 'T' files , they can be ignored.
Regards,
Nithya
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
gluster volume rebalance tier2 start
From your point of view, are they the right commands to close this repairing task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick-multiplexing is "ON" on your setup. Not sure if it is by default ON for 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all the bricks on a Node.
So we have to do the following steps to kill any one brick on a node.
Steps to kill a brick when multiplex is on -
Step - 1
Find unix domain_socket of the process on a node.
This is on my machine when I have all the bricks on same machine
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
Step - 2
Run following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start a brick we just need to start volume using "force"
gluster v start <volname> force
----
Ashish
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution in my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process IDs.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but the above step gives you the same information.
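When brick multiplexing is on, several bricks share one PID, so grouping bricks by process makes the mapping easier to read. A small sketch: the sample status lines below are made-up stand-ins; on a real node you would pipe `gluster v status <volname>` into the same awk.

```shell
# Hypothetical sample of "gluster v status" brick lines (hosts/PIDs invented);
# on a live node: gluster v status tier2 | awk '/^Brick/ ...'
cat > /tmp/brick_status.txt <<'EOF'
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 4012
EOF

# Group brick paths by PID (last field): one output line per brick process.
awk '/^Brick/ { b[$NF] = b[$NF] " " $2 }
     END { for (pid in b) print pid ":" b[pid] }' /tmp/brick_status.txt
```

Each output line then lists one glusterfsd PID followed by every brick it serves, which shows at a glance whether killing that PID would take down more than one brick.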
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be OK. The first 3 servers (and the volume) operated without problems for one year; after we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).
At the moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors appear during the upcoming data movement.
For this reason, it would be useful to know how to proceed in case of a new failure: insist with approach n.1 or change strategy?
We are thinking of completing the running remove-brick procedure and making a decision based on the outcome.
Question: could we start approach n.2 also after having successfully removed the V1 subvolume?
Yes, we can do that. My idea is to use the replace-brick command.
We will kill ONLY one brick process on s06 and format that brick. Then we use the replace-brick command to replace a brick of a volume on s05 with this formatted brick.
Heal will be triggered and the data of the respective volume will be placed on this brick.
Now we can format the brick which got freed up on s05 and use it to replace the brick we killed on s06.
During this process, we have to make sure heal has completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it and try it on your virtual environment first before trying it in production.
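A dry-run sketch of that cycle may help when rehearsing it on a virtual environment. The volume and brick names below are illustrative assumptions; the script only prints the planned commands so nothing touches a real volume.

```shell
#!/bin/sh
# Print (do not execute) the kill/format/replace cycle described above.
# VOL, SRC and DST are illustrative placeholders, not taken from the thread.
VOL=tier2
SRC=s05-stg:/gluster/mnt1/brick   # brick whose data will be migrated
DST=s06-stg:/gluster/mnt1/brick   # freshly formatted brick that receives it

cat <<EOF
# 1. kill only ONE brick process on s06 (gf_attach when multiplexing is on)
# 2. re-create the filesystem on the freed brick
# 3. move the volume's brick onto the formatted disk:
gluster volume replace-brick $VOL $SRC $DST commit force
# 4. verify heal has finished before the next replace/kill:
gluster volume heal $VOL info
EOF
```

Reviewing the generated command list before running anything makes it harder to kill or format the wrong brick mid-cycle.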
-------
If it is still possible, could you please illustrate approach n.2 even though I don't have free disks?
I would like to start thinking about it and test it in a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even though you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time, depending on the data size.
Anyway, I hope the whole setup is stable; I mean, it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it and follow approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of an empty brick to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks 1-6 on server s04).
Rebalance started moving the data across the other bricks but, after about 3TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB need to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before completing the rebalance).
Now rebalance is not moving data, it's only scanning files (please take a look at the following output):
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestion that, in this particular case, could be useful to reduce errors (I know they are related to the current volume configuration) and improve rebalance performance, avoiding a rebalance of the entire cluster?
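For what it's worth, gluster exposes a rebalance throttle option that can trade I/O impact for speed. I have not verified its behaviour on 3.12.14 specifically, so treat the option name and values as an assumption to confirm with `gluster volume set help` first:

```shell
# Hypothetical tuning; confirm the option exists on your version first.
gluster volume set tier2 cluster.rebal-throttle aggressive   # values: lazy|normal|aggressive
```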
Thank you in advance,
Mauro
Yes, you can.
If not me, others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I cannot thank you enough!
Your procedure and description are very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced rebalance errors).
After the fix I will set network.ping-timeout back to its default value.
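For reference, the set/reset pair would look like this (the volume name tier2 is assumed from the thread):

```shell
gluster volume set tier2 network.ping-timeout 0     # 0 disables the ping timeout
# ...after the maintenance is done:
gluster volume reset tier2 network.ping-timeout     # back to the default (42 seconds)
```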
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time.
I have named them V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
...
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
c/o Complesso Ecotekne - Università del Salento
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b
Mauro Tridici
2018-10-03 21:12:53 UTC
Permalink
Hi Nithya,

I created and executed the following simple script in order to check the content of each brick.

---
#!/bin/bash

for i in {1..12}
do

    # ls -lR /gluster/mnt$i/brick/ > $HOSTNAME.brick$i.txt
    find /gluster/mnt$i/brick -type f -print0 | xargs -0r ls -l > "$HOSTNAME.brick$i.txt"
    wc -l "$HOSTNAME.brick$i.txt" >> report.txt
    grep -v '\-\-T' "$HOSTNAME.brick$i.txt" >> report.txt

done
---

It scans all files left on the bricks and saves, for each brick, the "ls -l" output to a separate log file (named s04.brick#.txt).
Moreover, the script creates a report file (report.txt) collecting every file without the "---T" tag.
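As an aside, DHT link files ("T" files) are 0-byte files whose mode is exactly the sticky bit, which `ls -l` renders as `---------T`, so `find` can list them directly instead of grepping ls output. A self-contained sketch on a scratch directory (on a real node you would point it at /gluster/mntN/brick):

```shell
#!/bin/sh
set -e
d=$(mktemp -d)                                   # scratch dir standing in for a brick
touch "$d/linkfile"; chmod 1000 "$d/linkfile"    # fake DHT link ("T") file: mode 1000, size 0
echo data > "$d/real.dat"                        # ordinary file

# exact mode 1000 (only the sticky bit set) and zero size -> link files only
find "$d" -type f -perm 1000 -size 0
```

Note that `-perm 1000` without a leading `-` or `/` is an exact mode match, so regular files with the sticky bit plus other permission bits would not be listed.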

[***@s04 left]# ll
totale 557236
-rwxr--r-- 1 root root 273 3 ott 22.45 check
-rw------- 1 root root 0 3 ott 22.46 nohup.out
-rw-r--r-- 1 root root 7581 3 ott 22.49 report.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.48 s04.brick10.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.49 s04.brick11.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.49 s04.brick12.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.46 s04.brick1.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.46 s04.brick2.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick3.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick4.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick5.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick6.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick7.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick8.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick9.txt

So, at the end of the script execution, I obtained the following:

- s04 server bricks don't contain files without the "---T" tag, except for the following files (I think I can delete them, right?):

-rw-r--r-- 1 root root 4096 11 set 11.22 /gluster/mnt12/brick/.glusterfs/brick.db
-rw-r--r-- 1 root root 32768 16 set 03.21 /gluster/mnt12/brick/.glusterfs/brick.db-shm
-rw-r--r-- 1 root root 20632 11 set 11.22 /gluster/mnt12/brick/.glusterfs/brick.db-wal
-rw-r--r-- 1 root root 19 29 set 15.14 /gluster/mnt12/brick/.glusterfs/health_check
---------- 1 root root 0 29 set 00.05 /gluster/mnt12/brick/.glusterfs/indices/xattrop/xattrop-9040d2ea-6acb-42c2-b515-0a44380e60d8
---------- 1 root root 0 11 set 11.22 /gluster/mnt12/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008

- s05 server bricks don't contain files without the "---T" tag, except for the following files:

-rw-r--r-- 1 root root 4096 11 set 11.22 /gluster/mnt8/brick/.glusterfs/brick.db
-rw-r--r-- 1 root root 32768 16 set 03.19 /gluster/mnt8/brick/.glusterfs/brick.db-shm
-rw-r--r-- 1 root root 20632 11 set 11.22 /gluster/mnt8/brick/.glusterfs/brick.db-wal
-rw-r--r-- 1 root root 19 1 ott 07.30 /gluster/mnt8/brick/.glusterfs/health_check
---------- 1 root root 0 30 set 16.42 /gluster/mnt8/brick/.glusterfs/indices/xattrop/xattrop-9db3d840-35e0-4359-8d7a-14d305760247
---------- 1 root root 0 11 set 11.22 /gluster/mnt8/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008

- s06 server bricks HAVE some files that I think are important. This is the file list:

-rw-r--r-- 2 5219 5200 519226880 14 set 17.29 /gluster/mnt6/brick/.glusterfs/ef/87/ef870cb8-03be-45c8-8b72-38941f08b8a5
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/.glusterfs/ef/98/ef98b463-3a0a-46a2-ad18-37149d4dd65c
-rw-r--r-- 2 5219 5200 3164160 23 apr 2016 /gluster/mnt6/brick/.glusterfs/a4/25/a4255b8e-de1f-4acc-a5cf-d47ac7767d46
-rw-r--r-- 2 12001 12000 0 12 ago 05.06 /gluster/mnt6/brick/.glusterfs/a4/52/a4520383-eaa1-4c82-a6f0-d9ea7de4c48d
---------- 1 root root 0 11 set 11.22 /gluster/mnt6/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
-rw-r--r-- 2 12001 12000 0 23 lug 22.22 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860823/COST.DAT
-rw-r--r-- 2 12001 12000 0 24 lug 00.28 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860828/COST.DAT
-rw-r--r-- 2 12001 12000 0 24 lug 08.27 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860916/COST.DAT
-rw-r--r-- 2 12001 12000 0 26 lug 00.50 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19861221/COST.DAT
-rw-r--r-- 2 12001 12000 0 26 lug 03.23 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19861230/COST.DAT
-rw-r--r-- 2 12001 12000 0 7 ago 15.21 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1987_simu/model/wind/in/procday.20180101
-rw-r--r-- 2 12001 12000 0 7 ago 15.22 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1987_simu/model/wind/work/err.log
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/assim/no_obs_19881231
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/endday.19881231
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/procday.20180101
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/startday.19881230
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/work/err.log
-rw-r--r-- 2 12001 12000 0 8 ago 13.51 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/prepobs_19880114/COST.DAT
-rw-r--r-- 2 5219 5200 844800 22 giu 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/atm/hist/postproc/sps_199301_001.cam.h0.1993-03_grid.nc
-rw-r--r-- 2 5219 5200 3203072 22 apr 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/lnd/hist/lnd/hist/sps_199301_001.clm2.h0.1993-03.nc.gz
-rw-r--r-- 2 5219 5200 3164160 23 apr 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/lnd/hist/lnd/hist/sps_199301_001.clm2.h0.1993-05.nc.gz
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-01_grid.nc
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-05_grid.nc
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-06_grid.nc
-rw-r--r-- 2 5219 5200 844800 22 giu 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_003/atm/hist/postproc/sps_199301_003.cam.h0.1993-03_grid.nc

What can I do with the files that rebalance did not move?

Thank you,
Mauro
Post by Mauro Tridici
Hi Nithya,
in order to answer your question as soon as possible, I considered only the content of one brick of server s06 (in the attachment you can find the content of /gluster/mnt1/brick).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 106M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 3,0G 97G 3% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 12G 9,0T 1% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 12G 9,0T 1% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 12G 9,0T 1% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 12G 9,0T 1% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 1,4T 7,7T 16% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 12G 9,0T 1% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 1,4T 7,7T 16% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 12G 9,0T 1% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 1,4T 7,7T 16% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 1,4T 7,7T 16% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 1,4T 7,7T 16% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 1,4T 7,7T 16% /gluster/mnt12
The scenario is almost the same for all the bricks removed from servers s04, s05 and s06.
In the next hours, I will check every file on each removed brick.
So, if I understand correctly, I can proceed with deleting the directories and files left on the bricks only if each file has the T tag, right?
Thank you in advance,
Mauro
<ls-l_on_a_brick.txt.gz>
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the weekend the remove-brick procedures ended successfully and we were able to free up all bricks defined on servers s04 and s05, and 6 of the 12 bricks on server s06.
So, we can say that, thanks to your suggestions, we are about to complete this first phase (removing all bricks defined on the s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed that some data remain on each brick (about 1.2GB of data).
Please take a look at "df-h_on_s04_s05_s06.txt".
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the brick, but whose size is 0:
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
totale 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
totale 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What should we do with this data? Should I back up these "empty" dirs and files to different storage before deleting them?
Hi Mauro,
Are you sure these files and directories are empty? Please provide the ls -l output for the files. If they are 'T' files, they can be ignored.
Regards,
Nithya
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
gluster volume rebalance tier2 start
From your point of view, are these the right commands to close this repair task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick multiplexing is "ON" in your setup. I am not sure whether it is ON by default in 3.12.14.
See "cluster.brick-multiplex: on" in gluster v <volname> info.
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.
So we have to do the following steps to kill any one brick on a node.
Steps to kill a brick when multiplex is on -
Step - 1
Find unix domain_socket of the process on a node.
This is on my machine when I have all the bricks on same machine
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
Step - 2
Run the following command to kill a brick on the same node:
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
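To script Step 1, the `-S` socket argument can be pulled out of the ps line with sed. The sample line below is an abbreviated, made-up stand-in for the full glusterfsd command line shown above; on a live node you would feed `ps -ef | grep glusterfsd` through the same sed.

```shell
#!/bin/sh
# Abbreviated, invented glusterfsd command line for demonstration only.
line='/usr/sbin/glusterfsd -s apandey --volfile-id vol.apandey.vol-1 -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1'

# Extract the argument following -S (the brick process unix domain socket).
echo "$line" | sed -n 's/.* -S \([^ ]*\).*/\1/p'
```

The extracted path is exactly what gf_attach expects as its first argument.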
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/logglusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How I can detect the process of a brick on gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process id.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node but I think the above step also do the same.
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for misunderstanding.
Before contacting you during last days, we checked all network devices (switch 10GbE, cables, NICs, servers ports, and so on), operating systems version and settings, network bonding configuration, gluster packages versions, tuning profiles, etc. but everything seems to be ok. The first 3 servers (and volume) operated without problem for one year. After we added the new 3 servers we noticed something wrong.
Fortunately, yesterday you gave me an hand to understand where is (or could be) the problem.
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
May be that during the future data movement some errors could appear.
For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.
Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.
Now, we can format the brick which got freed up on s05 and replace the brick which we killed on s06 to s05.
During this process, we have to make sure heal completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
-------
If it is still possible, could you please illustrate approach n. 2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You should have told me why you were
opting for approach 1, or perhaps I should have asked.
I was asking about approach 1 because rebalance sometimes takes a long time depending upon the data size.
Anyway, I hope the whole setup is stable, I mean it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it and follow approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting the network.ping-timeout option to 0).
This choice was due to the absence of an empty brick to be used as indicated in the second approach.
So, we launched the remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but, after about 3TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before completing the rebalance).
Now rebalance is not moving data, it's only scanning files (please, take a look at the following output):
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
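For reference, a hedged sketch of the remove-brick lifecycle in use here (the brick list is the V1 set on s04 named earlier in the thread; paths are the thread's own examples):

```shell
# Start draining one EC subvolume (all six V1 bricks in one command):
gluster volume remove-brick tier2 \
    s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
    s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
    s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick \
    start

# The same brick list is repeated for the other subcommands:
#   ... status   -> watch progress (the output shown above)
#   ... stop     -> pause; a later "start" rescans from the beginning
#   ... commit   -> only once all data has been migrated off the bricks
```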
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other suggestion that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance, avoiding rebalancing the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
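As a sketch of that tuning (network.ping-timeout is the real gluster option; "tier2" is this thread's volume; 42 seconds is the usual default):

```shell
# Disable the client ping timeout while the remove-brick/fix runs:
gluster volume set tier2 network.ping-timeout 0
# ...and restore the default (42 s) once the fix is done:
gluster volume reset tier2 network.ping-timeout
```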
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 set of 6 bricks each starting from brick37.
That means, there are 6 ec subvolumes and we have to deal with one sub volume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
...
[Message clipped]
--------
Mauro Tridici
2018-10-03 22:09:57 UTC
I forgot to add the report.txt file here for the 6 bricks of server s06.
It contains only the files left on the bricks without the "- - T" tag.
Thanks,
Mauro
Post by Mauro Tridici
Hi Nithya,
I created and executed the following simple script in order to check each brick content.
---
#!/bin/bash
# For each of the 12 bricks: list every regular file left on the brick,
# then collect in report.txt the files that are NOT "--T" (DHT linkto) files.
for i in {1..12}
do
# ls -lR /gluster/mnt$i/brick/ > $HOSTNAME.brick$i.txt
find /gluster/mnt$i/brick -type f -print0 | xargs -0r ls -l > $HOSTNAME.brick$i.txt
wc -l $HOSTNAME.brick$i.txt >> report.txt
grep -v '\-\-T' $HOSTNAME.brick$i.txt >> report.txt
done
---
It scans all files left on the bricks and saves, for each brick, the "ls -l" output to separate log files (named s04.brick#.txt).
Moreover, the bash script creates a report file (report.txt) collecting all files without the "- - T" tag.
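As a possible refinement (a sketch, assuming the DHT linkto files keep their usual mode of exactly 1000, i.e. the "---------T" permission string seen in the listings), the linkto files can be matched by permission bits instead of grepping ls output; the helper name and brick path below are illustrative:

```shell
# List only DHT linkto ("---------T") files left under a brick, skipping
# the .glusterfs housekeeping tree. Mode 1000 = sticky bit only, which is
# how DHT marks its pointer files.
list_linkto_files() {
    find "$1" -path '*/.glusterfs' -prune -o -type f -perm 1000 -print
}

# Example: list_linkto_files /gluster/mnt1/brick   (e.g. on server s04)
```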
total 557236
-rwxr--r-- 1 root root 273 3 ott 22.45 check
-rw------- 1 root root 0 3 ott 22.46 nohup.out
-rw-r--r-- 1 root root 7581 3 ott 22.49 report.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.48 s04.brick10.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.49 s04.brick11.txt
-rw-r--r-- 1 root root 44801236 3 ott 22.49 s04.brick12.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.46 s04.brick1.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.46 s04.brick2.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick3.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick4.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick5.txt
-rw-r--r-- 1 root root 45007600 3 ott 22.47 s04.brick6.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick7.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick8.txt
-rw-r--r-- 1 root root 44474106 3 ott 22.48 s04.brick9.txt
- s04 server bricks don't contain files without the "- - T" tag, except for the following files (I think I can delete them, right?)
-rw-r--r-- 1 root root 4096 11 set 11.22 /gluster/mnt12/brick/.glusterfs/brick.db
-rw-r--r-- 1 root root 32768 16 set 03.21 /gluster/mnt12/brick/.glusterfs/brick.db-shm
-rw-r--r-- 1 root root 20632 11 set 11.22 /gluster/mnt12/brick/.glusterfs/brick.db-wal
-rw-r--r-- 1 root root 19 29 set 15.14 /gluster/mnt12/brick/.glusterfs/health_check
---------- 1 root root 0 29 set 00.05 /gluster/mnt12/brick/.glusterfs/indices/xattrop/xattrop-9040d2ea-6acb-42c2-b515-0a44380e60d8
---------- 1 root root 0 11 set 11.22 /gluster/mnt12/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
-rw-r--r-- 1 root root 4096 11 set 11.22 /gluster/mnt8/brick/.glusterfs/brick.db
-rw-r--r-- 1 root root 32768 16 set 03.19 /gluster/mnt8/brick/.glusterfs/brick.db-shm
-rw-r--r-- 1 root root 20632 11 set 11.22 /gluster/mnt8/brick/.glusterfs/brick.db-wal
-rw-r--r-- 1 root root 19 1 ott 07.30 /gluster/mnt8/brick/.glusterfs/health_check
---------- 1 root root 0 30 set 16.42 /gluster/mnt8/brick/.glusterfs/indices/xattrop/xattrop-9db3d840-35e0-4359-8d7a-14d305760247
---------- 1 root root 0 11 set 11.22 /gluster/mnt8/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
-rw-r--r-- 2 5219 5200 519226880 14 set 17.29 /gluster/mnt6/brick/.glusterfs/ef/87/ef870cb8-03be-45c8-8b72-38941f08b8a5
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/.glusterfs/ef/98/ef98b463-3a0a-46a2-ad18-37149d4dd65c
-rw-r--r-- 2 5219 5200 3164160 23 apr 2016 /gluster/mnt6/brick/.glusterfs/a4/25/a4255b8e-de1f-4acc-a5cf-d47ac7767d46
-rw-r--r-- 2 12001 12000 0 12 ago 05.06 /gluster/mnt6/brick/.glusterfs/a4/52/a4520383-eaa1-4c82-a6f0-d9ea7de4c48d
---------- 1 root root 0 11 set 11.22 /gluster/mnt6/brick/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
-rw-r--r-- 2 12001 12000 0 23 lug 22.22 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860823/COST.DAT
-rw-r--r-- 2 12001 12000 0 24 lug 00.28 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860828/COST.DAT
-rw-r--r-- 2 12001 12000 0 24 lug 08.27 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860916/COST.DAT
-rw-r--r-- 2 12001 12000 0 26 lug 00.50 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19861221/COST.DAT
-rw-r--r-- 2 12001 12000 0 26 lug 03.23 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19861230/COST.DAT
-rw-r--r-- 2 12001 12000 0 7 ago 15.21 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1987_simu/model/wind/in/procday.20180101
-rw-r--r-- 2 12001 12000 0 7 ago 15.22 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1987_simu/model/wind/work/err.log
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/assim/no_obs_19881231
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/endday.19881231
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/procday.20180101
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/in/startday.19881230
-rw-r--r-- 2 12001 12000 0 14 ago 12.55 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/model/wind/work/err.log
-rw-r--r-- 2 12001 12000 0 8 ago 13.51 /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1988_simu/prepobs_19880114/COST.DAT
-rw-r--r-- 2 5219 5200 844800 22 giu 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/atm/hist/postproc/sps_199301_001.cam.h0.1993-03_grid.nc
-rw-r--r-- 2 5219 5200 3203072 22 apr 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/lnd/hist/lnd/hist/sps_199301_001.clm2.h0.1993-03.nc.gz
-rw-r--r-- 2 5219 5200 3164160 23 apr 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_001/lnd/hist/lnd/hist/sps_199301_001.clm2.h0.1993-05.nc.gz
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-01_grid.nc
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-05_grid.nc
-rw-r--r-- 2 5219 5200 844800 17 gen 2017 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_002/atm/hist/postproc/sps_199301_002.cam.h0.1993-06_grid.nc
-rw-r--r-- 2 5219 5200 844800 22 giu 2016 /gluster/mnt6/brick/CSP/sp1/CESM/archive/sps_199301_003/atm/hist/postproc/sps_199301_003.cam.h0.1993-03_grid.nc
What can I do with the files not moved by rebalance?
Thank you,
Mauro
Post by Mauro Tridici
Hi Nithya,
in order to answer your question as soon as possible, I considered only the content of one brick of server s06 (attached you can find the content of /gluster/mnt1/brick).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 106M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 3,0G 97G 3% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 12G 9,0T 1% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 12G 9,0T 1% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 12G 9,0T 1% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 12G 9,0T 1% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 1,4T 7,7T 16% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 12G 9,0T 1% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 1,4T 7,7T 16% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 12G 9,0T 1% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 1,4T 7,7T 16% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 1,4T 7,7T 16% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 1,4T 7,7T 16% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 1,4T 7,7T 16% /gluster/mnt12
The scenario is almost the same for all the bricks removed from server s04, s05 and s06.
In the next hours, I will check every file on each removed brick.
So, if I understand correctly, I can proceed with the deletion of the directories and files left on the bricks only if each file has the T tag, right?
Thank you in advance,
Mauro
<ls-l_on_a_brick.txt.gz>
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the week-end, the remove-brick procedures ended successfully and we were able to free up all bricks defined on servers s04 and s05, and 6 of the 12 bricks on server s06.
So, we can say that, thanks to your suggestions, we are about to complete this first phase (removal of all bricks defined on the s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed that some data remains on each brick (about 1.2GB of data).
Please, take a look at the "df-h_on_s04_s05_s06.txt".
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files are still on the bricks, but their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
total 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
total 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What do we have to do with this data? Should I back up these "empty" dirs and files on a different storage before deleting them?
Hi Mauro,
Are you sure these files and directories are empty? Please provide the ls -l output for the files. If they are 'T' files, they can be ignored.
Regards,
Nithya
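One hedged way to double-check a leftover file is to read its DHT xattr directly on the brick (not on the mount): a linkto ("T") file carries the trusted.glusterfs.dht.linkto attribute naming the subvolume it points to. The path below is one of the files listed earlier in this thread and is only an example; run as root on the brick server.

```shell
# Print the dht.linkto xattr; if it is present, the file is only a DHT
# pointer and can be ignored. If getfattr reports no such attribute,
# the file is (or was) real data.
getfattr -n trusted.glusterfs.dht.linkto -e text \
    /gluster/mnt6/brick/OPA/tessa01/work/REA_exp/rea_1986_simu/prepobs_19860823/COST.DAT
```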
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
gluster volume rebalance tier2 start
From your point of view, are these the right commands to complete this repair task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick multiplexing is "ON" in your setup. Not sure whether it is ON by default in 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.
So we have to do the following steps to kill any one brick on a node.
Steps to kill a brick when multiplex is on -
Step - 1
Find the unix domain socket of the brick process on the node.
This is on my machine when I have all the bricks on the same machine:
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
Step - 2
Run following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start a brick again we just need to start the volume using "force":
gluster v start <volname> force
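The steps above can be put together as one hedged sketch (assuming a single multiplexed glusterfsd per node, as in this thread; the volume name "tier2" and the brick path are this thread's examples, adjust for your node):

```shell
# Step 1: extract the unix domain socket (the -S argument) of the single
# multiplexed glusterfsd process on this node.
SOCK=$(ps -ef | awk '/[g]lusterfsd/ {for (i = 1; i < NF; i++) if ($i == "-S") print $(i + 1)}' | head -n1)

# Step 2: detach just this one brick from the multiplexed process; the
# other bricks served by the same PID keep running.
gf_attach -d "$SOCK" /gluster/mnt1/brick

# Step 3 (later): bring the brick back by force-starting the volume.
gluster volume start tier2 force
```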
----
Ashish
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also on my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process id.
Also, you can use "ps aux | grep glusterfs" to see all the processes on a node, but I think the step above does the same.
---
Ashish
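The lookup Ashish describes can be scripted. The sketch below parses the PID for one brick out of a simplified, hypothetical sample of `gluster volume status` output (one line per brick; on a live node you would capture the real command instead):

```shell
# Hypothetical sample of `gluster volume status tier2` output, simplified to one
# line per brick; on a real server replace with:
#   status_sample=$(gluster volume status tier2)
status_sample='Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Self-heal Daemon on localhost N/A N/A Y 79376'

# The PID of one specific brick is the last field of its "Brick" row.
brick_pid=$(printf '%s\n' "$status_sample" \
  | awk '$1 == "Brick" && $2 == "s01-stg:/gluster/mnt1/brick" {print $NF}')
echo "$brick_pid"
```

Note that with brick multiplexing enabled (as seen later in this thread) several bricks report the same PID, so the PID alone does not identify a single brick process.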
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for the misunderstanding.
Before contacting you during the last days, we checked all network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seems to be ok. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors will appear during the upcoming data movement.
For this reason, it could be useful to know how to proceed in case of a new failure: insist with approach n.1 or change the strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.
Question: could we start approach n.2 also after having successfully removed the V1 subvolume?!
Yes, we can do that. My idea is to use replace-brick command.
We will kill ONLY one brick process on s06 and format that brick. Then we use the replace-brick command to replace a brick of a volume on s05 with this formatted brick.
Heal will be triggered and the data of the respective volume will be placed on this brick.
Now we can format the brick which got freed up on s05, and use it to replace the brick which we killed on s06.
During this process, we have to make sure heal has completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it and try to perform it on your virtual environment first before trying on production.
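A hedged sketch of that swap sequence. Every PID, device path, and brick pair below is an illustrative placeholder, not taken from the real volume, and the DRY_RUN guard makes the script print the plan instead of executing anything:

```shell
# DRY_RUN=1 (default) only prints each step; set DRY_RUN=0 on a throwaway
# test cluster only, never directly on production.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# 1. stop ONE brick process on s06 (hypothetical PID; with brick multiplexing
#    on, killing by PID takes down all bricks of the node -- use gf_attach)
run kill -15 12345
# 2. re-format the freed brick (hypothetical device)
run mkfs.xfs -f /dev/mapper/example_vg-example_lv
# 3. move the s05 brick onto the formatted s06 brick
run gluster volume replace-brick tier2 s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick commit force
# 4. wait until heal shows nothing pending before the next swap
run gluster volume heal tier2 info
```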
-------
If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You should have told me why you
were opting for approach 1, or perhaps I should have asked.
I was wondering about approach 1 because sometimes rebalance takes time depending on the data size.
Anyway, I hope the whole setup is stable, I mean it is not in the middle of something which we cannot stop.
If free disks are the only concern, I will give you some more steps to deal with it and follow approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty bricks to be used as indicated in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks, but after about 3TB of moved data, the rebalance speed slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before completing the rebalance).
Now rebalance is not moving data, it's only scanning files (please take a look at the following output):
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I’m not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have some other suggestion that, in this particular case, could be useful to reduce errors (I know that they are related to the current volume configuration) and improve rebalance performance, avoiding rebalancing the entire cluster?
Thank you in advance,
Mauro
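For reference, the remove-brick lifecycle being discussed follows the start/status/stop/commit subcommands. A sketch with an abbreviated brick list (mnt3..mnt6 elided) and a DRY_RUN guard so it only prints the plan:

```shell
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# first EC subvolume on s04, abbreviated: ...mnt3 through mnt6 omitted here
BRICKS="s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick"

run gluster volume remove-brick tier2 $BRICKS start
run gluster volume remove-brick tier2 $BRICKS status   # repeat until "completed" on every node
# a running migration can be aborted with "... stop" and relaunched later;
# the relaunch rescans the files first, which is the scanning phase seen above
run gluster volume remove-brick tier2 $BRICKS commit   # only once status shows completed
```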
Yes, you can.
If not me others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description is very detailed.
I think I will follow the first approach after setting the network.ping-timeout option to 0 (if I’m not wrong, “0” means “infinite”... I noticed that this value reduced rebalance errors).
After the fix I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes and we have to deal with one subvolume at a time.
I have named them V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes. So, at least we should have 2 bricks on 3 different newly added nodes.
This way, in 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
...
[Message clipped]
--------
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
email: ***@cmcc.it
https://it.linkedin.com/in/mauro-tridici-5977238b
Nithya Balachandran
2018-10-04 04:05:14 UTC
Permalink
Hi Mauro,
It looks like all of these are actual files that were not migrated. Please
send me the rebalance logs for this node so I can check for any migration
errors.
As this is a disperse volume, copying the files to the mount will be
difficult. Ashish, how would we go about this?
Regards,
Nithya
Post by Mauro Tridici
Hi Nithya,
in order to give you an answer as soon as possible, I considered only the content of one brick of server s06 (in the attachment you can find the content of /gluster/mnt1/brick).
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 106M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 3,0G 97G 3% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 12G 9,0T 1% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 12G 9,0T 1% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 12G 9,0T 1% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 12G 9,0T 1% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 1,4T 7,7T 16% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 12G 9,0T 1% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 1,4T 7,7T 16% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 12G 9,0T 1% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 1,4T 7,7T 16% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 1,4T 7,7T 16% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 1,4T 7,7T 16% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 1,4T 7,7T 16% /gluster/mnt12
The scenario is almost the same for all the bricks removed from servers s04, s05 and s06.
In the next hours, I will check every file on each removed brick.
So, if I understand correctly, I can proceed with the deletion of the directories and files left on the bricks only if each file has the T tag, right?
Thank you in advance,
Mauro
On 3 Oct 2018, at 16:49, Nithya Balachandran <
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will
remember these suggestions for any future needs.
Anyway, during the week-end the remove-brick procedures ended
successfully and we were able to free up all the bricks defined on servers s04
and s05, and 6 bricks of 12 on server s06.
So we can say that, thanks to your suggestions, we are about to complete
this first phase (removal of all bricks defined on the s04, s05 and s06
servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed
that some data remains on each brick (about 1.2GB of data).
Please take a look at “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long
list of directory names and some files that are still on the brick, but
their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
total 32
drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
total 0
drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What should we do with this data? Should I back up these “empty” dirs and
files on a different storage before deleting them?
Hi Mauro,
Are you sure these files and directories are empty? Please provide the ls -l output for the files. If they are 'T' files, they can be ignored.
Regards,
Nithya
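The 'T' entries Nithya mentions are DHT link-to files: zero-length, mode 1000 (shown as `---------T` by `ls -l`, and on a real brick also carrying a `trusted.glusterfs.dht.linkto` xattr). A self-contained demo on a scratch directory; against a brick you would point `find` at the brick root instead:

```shell
demo=$(mktemp -d)
touch "$demo/linkto.example" "$demo/normal.example"
chmod 1000 "$demo/linkto.example"   # reproduce the link-file mode: sticky bit, no rwx
chmod 0644 "$demo/normal.example"

# exact mode 1000 + zero length = candidate DHT link files
# (on a server: find /gluster/mnt1/brick -type f -perm 1000 -size 0)
find "$demo" -type f -perm 1000 -size 0
rm -rf "$demo"
```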
Post by Mauro Tridici
As soon as all the bricks are empty, I plan to re-add the new bricks:
*gluster peer detach s04*
*gluster peer detach s05*
*gluster peer detach s06*
*gluster peer probe s04*
*gluster peer probe s05*
*gluster peer probe s06*
*gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick
s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick
s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick
s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick
s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick
s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick
s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick
s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick
s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick
s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick
s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick
s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick
s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick
s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick
s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick
s06-stg:/gluster/mnt12/brick force*
*gluster volume rebalance tier2 fix-layout start*
*gluster volume rebalance tier2 start*
From your point of view, are these the right commands to close out this repair task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick multiplexing is "ON" in your setup. Not sure if
it is ON by default for 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info.
If brick multiplexing is ON, you will see only one process running for
all the bricks on a node.
So we have to do the following steps to kill any one brick on a node.
*Steps to kill a brick when multiplex is on -*
*Step - 1 *
Find *unix domain_socket* of the process on a node.
This is on my machine when I have all the bricks on same machine
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
*Step - 2*
Run the following command to kill a brick on the same node:
gf_attach -d <unix domain_socket> brick_path_on_that_node
For example:
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
Status of volume: vol
Gluster process TCP Port RDMA Port Online
Pid
------------------------------------------------------------
------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y
28311
Self-heal Daemon on localhost N/A N/A Y
29787
Task Status of Volume vol
------------------------------------------------------------
------------------
There are no active volume tasks
/home/apandey/bricks/gluster/vol-6
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online
Pid
------------------------------------------------------------
------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y
28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N
N/A
Self-heal Daemon on localhost N/A N/A Y
29787
Task Status of Volume vol
------------------------------------------------------------
------------------
There are no active volume tasks
To start a brick again, we just need to start the volume using "force":
gluster v start <volname> force
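Step 1 above (finding the `-S` socket) can be scripted. Here the glusterfsd command line is an abbreviated sample taken from the ps output in this thread; on a live node you would feed in `ps -o args= -p <brick-pid>` instead:

```shell
# Abbreviated sample glusterfsd command line (from the ps output in this thread)
args='/usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick'

# The token after -S is the unix domain socket gf_attach needs.
socket=$(printf '%s\n' "$args" \
  | awk '{for (i = 1; i < NF; i++) if ($i == "-S") print $(i + 1)}')
echo "$socket"
# then, to stop just that one brick under multiplexing:
#   gf_attach -d "$socket" /gluster/mnt1/brick
```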
----
Ashish
------------------------------
*Sent: *Friday, September 28, 2018 9:25:53 PM
*Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that
more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other brick. Is that normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online
Pid
------------------------------------------------------------
------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y
3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y
3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y
3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y
3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y
3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y
3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y
3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y
3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y
3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y
3644
Self-heal Daemon on localhost N/A N/A Y
79376
Quota Daemon on localhost N/A N/A Y
79472
Bitrot Daemon on localhost N/A N/A Y
79485
Scrubber Daemon on localhost N/A N/A Y
79505
Self-heal Daemon on s03-stg N/A N/A Y
77073
Quota Daemon on s03-stg N/A N/A Y
77148
Bitrot Daemon on s03-stg N/A N/A Y
77160
Scrubber Daemon on s03-stg N/A N/A Y
77191
Self-heal Daemon on s02-stg N/A N/A Y
80150
Quota Daemon on s02-stg N/A N/A Y
80226
Bitrot Daemon on s02-stg N/A N/A Y
80238
Scrubber Daemon on s02-stg N/A N/A Y
80269
Self-heal Daemon on s04-stg N/A N/A Y
106815
Quota Daemon on s04-stg N/A N/A Y
106866
Bitrot Daemon on s04-stg N/A N/A Y
106878
Scrubber Daemon on s04-stg N/A N/A Y
106897
Self-heal Daemon on s05-stg N/A N/A Y
130807
Quota Daemon on s05-stg N/A N/A Y
130884
Bitrot Daemon on s05-stg N/A N/A Y
130896
Scrubber Daemon on s05-stg N/A N/A Y
130927
Self-heal Daemon on s06-stg N/A N/A Y
157146
Quota Daemon on s06-stg N/A N/A Y
157239
Bitrot Daemon on s06-stg N/A N/A Y
157252
Scrubber Daemon on s06-stg N/A N/A Y
157288
Task Status of Volume tier2
------------------------------------------------------------
------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
Mauro Tridici
2018-10-04 05:56:01 UTC
Permalink
Hi Nithya,
you can find here the link to download the tier2-rebalance log.
https://drive.google.com/file/d/1CsA-lettfUTluoMH7OZ5V8dZO1_uYLI7/view?usp=sharing
Please note that this log file contains information about the first remove-brick operation, already completed (bricks #1-#6), and about the second one (bricks #7-#12) that is still running.
So, I can say that the log file is very verbose.
Since the s04 and s05 server bricks contain only files like the following one, I would like to ask you if I can remove them.
I will wait for suggestions about the management of the files left on s06 server.
Thank you,
Mauro
Post by Ashish Pandey
Hi Mauro,
It looks like all of these are actual files that were not migrated. Please send me the rebalance logs for this node so I can check for any migration errors.
As this is a disperse volume, copying the files to the mount will be difficult. Ashish, how would we go about this?
Regards,
Nithya
Hi Nithya,
in order to give an answer to your question as soon as possible, I just considered only the content of one brick of server s06 (in attachment you can find the content of /gluster/mnt1/brick).
File system Dim. Usati Dispon. Uso% Montato su
/dev/mapper/cl_s06-root 100G 2,1G 98G 3% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4,0K 32G 1% /dev/shm
tmpfs 32G 106M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_s06-var 100G 3,0G 97G 3% /var
/dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster
/dev/sda1 1014M 152M 863M 15% /boot
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 12G 9,0T 1% /gluster/mnt3
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 12G 9,0T 1% /gluster/mnt6
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 12G 9,0T 1% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 12G 9,0T 1% /gluster/mnt4
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 1,4T 7,7T 16% /gluster/mnt9
/dev/mapper/gluster_vgb-gluster_lvb 9,0T 12G 9,0T 1% /gluster/mnt1
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 1,4T 7,7T 16% /gluster/mnt7
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 12G 9,0T 1% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 1,4T 7,7T 16% /gluster/mnt8
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 1,4T 7,7T 16% /gluster/mnt11
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 1,4T 7,7T 16% /gluster/mnt10
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 1,4T 7,7T 16% /gluster/mnt12
The scenario is almost the same for all the bricks removed from server s04, s05 and s06.
In the next hours, I will check every files on each removed bricks.
So, if I understand, I can proceed with deletion of directories and files left on the bricks only if each file have T tag, right?
Thank you in advance,
Mauro
Post by Mauro Tridici
Good morning Ashish,
your explanations are always very useful, thank you very much: I will remember these suggestions for any future needs.
Anyway, during the week-end, the remove-brick procedures ended successfully and we were able to free up all bricks defined on servers s04 and s05, and 6 of the 12 bricks on server s06.
So we can say that, thanks to your suggestions, we are about to complete this first phase (removal of all bricks defined on the s04, s05 and s06 servers).
I really appreciated your support.
Now I have one last question (I hope): after the remove-brick commit I noticed that some data remains on each brick (about 1.2GB of data).
Please take a look at “df-h_on_s04_s05_s06.txt”.
The situation is almost the same on all 3 servers mentioned above: a long list of directory names and some files that are still on the bricks, but their size is 0.
a lot of empty directories on /gluster/mnt*/brick/.glusterfs
8 /gluster/mnt2/brick/.glusterfs/b7/1b
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
some empty files in directories in /gluster/mnt*/brick/*
total 32
drwxr-xr-x 7 root root 100 11 Sep 22:14 archive_calypso
total 0
drwxr-x--- 3 root 5200 29 11 Sep 22:13 ans002
drwxr-x--- 3 5104 5100 32 11 Sep 22:14 ans004
drwxr-x--- 3 4506 4500 31 11 Sep 22:14 ans006
drwxr-x--- 3 4515 4500 28 11 Sep 22:14 ans015
drwxr-x--- 4 4321 4300 54 11 Sep 22:14 ans021
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
What do we have to do with this data? Should I back up these “empty” dirs and files to a different storage before deleting them?
Hi Mauro,
Are you sure these files and directories are empty? Please provide the ls -l output for the files. If they are 'T' files, they can be ignored.
Regards,
Nithya
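For anyone following along, the 'T' files Nithya mentions are DHT link-to files: zero-byte entries whose only permission bit is the sticky bit (they show as ---------T in ls -l). A minimal sketch to spot candidates on a brick (the function name and brick path are mine, not from the thread):

```shell
# Sketch: list probable DHT link-to ("T") files on a brick.
# Assumption (mine, not from the thread): link-to files are zero-byte
# regular files with the sticky bit set, showing as ---------T in ls -l.
list_linkto_files() {
    brick="$1"
    # Skip the .glusterfs metadata tree; -perm -1000 matches the sticky bit.
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -type f -perm -1000 -size 0 -print
}
```

Before deleting anything, it is safer to confirm a candidate really is a link-to file with `getfattr -n trusted.glusterfs.dht.linkto <file>`.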
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
gluster volume rebalance tier2 start
From your point of view, are these the right commands to close this repair task?
Thank you very much for your help.
Regards,
Mauro
Ohh!! It is because brick multiplexing is "ON" in your setup. I am not sure whether it is ON by default in 3.12.14 or not.
See "cluster.brick-multiplex: on" in gluster v <volname> info.
If brick multiplexing is ON, you will see only one process running for all the bricks on a node.
So we have to do the following steps to kill any one brick on a node.
Steps to kill a brick when multiplex is on -
Step - 1
Find the unix domain socket of the brick process on the node.
This is from my machine, where I have all the bricks on the same node:
root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1 -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid -S /var/run/gluster/1259033d2ff4f4e5.socket --brick-name /home/apandey/bricks/gluster/vol-1 -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24 --process-name brick --brick-port 49158 --xlator-option vol-server.listen-port=49158
Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket
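That -S argument can also be pulled out of the ps output programmatically; a small sketch (the helper name is mine):

```shell
# Sketch: extract the unix domain socket (the -S argument) from a
# glusterfsd command line, e.g. one line of "ps -ef | grep glusterfsd".
socket_from_cmdline() {
    # Print the word that follows " -S " on the given line.
    printf '%s\n' "$1" | sed -n 's/.* -S \([^ ]*\).*/\1/p'
}
```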
Step - 2
Run the following command to kill a brick on the same node -
gf_attach -d <unix domain_socket> brick_path_on_that_node
gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 49158 0 Y 28311
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
OK
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-4 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-5 49158 0 Y 28311
Brick apandey:/home/apandey/bricks/gluster/
vol-6 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29787
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
To start a brick again, we just need to start the volume using "force":
gluster v start <volname> force
----
Ashish
Sent: Friday, September 28, 2018 9:25:53 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
I asked you how to detect the PID of a specific brick because I see that more than one brick has the same PID (also in my virtual env).
If I kill one of them, I risk killing some other bricks too. Is this normal?
Status of volume: tier2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s01-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt1/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt1/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt2/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt2/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt3/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt3/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt4/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt4/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt5/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt5/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt6/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt6/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt7/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt7/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt8/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt8/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt9/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt9/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt10/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt10/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt11/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt11/brick 49153 0 Y 3953
Brick s01-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s02-stg:/gluster/mnt12/brick 49153 0 Y 3956
Brick s03-stg:/gluster/mnt12/brick 49153 0 Y 3953
Brick s04-stg:/gluster/mnt1/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt2/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt3/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt4/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt5/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt6/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt7/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt8/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt9/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt10/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt11/brick 49153 0 Y 3433
Brick s04-stg:/gluster/mnt12/brick 49153 0 Y 3433
Brick s05-stg:/gluster/mnt1/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt2/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt3/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt4/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt5/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt6/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt7/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt8/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt9/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt10/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt11/brick 49153 0 Y 3709
Brick s05-stg:/gluster/mnt12/brick 49153 0 Y 3709
Brick s06-stg:/gluster/mnt1/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt2/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt3/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt4/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt5/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt6/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt7/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt8/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt9/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt10/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt11/brick 49153 0 Y 3644
Brick s06-stg:/gluster/mnt12/brick 49153 0 Y 3644
Self-heal Daemon on localhost N/A N/A Y 79376
Quota Daemon on localhost N/A N/A Y 79472
Bitrot Daemon on localhost N/A N/A Y 79485
Scrubber Daemon on localhost N/A N/A Y 79505
Self-heal Daemon on s03-stg N/A N/A Y 77073
Quota Daemon on s03-stg N/A N/A Y 77148
Bitrot Daemon on s03-stg N/A N/A Y 77160
Scrubber Daemon on s03-stg N/A N/A Y 77191
Self-heal Daemon on s02-stg N/A N/A Y 80150
Quota Daemon on s02-stg N/A N/A Y 80226
Bitrot Daemon on s02-stg N/A N/A Y 80238
Scrubber Daemon on s02-stg N/A N/A Y 80269
Self-heal Daemon on s04-stg N/A N/A Y 106815
Quota Daemon on s04-stg N/A N/A Y 106866
Bitrot Daemon on s04-stg N/A N/A Y 106878
Scrubber Daemon on s04-stg N/A N/A Y 106897
Self-heal Daemon on s05-stg N/A N/A Y 130807
Quota Daemon on s05-stg N/A N/A Y 130884
Bitrot Daemon on s05-stg N/A N/A Y 130896
Scrubber Daemon on s05-stg N/A N/A Y 130927
Self-heal Daemon on s06-stg N/A N/A Y 157146
Quota Daemon on s06-stg N/A N/A Y 157239
Bitrot Daemon on s06-stg N/A N/A Y 157252
Scrubber Daemon on s06-stg N/A N/A Y 157288
Task Status of Volume tier2
------------------------------------------------------------------------------
Task : Remove brick
ID : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
s04-stg:/gluster/mnt1/brick
s04-stg:/gluster/mnt2/brick
s04-stg:/gluster/mnt3/brick
s04-stg:/gluster/mnt4/brick
s04-stg:/gluster/mnt5/brick
s04-stg:/gluster/mnt6/brick
Status : in progress
root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg --volfile-id tier2.s01-stg.gluster-mnt1-brick -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket --brick-name /gluster/mnt1/brick -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee --brick-port 49153 --xlator-option tier2-server.listen-port=49153
root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
Sent: Friday, September 28, 2018 9:08:52 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Thank you, Ashish.
I will study and try your solution on my virtual env.
How can I detect the process of a brick on a gluster server?
Many Thanks,
Mauro
gluster v status <volname> will give you the list of bricks and the respective process id.
Also, you can use "ps aux | grep glusterfs" to see all the gluster processes on a node, but I think the above step does the same.
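A quick sketch that pairs each brick path with its PID from "ps -ef" style output (the helper name is mine; with brick multiplexing ON, every brick will report the same PID, which is exactly what Mauro observed):

```shell
# Sketch: print "PID brick-path" for each glusterfsd line of "ps -ef".
# Assumes the command line contains "--brick-name <path>" and that the
# PID is column 2, as in "ps -ef" output.
brick_pids() {
    awk '/glusterfsd/ && /--brick-name/ {
        for (i = 1; i <= NF; i++)
            if ($i == "--brick-name") print $2, $(i + 1)
    }'
}
```

Usage: `ps -ef | brick_pids`.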
---
Ashish
Sent: Friday, September 28, 2018 7:08:41 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
please excuse me, I'm very sorry for misunderstanding.
Before contacting you these last days, we checked all the network devices (10GbE switch, cables, NICs, server ports, and so on), operating system versions and settings, network bonding configuration, gluster package versions, tuning profiles, etc., but everything seemed to be ok. The first 3 servers (and the volume) operated without problems for one year. After we added the 3 new servers, we noticed something wrong.
Fortunately, yesterday you gave me a hand in understanding where the problem is (or could be).
At this moment, after we re-launched the remove-brick command, it seems that the rebalance is going ahead without errors, but it is only scanning the files.
It may be that some errors appear during the upcoming data movement.
For this reason, it would be useful to know how to proceed in case of a new failure: insist with approach n.1 or change strategy?
We are thinking to try to complete the running remove-brick procedure and make a decision based on the outcome.
Question: could we start approach n.2 even after having successfully removed the V1 subvolume?
Yes, we can do that. My idea is to use replace-brick command.
We will kill "ONLY" one brick process on s06. We will format this brick. Then use replace-brick command to replace brick of a volume on s05 with this formatted brick.
heal will be triggered and data of the respective volume will be placed on this brick.
Now we can format the brick that was freed up on s05 and use it to replace the brick we killed on s06.
During this process, we have to make sure the heal has completed before trying any other replace/kill brick.
It is tricky but looks doable. Think about it, and try it on your virtual environment first before trying it in production.
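The rotation Ashish describes can be sketched as a dry-run script that only echoes the gluster commands instead of executing them (volume name and brick paths are illustrative; in a real run you must wait until "gluster volume heal <vol> info" shows no pending entries before the next rotation):

```shell
# Dry-run sketch of the kill/format/replace-brick rotation. The gluster
# commands are echoed, not executed; hosts and brick paths are examples.
VOL="tier2"
run() { echo "WOULD RUN: $*"; }

rotate_brick() {
    old="$1"   # brick to retire (e.g. s06-stg:/gluster/mnt1/brick)
    new="$2"   # freshly formatted brick taking its place
    run gluster volume replace-brick "$VOL" "$old" "$new" commit force
    run gluster volume heal "$VOL" info   # repeat until no entries remain
}

rotate_brick "s06-stg:/gluster/mnt1/brick" "s05-stg:/gluster/mnt1/brick"
```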
-------
If it is still possible, could you please illustrate approach n.2 even if I don't have free disks?
I would like to start thinking about it and test it on a virtual environment.
Thank you in advance for your help and patience.
Regards,
Mauro
We could have taken approach 2 even if you did not have free disks. You should have told me why you were opting for approach 1, or perhaps I should have asked.
I was asking about approach 1 because sometimes rebalance takes a long time, depending upon the data size.
Anyway, I hope whole setup is stable, I mean it is not in the middle of something which we can not stop.
If free disks are the only concern I will give you some more steps to deal with it and follow the approach 2.
Let me know once you think everything is fine with the system and there is nothing to heal.
---
Ashish
Sent: Friday, September 28, 2018 4:21:03 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Ashish,
as I said in my previous message, we adopted the first approach you suggested (setting network.ping-timeout option to 0).
This choice was due to the absence of empty brick to be used as indicated in the second approach.
So, we launched remove-brick command on the first subvolume (V1, bricks 1,2,3,4,5,6 on server s04).
Rebalance started moving the data across the other bricks but, after about 3TB of data had been moved, the rebalance slowed down and some transfer errors appeared in the rebalance.log of server s04.
At this point, since the remaining 1.8TB still needs to be moved in order to complete the step, we decided to stop the remove-brick execution and start it again (I hope it doesn't stop again before completing the rebalance).
Now the rebalance is not moving data, it's only scanning files (please take a look at the following output):
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
s04-stg 0 0Bytes 182008 0 0 in progress 3:08:09
Estimated time left for rebalance to complete : 442:45:06
If I'm not wrong, remove-brick rebalances the entire cluster each time it starts.
Is there a way to speed up this procedure? Do you have any other suggestions that, in this particular case, could help reduce errors (I know they are related to the current volume configuration) and improve rebalance performance while avoiding a rebalance of the entire cluster?
Thank you in advance,
Mauro
Yes, you can.
If not me, others may also reply.
---
Ashish
Sent: Thursday, September 27, 2018 4:24:12 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Dear Ashish,
I can not thank you enough!
Your procedure and description are very detailed.
I plan to follow the first approach after setting the network.ping-timeout option to 0 (if I'm not wrong, “0” means “infinite”; I noticed that this value reduced rebalance errors).
After the fix, I will set the network.ping-timeout option back to its default value.
Could I contact you again if I need some kind of suggestion?
Thank you very much again.
Have a good day,
Mauro
Hi Mauro,
We can divide the 36 newly added bricks into 6 sets of 6 bricks each, starting from brick37.
That means there are 6 EC subvolumes, and we have to deal with one subvolume at a time.
I have named it V1 to V6.
Take the case of V1.
The best configuration/setup would be to have all the 6 bricks of V1 on 6 different nodes.
However, in your case you have added 3 new nodes, so we should have at least 2 bricks on each of the 3 newly added nodes.
This way, in a 4+2 EC configuration, even if one node goes down you will have 4 other bricks of that volume
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
...
[Message clipped]
Nithya Balachandran
2018-10-04 13:22:39 UTC
Permalink
Hi Mauro,


The files on s04 and s05 can be deleted safely as long as those bricks have
been removed from the volume and their brick processes are not running.


.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
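A rough way to count pending heal-index entries on a brick, as a sketch (assumption based on Nithya's note: entries in .glusterfs/indices/xattrop other than the xattrop-* base file are gfid links of files awaiting heal; the function name and brick path are mine):

```shell
# Sketch: count pending heal-index entries on a brick. Entries in
# .glusterfs/indices/xattrop other than the xattrop-* base file are
# taken to be gfids of files awaiting heal.
pending_heals() {
    dir="$1/.glusterfs/indices/xattrop"
    [ -d "$dir" ] || { echo 0; return; }
    ls -1 "$dir" | grep -cv '^xattrop-'
}
```

Usage: `pending_heals /gluster/mnt1/brick` (path illustrative).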



I will get back to you on s06. Can you please provide the output of
gluster volume info again?


Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information
about the “not migrated” files left on the removed bricks on servers s04, s05 and s06.
In attachment, you can find 3 files (1 file for each server) containing
the “not migrated” files lists and related brick number.
- s04 and s05 bricks contain only not migrated files in the hidden directories “/gluster/mnt#/brick/.glusterfs” (I can delete them, can't I?)
- s06 bricks contain
- not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs”;
- not migrated files with size equal to 0;
- not migrated files with size greater than 0.
I think it was necessary to collect and summarize information to simplify
your analysis.
Thank you very much,
Mauro
Mauro Tridici
2018-10-04 13:31:36 UTC
Permalink
Hi Nithya,

thank you very much.
This is the current “gluster volume info” output after removing bricks (and after peer detach command).

[***@s01 ~]# gluster volume info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 0
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: server
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%

Regards,
Mauro
Post by Ashish Pandey
Hi Mauro,
The files on s04 and s05 can be deleted safely as long as those bricks have been removed from the volume and their brick processes are not running.
.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
I will get back to you on s06. Can you please provide the output of gluster volume info again?
Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information about the “not migrated” files left on the removed bricks on servers s04, s05 and s06.
In attachment, you can find 3 files (1 file for each server) containing the “not migrated” files lists and related brick number.
s04 and s05 bricks contain only not migrated files in the hidden directories “/gluster/mnt#/brick/.glusterfs” (I can delete them, can't I?)
s06 bricks contain
not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs”;
not migrated files with size equal to 0;
not migrated files with size greater than 0.
I think it was necessary to collect and summarize information to simplify your analysis.
Thank you very much,
Mauro
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it <mailto:***@cmcc.it>
https://it.linkedin.com/in/mauro-tridici-5977238b
Mauro Tridici
2018-10-06 00:01:04 UTC
Permalink
Hi All,

since we need to restore the gluster storage as soon as possible, we decided to ignore the few files that could be lost and go ahead.
So we cleaned all the brick contents on servers s04, s05 and s06.

As planned some days ago, we executed the following commands:

gluster peer detach s04
gluster peer detach s05
gluster peer detach s06

gluster peer probe s04
gluster peer probe s05
gluster peer probe s06

gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force

gluster volume rebalance tier2 fix-layout start

Everything seems to be fine and the fix-layout completed.

[***@s01 ~]# gluster volume rebalance tier2 status
Node status run time in h:m:s
--------- ----------- ------------
localhost fix-layout completed 12:11:6
s02-stg fix-layout completed 12:11:18
s03-stg fix-layout completed 12:11:12
s04-stg fix-layout completed 12:11:20
s05-stg fix-layout completed 12:11:14
s06-stg fix-layout completed 12:10:47
volume rebalance: tier2: success

[***@s01 ~]# gluster volume info

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s05-stg:/gluster/mnt1/brick
Brick39: s06-stg:/gluster/mnt1/brick
Brick40: s04-stg:/gluster/mnt2/brick
Brick41: s05-stg:/gluster/mnt2/brick
Brick42: s06-stg:/gluster/mnt2/brick
Brick43: s04-stg:/gluster/mnt3/brick
Brick44: s05-stg:/gluster/mnt3/brick
Brick45: s06-stg:/gluster/mnt3/brick
Brick46: s04-stg:/gluster/mnt4/brick
Brick47: s05-stg:/gluster/mnt4/brick
Brick48: s06-stg:/gluster/mnt4/brick
Brick49: s04-stg:/gluster/mnt5/brick
Brick50: s05-stg:/gluster/mnt5/brick
Brick51: s06-stg:/gluster/mnt5/brick
Brick52: s04-stg:/gluster/mnt6/brick
Brick53: s05-stg:/gluster/mnt6/brick
Brick54: s06-stg:/gluster/mnt6/brick
Brick55: s04-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt7/brick
Brick57: s06-stg:/gluster/mnt7/brick
Brick58: s04-stg:/gluster/mnt8/brick
Brick59: s05-stg:/gluster/mnt8/brick
Brick60: s06-stg:/gluster/mnt8/brick
Brick61: s04-stg:/gluster/mnt9/brick
Brick62: s05-stg:/gluster/mnt9/brick
Brick63: s06-stg:/gluster/mnt9/brick
Brick64: s04-stg:/gluster/mnt10/brick
Brick65: s05-stg:/gluster/mnt10/brick
Brick66: s06-stg:/gluster/mnt10/brick
Brick67: s04-stg:/gluster/mnt11/brick
Brick68: s05-stg:/gluster/mnt11/brick
Brick69: s06-stg:/gluster/mnt11/brick
Brick70: s04-stg:/gluster/mnt12/brick
Brick71: s05-stg:/gluster/mnt12/brick
Brick72: s06-stg:/gluster/mnt12/brick
Options Reconfigured:
network.ping-timeout: 42
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: none
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: none
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: off
cluster.server-quorum-ratio: 51%

The last step should be the data rebalance between the servers, but the rebalance soon failed with a lot of errors like the following:

[2018-10-05 23:48:38.644978] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-tier2-client-70: Server lk version = 1
[2018-10-05 23:48:44.735323] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720331860
[2018-10-05 23:48:44.736205] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736266] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736282] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736377] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736436] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736459] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736460] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736537] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736571] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736574] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736827] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736887] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736904] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.740337] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.740381] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.740394] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-05 23:48:50.066103] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /

In attachment you can find the first logs captured during the rebalance execution.
In your opinion, is there a way to restore the gluster storage, or has all the data been lost?

Thank you in advance,
Mauro
Post by Mauro Tridici
Hi Nithya,
thank you very much.
This is the current “gluster volume info” output after removing bricks (and after peer detach command).
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
network.ping-timeout: 0
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: server
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
Regards,
Mauro
Post by Ashish Pandey
Hi Mauro,
The files on s04 and s05 can be deleted safely as long as those bricks have been removed from the volume and their brick processes are not running.
.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
I will get back to you on s06. Can you please provide the output of gluster volume info again?
Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information about the "not migrated” files left on removed bricks on server s04, s05 and s06.
In attachment, you can find 3 files (1 file for each server) containing the “not migrated” files lists and related brick number.
s04 and s05 bricks contain only not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs” (can I safely delete them?)
s06 bricks contain
not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs”;
not migrated files with size equal to 0;
not migrated files with size greater than 0.
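The three groups above can be counted per brick with a short script. A hedged sketch: `classify_brick` is a hypothetical helper (not a Gluster tool), the demo below runs on a temporary fixture instead of a real brick path, and it should only ever be pointed at bricks already removed from the volume:

```shell
#!/bin/sh
# Count leftover files on a removed brick in the three groups above:
# internal .glusterfs entries, zero-size data files, non-empty data files.
classify_brick() {
    brick=$1
    echo "internal (.glusterfs): $(find "$brick/.glusterfs" -type f 2>/dev/null | wc -l)"
    echo "zero-size data files : $(find "$brick" -path "$brick/.glusterfs" -prune -o -type f -size 0 -print | wc -l)"
    echo "non-empty data files : $(find "$brick" -path "$brick/.glusterfs" -prune -o -type f -size +0 -print | wc -l)"
}

# Demo fixture standing in for a path like /gluster/mnt1/brick:
demo=$(mktemp -d)
mkdir -p "$demo/.glusterfs/indices/xattrop"
touch "$demo/.glusterfs/indices/xattrop/xattrop-1"   # internal link file
touch "$demo/empty_file"                             # not migrated, size 0
echo data > "$demo/full_file"                        # not migrated, size > 0
classify_brick "$demo"
rm -rf "$demo"
```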
I think it was necessary to collect and summarize information to simplify your analysis.
Thank you very much,
Mauro
Mauro Tridici
2018-10-08 07:52:30 UTC
Permalink
Hi All,

for your information, this is the current rebalance status:

[***@s01 ~]# gluster volume rebalance tier2 status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 551922 20.3TB 2349397 0 61849 in progress 55:25:38
s02-stg 287631 13.2TB 959954 0 30262 in progress 55:25:39
s03-stg 288523 12.7TB 973111 0 30220 in progress 55:25:39
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 completed 48:33:03
s06-stg 0 0Bytes 0 0 0 completed 48:33:02
Estimated time left for rebalance to complete : 1023:49:56
volume rebalance: tier2: success

Rebalance is migrating files to servers s05 and s06, and to s04 too (although it is marked as failed).
s05 and s06 tasks are completed.
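For long status listings it helps to filter out just the nodes in the failed state. A sketch that assumes the column layout of the status table above (node name first, status in the second-to-last column); `failed_nodes` is a hypothetical helper:

```shell
#!/bin/sh
# Print nodes whose rebalance status is "failed".
# Feed it the body rows of `gluster volume rebalance <vol> status` output.
failed_nodes() {
    awk '$(NF-1) == "failed" {print $1}'
}

# Example on rows from the status table above:
failed_nodes <<'EOF'
localhost   551922   20.3TB   2349397   0   61849   in progress   55:25:38
s04-stg     0        0Bytes   0         0   0       failed        0:00:37
s05-stg     0        0Bytes   0         0   0       completed     48:33:03
EOF
```

For the sample rows this prints only s04-stg; "in progress" spans two fields, which is why the status is addressed relative to the end of the line rather than by a fixed column number.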

Questions:

1) it seems that rebalance is moving files but also fixing the layout; is that normal?
2) when the rebalance completes, what do we need to do before returning the gluster storage to the users? Should we launch the rebalance again to involve server s04 too, or run a fix-layout to fix any errors on s04?

Thank you very much,
Mauro
Post by Mauro Tridici
Hi All,
some important updates about the issue mentioned below.
- stop gluster volume
- reboot the servers
- start gluster volume
- change some gluster volume options
- start the rebalance again
gluster volume set tier2 network.ping-timeout 02
gluster volume set all cluster.brick-multiplex on
gluster volume set tier2 cluster.server-quorum-ratio 51%
gluster volume set tier2 cluster.server-quorum-type server
gluster volume set tier2 cluster.quorum-type auto
gluster volume set tier2 network.ping-timeout 42
gluster volume set all cluster.brick-multiplex off
gluster volume set tier2 cluster.server-quorum-ratio none
gluster volume set tier2 cluster.server-quorum-type none
gluster volume set tier2 cluster.quorum-type none
The result was that the rebalance started moving data from servers s01, s02 and s03 to s05 and s06 (the newly added ones), but it failed on server s04 after 37 seconds.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 286680 12.6TB 1217960 0 43343 in progress 32:10:24
s02-stg 126291 12.4TB 413077 0 21932 in progress 32:10:25
s03-stg 126516 11.9TB 433014 0 21870 in progress 32:10:25
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 in progress 32:10:25
s06-stg 0 0Bytes 0 0 0 in progress 32:10:25
Estimated time left for rebalance to complete : 624:47:48
volume rebalance: tier2: success
When the rebalance completes, we plan to re-launch it to try to involve server s04 as well.
Do you have any idea what happened in my previous message, and why the rebalance is now running even though it does not involve server s04?
In attachment the complete tier2-rebalance.log file related to s04 server.
Thank you very much for your help,
Mauro
<tier2-rebalance.log.gz>
Post by Mauro Tridici
Hi All,
since we need to restore gluster storage as soon as possible, we decided to ignore the few files that could be lost and to go ahead.
So we cleaned all bricks content of servers s04, s05 and s06.
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
Everything seemed to be fine and the fix-layout completed.
Node status run time in h:m:s
--------- ----------- ------------
localhost fix-layout completed 12:11:6
s02-stg fix-layout completed 12:11:18
s03-stg fix-layout completed 12:11:12
s04-stg fix-layout completed 12:11:20
s05-stg fix-layout completed 12:11:14
s06-stg fix-layout completed 12:10:47
volume rebalance: tier2: success
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s05-stg:/gluster/mnt1/brick
Brick39: s06-stg:/gluster/mnt1/brick
Brick40: s04-stg:/gluster/mnt2/brick
Brick41: s05-stg:/gluster/mnt2/brick
Brick42: s06-stg:/gluster/mnt2/brick
Brick43: s04-stg:/gluster/mnt3/brick
Brick44: s05-stg:/gluster/mnt3/brick
Brick45: s06-stg:/gluster/mnt3/brick
Brick46: s04-stg:/gluster/mnt4/brick
Brick47: s05-stg:/gluster/mnt4/brick
Brick48: s06-stg:/gluster/mnt4/brick
Brick49: s04-stg:/gluster/mnt5/brick
Brick50: s05-stg:/gluster/mnt5/brick
Brick51: s06-stg:/gluster/mnt5/brick
Brick52: s04-stg:/gluster/mnt6/brick
Brick53: s05-stg:/gluster/mnt6/brick
Brick54: s06-stg:/gluster/mnt6/brick
Brick55: s04-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt7/brick
Brick57: s06-stg:/gluster/mnt7/brick
Brick58: s04-stg:/gluster/mnt8/brick
Brick59: s05-stg:/gluster/mnt8/brick
Brick60: s06-stg:/gluster/mnt8/brick
Brick61: s04-stg:/gluster/mnt9/brick
Brick62: s05-stg:/gluster/mnt9/brick
Brick63: s06-stg:/gluster/mnt9/brick
Brick64: s04-stg:/gluster/mnt10/brick
Brick65: s05-stg:/gluster/mnt10/brick
Brick66: s06-stg:/gluster/mnt10/brick
Brick67: s04-stg:/gluster/mnt11/brick
Brick68: s05-stg:/gluster/mnt11/brick
Brick69: s06-stg:/gluster/mnt11/brick
Brick70: s04-stg:/gluster/mnt12/brick
Brick71: s05-stg:/gluster/mnt12/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 42
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: none
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: none
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: off
cluster.server-quorum-ratio: 51%
[2018-10-05 23:48:38.644978] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-tier2-client-70: Server lk version = 1
[2018-10-05 23:48:44.735323] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720331860
[2018-10-05 23:48:44.736205] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736266] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736282] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736377] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736436] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736459] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736460] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736537] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736571] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736574] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736827] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736887] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736904] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.740337] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.740381] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.740394] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-05 23:48:50.066103] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /
In attachment you can find the first logs captured during the rebalance execution.
In your opinion, is there a way to restore the gluster storage, or has all the data been lost?
Thank you in advance,
Mauro
<rebalance_log.txt>
Post by Mauro Tridici
Hi Nithya,
thank you very much.
This is the current “gluster volume info” output after removing bricks (and after peer detach command).
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
network.ping-timeout: 0
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: server
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
Regards,
Mauro
Post by Ashish Pandey
Hi Mauro,
The files on s04 and s05 can be deleted safely as long as those bricks have been removed from the volume and their brick processes are not running.
.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
I will get back to you on s06. Can you please provide the output of gluster volume info again?
Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information about the "not migrated” files left on removed bricks on server s04, s05 and s06.
In attachment, you can find 3 files (1 file for each server) containing the “not migrated” files lists and related brick number.
s04 and s05 bricks contain only not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs” (can I safely delete them?)
s06 bricks contain
not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs”;
not migrated files with size equal to 0;
not migrated files with size greater than 0.
I think it was necessary to collect and summarize information to simplify your analysis.
Thank you very much,
Mauro
Nithya Balachandran
2018-10-08 08:43:07 UTC
Permalink
Hi Mauro,

Yes, a rebalance consists of 2 operations for every directory:

   1. Fix the layout for the new volume config (newly added or removed bricks)
2. Migrate files to their new hashed subvols based on the new layout
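A toy illustration of why step 2 is needed after step 1: when the subvolume count changes, the hashed placement of a file name changes, so existing files have to migrate to match the new layout. This is not Gluster's actual DHT algorithm (which assigns per-directory hash ranges to subvolumes); it is a plain modulo sketch over a checksum, with `placement` a hypothetical helper:

```shell
#!/bin/sh
# Toy placement: hash a file name, take it modulo the subvolume count.
placement() {  # usage: placement NAME NSUBVOLS
    h=$(printf '%s' "$1" | cksum | awk '{print $1}')
    echo $(( h % $2 ))
}

# Going from 6 to 12 disperse subvolumes can change where a name hashes:
for f in file_a file_b file_c; do
    echo "$f: old subvol $(placement "$f" 6), new subvol $(placement "$f" 12)"
done
```

Any name whose "old subvol" differs from its "new subvol" in this toy model corresponds to a file the rebalance must physically move.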


Are you running a rebalance because you added new bricks to the volume ? As
per an earlier email you have already run a fix-layout.

On s04, please check the rebalance log file to see why the rebalance failed.

Regards,
Nithya
Post by Mauro Tridici
Hi All,
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 551922 20.3TB 2349397 0 61849 in progress 55:25:38
s02-stg 287631 13.2TB 959954 0 30262 in progress 55:25:39
s03-stg 288523 12.7TB 973111 0 30220 in progress 55:25:39
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 completed 48:33:03
s06-stg 0 0Bytes 0 0 0 completed 48:33:02
Estimated time left for rebalance to complete : 1023:49:56
volume rebalance: tier2: success
Rebalance is migrating files to servers s05 and s06, and to s04 too (although it is marked as failed).
s05 and s06 tasks are completed.
1) it seems that rebalance is moving files but also fixing the layout; is that normal?
2) when the rebalance completes, what do we need to do before returning the gluster storage to the users? Should we launch the rebalance again to involve server s04 too, or run a fix-layout to fix any errors on s04?
Thank you very much,
Mauro
On 07 Oct 2018, at 10:29, Mauro Tridici <
Hi All,
some important updates about the issue mentioned below.
- stop gluster volume
- reboot the servers
- start gluster volume
- change some gluster volume options
- start the rebalance again
The options that I changed are listed below after reading some threads on
gluster volume set tier2 network.ping-timeout 02
gluster volume set all cluster.brick-multiplex on
gluster volume set tier2 cluster.server-quorum-ratio 51%
gluster volume set tier2 cluster.server-quorum-type server
gluster volume set tier2 cluster.quorum-type auto
gluster volume set tier2 network.ping-timeout 42
gluster volume set all cluster.brick-multiplex off
gluster volume set tier2 cluster.server-quorum-ratio none
gluster volume set tier2 cluster.server-quorum-type none
gluster volume set tier2 cluster.quorum-type none
The result was that the rebalance started moving data from servers s01, s02 and s03 to s05 and s06 (the newly added ones), but it failed on server s04 after 37 seconds.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 286680 12.6TB 1217960 0 43343 in progress 32:10:24
s02-stg 126291 12.4TB 413077 0 21932 in progress 32:10:25
s03-stg 126516 11.9TB 433014 0 21870 in progress 32:10:25
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 in progress 32:10:25
s06-stg 0 0Bytes 0 0 0 in progress 32:10:25
Estimated time left for rebalance to complete : 624:47:48
volume rebalance: tier2: success
When the rebalance completes, we plan to re-launch it to try to involve server s04 as well.
Do you have any idea what happened in my previous message, and why the rebalance is now running even though it does not involve server s04?
In attachment the complete tier2-rebalance.log file related to s04 server.
Thank you very much for your help,
Mauro
<tier2-rebalance.log.gz>
On 06 Oct 2018, at 02:01, Mauro Tridici <
Hi All,
since we need to restore gluster storage as soon as possible, we decided
to ignore the few files that could be lost and to go ahead.
So we cleaned all bricks content of servers s04, s05 and s06.
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
Everything seemed to be fine and the fix-layout completed.
Node status run time in h:m:s
--------- ----------- ------------
localhost fix-layout completed 12:11:6
s02-stg fix-layout completed 12:11:18
s03-stg fix-layout completed 12:11:12
s04-stg fix-layout completed 12:11:20
s05-stg fix-layout completed 12:11:14
s06-stg fix-layout completed 12:10:47
volume rebalance: tier2: success
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s05-stg:/gluster/mnt1/brick
Brick39: s06-stg:/gluster/mnt1/brick
Brick40: s04-stg:/gluster/mnt2/brick
Brick41: s05-stg:/gluster/mnt2/brick
Brick42: s06-stg:/gluster/mnt2/brick
Brick43: s04-stg:/gluster/mnt3/brick
Brick44: s05-stg:/gluster/mnt3/brick
Brick45: s06-stg:/gluster/mnt3/brick
Brick46: s04-stg:/gluster/mnt4/brick
Brick47: s05-stg:/gluster/mnt4/brick
Brick48: s06-stg:/gluster/mnt4/brick
Brick49: s04-stg:/gluster/mnt5/brick
Brick50: s05-stg:/gluster/mnt5/brick
Brick51: s06-stg:/gluster/mnt5/brick
Brick52: s04-stg:/gluster/mnt6/brick
Brick53: s05-stg:/gluster/mnt6/brick
Brick54: s06-stg:/gluster/mnt6/brick
Brick55: s04-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt7/brick
Brick57: s06-stg:/gluster/mnt7/brick
Brick58: s04-stg:/gluster/mnt8/brick
Brick59: s05-stg:/gluster/mnt8/brick
Brick60: s06-stg:/gluster/mnt8/brick
Brick61: s04-stg:/gluster/mnt9/brick
Brick62: s05-stg:/gluster/mnt9/brick
Brick63: s06-stg:/gluster/mnt9/brick
Brick64: s04-stg:/gluster/mnt10/brick
Brick65: s05-stg:/gluster/mnt10/brick
Brick66: s06-stg:/gluster/mnt10/brick
Brick67: s04-stg:/gluster/mnt11/brick
Brick68: s05-stg:/gluster/mnt11/brick
Brick69: s06-stg:/gluster/mnt11/brick
Brick70: s04-stg:/gluster/mnt12/brick
Brick71: s05-stg:/gluster/mnt12/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 42
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: none
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: none
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: off
cluster.server-quorum-ratio: 51%
The last step should be the data rebalance between the servers, but it failed with errors like the following:
[2018-10-05 23:48:38.644978] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk]
0-tier2-client-70: Server lk version = 1
[2018-10-05 23:48:44.735323] I [dht-rebalance.c:4512:gf_defrag_start_crawl]
0-tier2-dht: gf_defrag_start_crawl using commit hash 3720331860
[2018-10-05 23:48:44.736205] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.736266] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-7: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.736282] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736377] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.736436] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-8: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.736459] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736460] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.736537] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.736571] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-10: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.736574] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-9: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736827] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.736887] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-11: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.736904] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.740337] W [MSGID: 122040]
[ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to
get size and version [Input/output error]
[2018-10-05 23:48:44.740381] E [MSGID: 122034] [ec-common.c:613:ec_child_select]
0-tier2-disperse-6: Insufficient available children for this request (have
0, need 4)
[2018-10-05 23:48:44.740394] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done]
0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-05 23:48:50.066103] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr]
0-tier2-dht: fixing the layout of /
In attachment you can find the first logs captured during the rebalance execution.
In your opinion, is there a way to restore the gluster storage, or have all the
data been lost?
Thank you in advance,
Mauro
<rebalance_log.txt>
Il giorno 04 ott 2018, alle ore 15:31, Mauro Tridici <
Hi Nithya,
thank you very much.
This is the current “gluster volume info” output after removing the bricks
(and after the peer detach command).
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
network.ping-timeout: 0
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: server
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
Regards,
Mauro
Il giorno 04 ott 2018, alle ore 15:22, Nithya Balachandran <
Hi Mauro,
The files on s04 and s05 can be deleted safely as long as those bricks
have been removed from the volume and their brick processes are not running.
.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted (none in this case).
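For reference, the pending-heal state Nithya describes can be inspected directly; a minimal sketch, using the tier2 volume name and an example brick path from this thread (adjust the mnt# per brick):

```shell
# List heal-index entries on one brick (example path from this thread).
# Besides the xattrop-* base file, gfid-named entries here correspond
# to files still awaiting heal.
ls /gluster/mnt1/brick/.glusterfs/indices/xattrop/

# Ask gluster directly which entries still need healing
gluster volume heal tier2 info
```

Both commands are read-only, so they are safe to run while the volume is in use.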
I will get back to you on s06. Can you please provide the output of gluster volume info again?
Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information
about the “not migrated” files left on the removed bricks on servers s04, s05
and s06.
In attachment, you can find 3 files (1 file for each server) containing
the “not migrated” files lists and related brick number.
- s04 and s05 bricks contain only not migrated files in hidden
directories “/gluster/mnt#/brick/.glusterfs” (I could delete them,
couldn’t I?)
- s06 bricks contain:
- not migrated files in hidden directories “/gluster/mnt#/brick/.glusterfs”;
- not migrated files with size equal to 0;
- not migrated files with size greater than 0.
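The per-brick lists above can be reproduced with a scan that prunes the internal .glusterfs tree; a sketch (the brick path is an example from this thread, adjust mnt# for each brick):

```shell
# List regular files left on a brick, skipping the internal .glusterfs
# metadata tree (its entries are gluster bookkeeping, not user data)
find /gluster/mnt1/brick -path '*/.glusterfs' -prune -o -type f -print

# Variant: only the zero-length leftovers
find /gluster/mnt1/brick -path '*/.glusterfs' -prune -o -type f -size 0 -print
```

Any path this prints outside .glusterfs is a candidate “not migrated” user file.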
I thought it was useful to collect and summarize this information to simplify your analysis.
Thank you very much,
Mauro
Mauro Tridici
2018-10-08 08:57:35 UTC
Hi Nithya,
thank you, my answers are inline.
Post by Ashish Pandey
Hi Mauro,
A rebalance performs two operations:
- Fix the layout for the new volume config (newly added or removed bricks)
- Migrate files to their new hashed subvols based on the new layout
Are you running a rebalance because you added new bricks to the volume ? As per an earlier email you have already run a fix-layout.
Yes, we added new bricks to the volume and we already executed fix-layout before.
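For context, the two operations map onto separate commands (shown here for the tier2 volume from this thread; a sketch, run from any node of the trusted pool):

```shell
# Phase 1 only: recompute directory layouts, no data is moved
gluster volume rebalance tier2 fix-layout start

# Both phases: fix layouts and migrate files to their new hashed subvols
gluster volume rebalance tier2 start

# Track progress (per-node file counts, sizes, failures)
gluster volume rebalance tier2 status
```

A full `rebalance start` always includes the layout fix, which is why a rebalance run after a fix-layout still reports layout activity.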
Post by Ashish Pandey
On s04, please check the rebalance log file to see why the rebalance failed.
On s04, the rebalance failed after the following errors (no errors were found before these lines):
[2018-10-06 00:13:37.359634] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-tier2-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=2 overlaps=0
[2018-10-06 00:13:37.362424] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.362504] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.362525] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.363105] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.363163] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.363180] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.364920] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.364969] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.364985] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.366864] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.366912] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.366926] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.374818] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.374866] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.374879] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.406076] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.406145] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.406183] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.039835] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.039911] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-11, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.039944] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.039958] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.040441] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.040480] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-7, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.040518] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.040534] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.061789] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.061830] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-9, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.061859] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.061873] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.062283] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.062323] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-8, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.062353] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.062367] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.064613] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.064655] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-6, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.064685] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.064700] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.064727] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.064766] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-10, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.064794] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.064815] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.695948] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720343841
[2018-10-06 00:13:53.696837] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.696906] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.696924] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.697549] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.697599] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.697620] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.704120] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.704262] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.704342] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.707260] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.707312] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.707329] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.718301] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.718350] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.718367] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:55.626130] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:55.626207] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:55.626228] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:55.626231] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /
[2018-10-06 00:13:55.862374] I [dht-rebalance.c:5063:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 0 seconds, seconds left = 0
[2018-10-06 00:13:55.862440] I [MSGID: 109028] [dht-rebalance.c:5143:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 20.00 secs
[2018-10-06 00:13:55.862460] I [MSGID: 109028] [dht-rebalance.c:5147:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
[2018-10-06 00:14:12.476927] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477020] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-11, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477077] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477094] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.477644] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477695] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-7, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477726] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477740] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.477853] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477894] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-8, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477923] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477937] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.486862] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.486902] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-6, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.486929] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.486944] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.493872] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.493912] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-10, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.493939] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.493954] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.494560] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.494598] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-9, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.494624] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.494640] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.795320] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.795366] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.795796] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.795834] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.804770] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.804803] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.804811] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.804850] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.808500] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.808563] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.812431] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.812468] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-tier2-dht: getxattr err for dir [Input/output error]
[2018-10-06 00:14:12.812497] E [MSGID: 0] [dht-rebalance.c:4336:dht_get_local_subvols_and_nodeuuids] 0-tier2-dht: local subvolume determination failed with error: 5 [Input/output error]
[2018-10-06 00:14:12.812700] I [MSGID: 109028] [dht-rebalance.c:5143:gf_defrag_status_get] 0-tier2-dht: Rebalance is failed. Time taken is 37.00 secs
[2018-10-06 00:14:12.812720] I [MSGID: 109028] [dht-rebalance.c:5147:gf_defrag_status_get] 0-tier2-dht: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
[2018-10-06 00:14:12.812870] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7efe75d18e25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5623973d64b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5623973d632b] ) 0-: received signum (15), shutting down
Regards,
Mauro
Post by Ashish Pandey
Regards,
Nithya
Hi All,
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 551922 20.3TB 2349397 0 61849 in progress 55:25:38
s02-stg 287631 13.2TB 959954 0 30262 in progress 55:25:39
s03-stg 288523 12.7TB 973111 0 30220 in progress 55:25:39
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 completed 48:33:03
s06-stg 0 0Bytes 0 0 0 completed 48:33:02
Estimated time left for rebalance to complete : 1023:49:56
volume rebalance: tier2: success
Rebalance is migrating files onto servers s05 and s06, and onto s04 too (although its task is marked as failed).
The s05 and s06 tasks are completed.
1) It seems that rebalance is moving files but is also fixing the layout; is that normal?
2) When the rebalance completes, what do we need to do before returning the gluster storage to the users? Should we launch the rebalance again in order to involve the s04 server too, or run a fix-layout to fix any errors on s04?
Thank you very much,
Mauro
Post by Mauro Tridici
Hi All,
some important updates about the issue mentioned below. We executed the following steps:
- stop gluster volume
- reboot the servers
- start gluster volume
- change some gluster volume options
- start the rebalance again
gluster volume set tier2 network.ping-timeout 02
gluster volume set all cluster.brick-multiplex on
gluster volume set tier2 cluster.server-quorum-ratio 51%
gluster volume set tier2 cluster.server-quorum-type server
gluster volume set tier2 cluster.quorum-type auto
gluster volume set tier2 network.ping-timeout 42
gluster volume set all cluster.brick-multiplex off
gluster volume set tier2 cluster.server-quorum-ratio none
gluster volume set tier2 cluster.server-quorum-type none
gluster volume set tier2 cluster.quorum-type none
The result was that the rebalance started moving data from the s01, s02 and s03 servers to the s05 and s06 servers (the newly added ones), but it failed on the s04 server after 37 seconds.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 286680 12.6TB 1217960 0 43343 in progress 32:10:24
s02-stg 126291 12.4TB 413077 0 21932 in progress 32:10:25
s03-stg 126516 11.9TB 433014 0 21870 in progress 32:10:25
s04-stg 0 0Bytes 0 0 0 failed 0:00:37
s05-stg 0 0Bytes 0 0 0 in progress 32:10:25
s06-stg 0 0Bytes 0 0 0 in progress 32:10:25
Estimated time left for rebalance to complete : 624:47:48
volume rebalance: tier2: success
When the rebalance completes, we plan to re-launch it to try to involve the s04 server as well.
Do you have any idea about what happened in my previous message, and why the rebalance is now running even though it does not involve the s04 server?
Attached is the complete tier2-rebalance.log file for the s04 server.
Thank you very much for your help,
Mauro
<tier2-rebalance.log.gz>
Post by Mauro Tridici
Hi All,
since we need to restore the gluster storage as soon as possible, we decided to ignore the few files that could be lost and to go ahead.
So we cleaned all brick contents on servers s04, s05 and s06.
gluster peer detach s04
gluster peer detach s05
gluster peer detach s06
gluster peer probe s04
gluster peer probe s05
gluster peer probe s06
gluster volume add-brick tier2 s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
gluster volume rebalance tier2 fix-layout start
Everything seemed to be fine and the fix-layout completed.
Node status run time in h:m:s
--------- ----------- ------------
localhost fix-layout completed 12:11:6
s02-stg fix-layout completed 12:11:18
s03-stg fix-layout completed 12:11:12
s04-stg fix-layout completed 12:11:20
s05-stg fix-layout completed 12:11:14
s06-stg fix-layout completed 12:10:47
volume rebalance: tier2: success
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
Brick37: s04-stg:/gluster/mnt1/brick
Brick38: s05-stg:/gluster/mnt1/brick
Brick39: s06-stg:/gluster/mnt1/brick
Brick40: s04-stg:/gluster/mnt2/brick
Brick41: s05-stg:/gluster/mnt2/brick
Brick42: s06-stg:/gluster/mnt2/brick
Brick43: s04-stg:/gluster/mnt3/brick
Brick44: s05-stg:/gluster/mnt3/brick
Brick45: s06-stg:/gluster/mnt3/brick
Brick46: s04-stg:/gluster/mnt4/brick
Brick47: s05-stg:/gluster/mnt4/brick
Brick48: s06-stg:/gluster/mnt4/brick
Brick49: s04-stg:/gluster/mnt5/brick
Brick50: s05-stg:/gluster/mnt5/brick
Brick51: s06-stg:/gluster/mnt5/brick
Brick52: s04-stg:/gluster/mnt6/brick
Brick53: s05-stg:/gluster/mnt6/brick
Brick54: s06-stg:/gluster/mnt6/brick
Brick55: s04-stg:/gluster/mnt7/brick
Brick56: s05-stg:/gluster/mnt7/brick
Brick57: s06-stg:/gluster/mnt7/brick
Brick58: s04-stg:/gluster/mnt8/brick
Brick59: s05-stg:/gluster/mnt8/brick
Brick60: s06-stg:/gluster/mnt8/brick
Brick61: s04-stg:/gluster/mnt9/brick
Brick62: s05-stg:/gluster/mnt9/brick
Brick63: s06-stg:/gluster/mnt9/brick
Brick64: s04-stg:/gluster/mnt10/brick
Brick65: s05-stg:/gluster/mnt10/brick
Brick66: s06-stg:/gluster/mnt10/brick
Brick67: s04-stg:/gluster/mnt11/brick
Brick68: s05-stg:/gluster/mnt11/brick
Brick69: s06-stg:/gluster/mnt11/brick
Brick70: s04-stg:/gluster/mnt12/brick
Brick71: s05-stg:/gluster/mnt12/brick
Brick72: s06-stg:/gluster/mnt12/brick
network.ping-timeout: 42
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: none
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: none
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: off
cluster.server-quorum-ratio: 51%
[2018-10-05 23:48:38.644978] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-tier2-client-70: Server lk version = 1
[2018-10-05 23:48:44.735323] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720331860
[2018-10-05 23:48:44.736205] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736266] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736282] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736377] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736436] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736459] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736460] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736537] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736571] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736574] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.736827] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.736887] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.736904] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-05 23:48:44.740337] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-05 23:48:44.740381] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-05 23:48:44.740394] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-05 23:48:50.066103] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /
Attached you can find the first logs captured during the rebalance execution.
In your opinion, is there a way to restore the gluster storage, or have all the data been lost?
Thank you in advance,
Mauro
<rebalance_log.txt>
Post by Mauro Tridici
Hi Nithya,
thank you very much.
This is the current “gluster volume info” output after removing the bricks (and after the peer detach commands).
Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Brick1: s01-stg:/gluster/mnt1/brick
Brick2: s02-stg:/gluster/mnt1/brick
Brick3: s03-stg:/gluster/mnt1/brick
Brick4: s01-stg:/gluster/mnt2/brick
Brick5: s02-stg:/gluster/mnt2/brick
Brick6: s03-stg:/gluster/mnt2/brick
Brick7: s01-stg:/gluster/mnt3/brick
Brick8: s02-stg:/gluster/mnt3/brick
Brick9: s03-stg:/gluster/mnt3/brick
Brick10: s01-stg:/gluster/mnt4/brick
Brick11: s02-stg:/gluster/mnt4/brick
Brick12: s03-stg:/gluster/mnt4/brick
Brick13: s01-stg:/gluster/mnt5/brick
Brick14: s02-stg:/gluster/mnt5/brick
Brick15: s03-stg:/gluster/mnt5/brick
Brick16: s01-stg:/gluster/mnt6/brick
Brick17: s02-stg:/gluster/mnt6/brick
Brick18: s03-stg:/gluster/mnt6/brick
Brick19: s01-stg:/gluster/mnt7/brick
Brick20: s02-stg:/gluster/mnt7/brick
Brick21: s03-stg:/gluster/mnt7/brick
Brick22: s01-stg:/gluster/mnt8/brick
Brick23: s02-stg:/gluster/mnt8/brick
Brick24: s03-stg:/gluster/mnt8/brick
Brick25: s01-stg:/gluster/mnt9/brick
Brick26: s02-stg:/gluster/mnt9/brick
Brick27: s03-stg:/gluster/mnt9/brick
Brick28: s01-stg:/gluster/mnt10/brick
Brick29: s02-stg:/gluster/mnt10/brick
Brick30: s03-stg:/gluster/mnt10/brick
Brick31: s01-stg:/gluster/mnt11/brick
Brick32: s02-stg:/gluster/mnt11/brick
Brick33: s03-stg:/gluster/mnt11/brick
Brick34: s01-stg:/gluster/mnt12/brick
Brick35: s02-stg:/gluster/mnt12/brick
Brick36: s03-stg:/gluster/mnt12/brick
network.ping-timeout: 0
features.scrub: Active
features.bitrot: on
features.inode-quota: on
features.quota: on
performance.client-io-threads: on
cluster.min-free-disk: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.parallel-readdir: off
cluster.readdir-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.io-cache: off
disperse.cpu-extensions: auto
performance.io-thread-count: 16
features.quota-deem-statfs: on
features.default-soft-limit: 90
cluster.server-quorum-type: server
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: on
cluster.server-quorum-ratio: 51%
Regards,
Mauro
Post by Ashish Pandey
Hi Mauro,
The files on s04 and s05 can be deleted safely as long as those bricks have been removed from the volume and their brick processes are not running.
.glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
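To get a quick feel for how much healing is pending, the xattrop index entries could be counted per brick with something like this. This is only a sketch; the paths assume the /gluster/mntN/brick layout used in this thread, and a brick that does not exist on a node simply reports 0:

```shell
# Count pending-heal index entries (xattrop-* links) on each local brick.
for i in $(seq 1 12); do
  d="/gluster/mnt${i}/brick/.glusterfs/indices/xattrop"
  # ls fails quietly for absent bricks; grep -c then reports 0 entries.
  n=$(ls "$d" 2>/dev/null | grep -c '^xattrop-')
  printf '%s: %s pending entries\n' "$d" "$n"
done
```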
I will get back to you on s06. Can you please provide the output of gluster volume info again?
Regards,
Nithya
Dear Ashish, Dear Nithya,
I’m writing this message only to summarize and simplify the information about the “not migrated” files left on the removed bricks on servers s04, s05 and s06.
Attached you can find 3 files (one per server) containing the lists of “not migrated” files and the related brick numbers.
s04 and s05 bricks contain only “not migrated” files in the hidden directories “/gluster/mnt#/brick/.glusterfs” (I could delete them, couldn’t I?)
s06 bricks contain:
- “not migrated” files in the hidden directories “/gluster/mnt#/brick/.glusterfs”;
- “not migrated” files with size equal to 0;
- “not migrated” files with size greater than 0.
I collected and summarized this information to simplify your analysis.
Thank you very much,
Mauro
Ashish Pandey
2018-10-08 09:44:27 UTC
Hi Mauro,
What is the status of the rebalance now?
Could you please give the output of the following command for all the bricks -
getfattr -m. -d -e hex <root path of the brick>
You have to go to all the nodes and run the above command for every brick on that node.
Example: on s01
getfattr -m. -d -e hex /gluster/mnt1/brick
Keep the output from one node in one file so that it will be easy to analyze.
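The per-node collection could be scripted along these lines. This is only a sketch: the output filename is an assumption, and the 12 mount points follow the brick layout shown earlier in this thread:

```shell
# Collect getfattr output for all 12 local bricks into one file per node.
out="/tmp/getfattr_$(hostname -s).txt"
: > "$out"
for i in $(seq 1 12); do
  echo "=== /gluster/mnt${i}/brick ===" >> "$out"
  # || true: keep going even if a brick path is missing on this node.
  getfattr -m. -d -e hex "/gluster/mnt${i}/brick" >> "$out" 2>&1 || true
done
```

Running this once on each of s01..s06 yields one file per node, as requested.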
---
Ashish
----- Original Message -----
From: "Mauro Tridici" <***@cmcc.it>
To: "Nithya Balachandran" <***@redhat.com>
Cc: "gluster-users" <gluster-***@gluster.org>
Sent: Monday, October 8, 2018 2:27:35 PM
Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Hi Nithya,
thank you, my answers are inline.
On 08 Oct 2018, at 10:43, Nithya Balachandran < ***@redhat.com > wrote:
Hi Mauro,
Yes, a rebalance consists of 2 operations for every directory:
1. Fix the layout for the new volume config (newly added or removed bricks)
2. Migrate files to their new hashed subvols based on the new layout
Are you running a rebalance because you added new bricks to the volume? As per an earlier email, you have already run a fix-layout.
Yes, we added new bricks to the volume and we already executed fix-layout before.
On s04, please check the rebalance log file to see why the rebalance failed.
On s04, the rebalance failed with the following errors (no errors were found before these lines):

[2018-10-06 00:13:37.359634] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-tier2-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=2 overlaps=0
[2018-10-06 00:13:37.362424] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.362504] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.362525] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.363105] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.363163] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.363180] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.364920] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.364969] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.364985] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.366864] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.366912] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.366926] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.374818] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.374866] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.374879] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:37.406076] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:37.406145] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:37.406183] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.039835] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.039911] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-11, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.039944] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.039958] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.040441] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.040480] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-7, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.040518] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.040534] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.061789] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.061830] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-9, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.061859] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.061873] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.062283] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.062323] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-8, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.062353] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.062367] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.064613] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.064655] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-6, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.064685] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.064700] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:51.064727] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:51.064766] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-10, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:13:51.064794] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:51.064815] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.695948] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720343841
[2018-10-06 00:13:53.696837] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.696906] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.696924] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.697549] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.697599] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.697620] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.704120] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.704262] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.704342] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.707260] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.707312] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.707329] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:13:53.718301] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:13:53.718350] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:53.718367] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:13:55.626130] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
[2018-10-06 00:13:55.626207] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:13:55.626228] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
[2018-10-06 00:13:55.626231] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /
[2018-10-06 00:13:55.862374] I [dht-rebalance.c:5063:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 0 seconds, seconds left = 0
[2018-10-06 00:13:55.862440] I [MSGID: 109028] [dht-rebalance.c:5143:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 20.00 secs
[2018-10-06 00:13:55.862460] I [MSGID: 109028] [dht-rebalance.c:5147:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
[2018-10-06 00:14:12.476927] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477020] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-11, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477077] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477094] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.477644] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477695] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-7, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477726] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477740] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.477853] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.477894] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-8, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.477923] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.477937] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.486862] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.486902] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-6, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.486929] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.486944] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.493872] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
[2018-10-06 00:14:12.493912] E [MSGID: 109006] [dht-selfheal.c:673:dht_selfheal_dir_xattr_cbk] 0-tier2-dht: layout setxattr failed on tier2-disperse-10, path:/ gfid:00000000-0000-0000-0000-000000000001 [Input/output error]
[2018-10-06 00:14:12.493939] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
[2018-10-06 00:14:12.493954] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
[2018-10-06 00:14:12.494560] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/outpu