Discussion:
[Gluster-users] issue with self-heal
hsafe
2018-07-13 17:55:20 UTC
Hello Gluster community,

After several hundred GB of data writes (small image files, 100 KB < size < 1 MB)
into a 2x replicated glusterfs volume, I am facing an issue with the healing
process. Earlier, heal info listed the bricks and nodes and reported that
there were no failed heals; but now it ends up in the state with the
message below:

# gluster volume heal gv1 info healed

Gathering list of heal failed entries on volume gv1 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Issuing the heal info command gives a long list of GFIDs that takes
about an hour to complete. The file data, being images, does not change
and is primarily served to 8 servers that mount the volume with the
native glusterfs client.
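A faster way to gauge the backlog, if the installed release supports it,
is the heal-count statistics instead of the full listing; it prints the
number of entries pending heal per brick:

# gluster volume heal gv1 statistics heal-count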

Here is some insight into the status of the gluster volume. How can I
effectively run a successful heal on the storage? The last times I tried,
it made the servers unresponsive.

# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
performance.md-cache-timeout: 128
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 50000
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on

Appreciate your help. Thanks
Brian Andrus
2018-07-13 15:50:28 UTC
Your message means something (usually glusterfsd) is not running quite
right, or at all, on one of the servers.
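A quick way to see which server it is, using the standard gluster CLI
(run from either node):

# gluster volume status gv1
   (the Online column shows Y/N for each brick process and the self-heal daemon)
# gluster peer status
   (confirms the two nodes still see each other as connected)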

If you can tell which it is, you need to stop/restart glusterd and
glusterfsd on that server. Note: sometimes just stopping them doesn't really
stop them. You need to do a killall -9 on glusterd, glusterfsd and anything
else with "gluster" in the name.

Then just start glusterd and glusterfsd. Once they are up you should be
able to do the heal.

If you can't tell which it is and are able to take gluster offline for
users for a moment, do that process on all your brick servers.
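Roughly, the sequence on an affected server would look like this. This is a
sketch: the service name assumes a systemd install where the unit is called
glusterd (on Debian/Ubuntu it is glusterfs-server), and killing the glusterfs
processes will also drop any fuse mounts on that box.

# systemctl stop glusterd
# killall -9 glusterd glusterfsd glusterfs
# systemctl start glusterd
   (glusterd respawns the brick processes; "gluster volume start gv1 force"
    will bring back any brick that did not come up)
# gluster volume heal gv1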

Brian Andrus
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users