Discussion:
[Gluster-users] "Solving" a recurrent "performing entry selfheal on [...]" on my bricks
Hoggins!
2018-10-07 16:14:11 UTC
Hello list,

My Gluster cluster has a condition, and I'd like to know how to cure it.

The setup: two bricks, replicated, with an arbiter.
On brick 1, the /var/log/glusterfs/glustershd.log is quite empty, not
much activity, everything looks fine.
On brick 2, /var/log/glusterfs/glustershd.log shows a lot of these:
    [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-mailer-replicate-0: performing entry selfheal on
9df5082b-d066-4659-91a4-5f2ad943ce51
    [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-mailer-replicate-0: performing entry selfheal on
ba8c0409-95f5-499d-8594-c6de15d5a585

These entries are repeated every day, every ten minutes or so.

Now if we list the contents of the directory represented by file ID
9df5082b-d066-4659-91a4-5f2ad943ce51:
On brick 1:
    drwx------. 2 1005 users 102400 13 sept. 17:03 cur
    -rw-------. 2 1005 users     22 14 mars   2016 dovecot-keywords
    -rw-------. 2 1005 users      0  6 janv.  2015 maildirfolder
    drwx------. 2 1005 users      6 30 juin   2015 new
    drwx------. 2 1005 users      6  4 oct.  17:46 tmp

On brick 2:
    drwx------. 2 1005 users 102400 25 mai   11:00 cur
    -rw-------. 2 1005 users     22 14 mars   2016 dovecot-keywords
    -rw-------. 2 1005 users  80559 25 mai   11:00 dovecot-uidlist
    -rw-------. 2 1005 users      0  6 janv.  2015 maildirfolder
    drwx------. 2 1005 users      6 30 juin   2015 new
    drwx------. 2 1005 users      6  4 oct.  17:46 tmp

(note the "dovecot-uidlist" file present on brick 2 but not on brick 1)
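
For anyone wanting to reproduce the listing above: each brick keeps a backend entry for every GFID under its .glusterfs directory, keyed by the first two pairs of hex characters of the GFID. The sketch below only builds that path. The brick root /data/brick1/mailer is a made-up example, not a path from this thread.

```shell
#!/bin/sh
# Print where a brick keeps the backend entry for a given GFID.
# Layout: <brick>/.glusterfs/<first two hex chars>/<next two>/<full gfid>
gfid_backend_path() {
    brick_root=$1   # hypothetical brick root, e.g. /data/brick1/mailer
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)
    second=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick_root" "$first" "$second" "$gfid"
}

# Example with one of the GFIDs from the log above (brick root is assumed).
gfid_backend_path /data/brick1/mailer 9df5082b-d066-4659-91a4-5f2ad943ce51
```

For a directory, that backend entry is normally a symlink, so running `ls -l` on the printed path should show which directory it points to.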

Also, checking directory sizes for the cur/ directory:
On brick 1:
    165872    cur/

On brick 2:
    161516    cur/

BUT the number of files is the same on the two bricks for the cur/
directory:
    $~ ls -l cur/ | wc -l
    1135

So now you've got it: it's inconsistent between the two data bricks.
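
An equal file count does not prove that the names are the same on both sides, so a name-by-name comparison may be more telling than `wc -l`. A minimal sketch, assuming both directories are reachable from one host (for example after copying one listing over); the function name `compare_listings` is made up for illustration:

```shell
#!/bin/sh
# Print entry names that exist in only one of two directories.
# Names only in dir_a are printed unindented; names only in dir_b are
# printed with one leading tab (this is the comm -3 output format).
compare_listings() {
    dir_a=$1
    dir_b=$2
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    ls -A "$dir_a" | LC_ALL=C sort > "$tmp_a"
    ls -A "$dir_b" | LC_ALL=C sort > "$tmp_b"
    LC_ALL=C comm -3 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Called as `compare_listings /path/to/brick1/.../cur /path/to/brick2/.../cur` (paths are placeholders), an empty output would mean the two directories hold the same names, and the size difference would then have to come from file contents.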

On the arbiter, all seems good: the directory listing looks like what is
on brick 2.
Same kind of situation happens for file ID
ba8c0409-95f5-499d-8594-c6de15d5a585.

I'm sure that having this situation is not good and needs to be sorted
out, so what can I do?

Thanks for your help!

    Hoggins!
Vlad Kopylov
2018-10-10 05:05:13 UTC
Isn't it trying to heal your dovecot-uidlist? Try updating, restarting and
initiating the heal again.

-v
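
For reference, "initiating heal again" usually comes down to a handful of `gluster volume heal` subcommands. The sketch below only prints them so they can be reviewed before being run by hand. The volume name "mailer" is an assumption inferred from the "0-mailer-replicate-0" prefix in the log lines, and `print_heal_commands` is a made-up helper name.

```shell
#!/bin/sh
# Print (not run) the usual heal-related gluster commands for a volume.
print_heal_commands() {
    vol=$1   # volume name; "mailer" below is inferred from the log, not confirmed
    printf 'gluster volume heal %s info\n' "$vol"              # list entries pending heal
    printf 'gluster volume heal %s info split-brain\n' "$vol"  # list entries in split-brain
    printf 'gluster volume heal %s\n' "$vol"                   # heal entries already marked
    printf 'gluster volume heal %s full\n' "$vol"              # crawl and heal the whole volume
}

print_heal_commands mailer
```

Checking the `info` output first seems the safer order, since a full heal crawls the whole volume and can take a long time on a large maildir.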
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
Hoggins!
2018-10-12 17:50:14 UTC
Well,

It's been doing this for weeks, at least. I would have expected the
healing of a simple file like this one to be over by now.
Besides, the contents of the "cur" directory must also be healing,
but it is taking so long that it seems strange.

    Hoggins!
Vlad Kopylov
2018-10-12 18:55:11 UTC
For email, the fastest approach would be to update to the latest 3.12 on
all nodes, make sure they all communicate properly, then create a new
volume and move the data there.
Try using the volume options from here:
https://lists.gluster.org/pipermail/gluster-users/2018-October/035077.html

-v