Hoggins!
2018-10-07 16:14:11 UTC
Hello list,
My Gluster cluster has a condition, and I'd like to know how to cure it.
The setup: two bricks, replicated, with an arbiter.
On brick 1, the /var/log/glusterfs/glustershd.log is quite empty, not
much activity, everything looks fine.
On brick 2, /var/log/glusterfs/glustershd.log shows a lot of these:
   [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-mailer-replicate-0: performing entry selfheal on
9df5082b-d066-4659-91a4-5f2ad943ce51
   [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-mailer-replicate-0: performing entry selfheal on
ba8c0409-95f5-499d-8594-c6de15d5a585
These entries are repeated every day, every ten minutes or so.
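To see which file IDs keep coming back and how often, the log can be tallied. This is only a rough sketch: it assumes the log text contains the phrase "performing entry selfheal on" followed by the ID, as in the excerpts above, and the function name count_selfheal_gfids is made up for illustration.

```python
import re
from collections import Counter

# Matches the file ID that follows the self-heal message; \s+ also tolerates
# a line break between the message and the ID, as in the wrapped excerpts.
GFID_RE = re.compile(
    r"performing entry selfheal on\s+"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)

def count_selfheal_gfids(log_text):
    """Return a Counter mapping each file ID to the number of self-heal attempts logged."""
    return Counter(GFID_RE.findall(log_text))

# Hypothetical usage on brick 2:
#   with open("/var/log/glusterfs/glustershd.log") as f:
#       for gfid, n in count_selfheal_gfids(f.read()).most_common():
#           print(n, gfid)
```

A short list of IDs with high counts would confirm that the same few directories are being retried over and over.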
Now if we list the contents of the directory represented by file ID
9df5082b-d066-4659-91a4-5f2ad943ce51:
On brick 1:
   drwx------. 2 1005 users 102400 13 sept. 17:03 cur
   -rw-------. 2 1005 users    22 14 mars  2016 dovecot-keywords
   -rw-------. 2 1005 users     0 6 janv. 2015 maildirfolder
   drwx------. 2 1005 users     6 30 juin  2015 new
   drwx------. 2 1005 users     6 4 oct. 17:46 tmp
On brick 2:
   drwx------. 2 1005 users 102400 25 mai  11:00 cur
   -rw-------. 2 1005 users    22 14 mars  2016 dovecot-keywords
   -rw-------. 2 1005 users 80559 25 mai  11:00 dovecot-uidlist
   -rw-------. 2 1005 users     0 6 janv. 2015 maildirfolder
   drwx------. 2 1005 users     6 30 juin  2015 new
   drwx------. 2 1005 users     6 4 oct. 17:46 tmp
(note the "dovecot-uidlist" file present on brick 2 but not on brick 1)
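For anyone wanting to reproduce the listings above, here is a minimal sketch of how a file ID can be mapped to a path on a brick. It assumes the usual brick layout, where each ID has an entry under .glusterfs/ named after its first two and next two hex characters, and where the entry for a directory is a symlink. The brick root /bricks/b1 and the function names are placeholders, not my actual setup.

```python
import os

def gfid_to_brick_path(brick_root, gfid):
    """Build the .glusterfs entry path for a file ID on a given brick."""
    gfid = gfid.lower()
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def resolve_gfid(brick_root, gfid):
    """Follow the .glusterfs entry (a symlink for directories) to the real path."""
    return os.path.realpath(gfid_to_brick_path(brick_root, gfid))

# Hypothetical usage, run on the server that hosts the brick:
#   print(resolve_gfid("/bricks/b1", "9df5082b-d066-4659-91a4-5f2ad943ce51"))
```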
Also, checking directory sizes for the cur/ directory:
On brick 1:
   165872   cur/
On brick 2:
   161516   cur/
BUT the number of files is the same on the two bricks for the cur/
directory:
   $~ ls -l cur/ | wc -l
   1135
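One caveat on that count: ls -l prints a "total" line first, so the figure is one more than the number of entries, and an equal count does not prove that the names are the same on both sides. A small sketch of a name-by-name comparison follows. It assumes the two listings have been collected somewhere they can be compared (for example by saving the names on each server first); compare_names and compare_dirs are illustrative names.

```python
import os

def compare_names(names_1, names_2):
    """Return (only in first listing, only in second listing), each sorted."""
    s1, s2 = set(names_1), set(names_2)
    return sorted(s1 - s2), sorted(s2 - s1)

def compare_dirs(dir_1, dir_2):
    """Same comparison for two directories reachable from one machine."""
    return compare_names(os.listdir(dir_1), os.listdir(dir_2))

# Hypothetical usage with the parent directory listings from the two bricks:
#   only_1, only_2 = compare_names(names_from_brick_1, names_from_brick_2)
```

Applied to the parent directory listings shown earlier, this would report dovecot-uidlist as present only on brick 2.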
So now you've got it: it's inconsistent between the two data bricks.
On the arbiter, all seems good, the directory listing looks like what is
on brick 2.
Same kind of situation happens for file ID
ba8c0409-95f5-499d-8594-c6de15d5a585.
I'm sure that having this situation is not good and needs to be sorted
out, so what can I do?
Thanks for your help!
   Hoggins!