Discussion:
[Gluster-users] Wrong volume size for distributed dispersed volume on 4.1.5
j***@mail.de
2018-10-16 12:50:33 UTC
Hi everybody,

A few days ago I created a distributed dispersed volume on 4.1.5 under CentOS 7 like this:

gluster volume create data_vol1 disperse-data 4 redundancy 2 transport tcp \
\
gf-p-d-01.isec.foobar.com:/bricks/brick1/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick1/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick1/brick \
\
gf-p-d-01.isec.foobar.com:/bricks/brick2/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick2/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick2/brick \
\
... same for brick3 to brick9...
\
gf-p-d-01.isec.foobar.com:/bricks/brick10/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick10/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick10/brick

This worked nicely and resulted in the following filesystem:
[***@gf-p-d-01 ~]# df -h /data/
Filesystem Size Used Avail Use% Mounted on
gf-p-d-01.isec.foobar.com:/data_vol1 219T 2,2T 217T 2% /data

Each of the bricks resides on its own 6TB disk with one big partition formatted with xfs.
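
As a rough sanity check (assuming a 6 TB disk gives about 5.46 TiB of usable space and ignoring xfs overhead), that 219T is about what we expected from the layout:

# 6 servers x 10 bricks = 60 bricks; disperse 4+2 -> 10 subvolumes with 4 data bricks each
echo '10 * 4 * 5.46' | bc    # ~218 TiB, which is roughly the 219T that df reports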

Yesterday a colleague looked at the filesystem and found that most of the space was missing:
[***@gf-p-d-01 ~]# df -h /data/
Filesystem Size Used Avail Use% Mounted on
gf-p-d-01.isec.foobar.com:/data_vol1 22T 272G 22T 2% /data

Some googling turned up the following bug report against 3.4, which looks familiar:

https://bugzilla.redhat.com/show_bug.cgi?id=1541830

So we ran a quick grep shared-brick-count /var/lib/glusterd/vols/data_vol1/* on all boxes and found that on 5 out of 6 boxes shared-brick-count was 0 for all bricks on remote boxes and 1 for the local bricks.
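
The matches come from the brick volfiles that glusterd generates for the volume (the option seems to sit in the storage/posix section), so on one of the boxes where the counts look right the grep output looks roughly like this (file names abbreviated, purely illustrative):

data_vol1.gf-p-d-01.isec.foobar.com.bricks-brick1-brick.vol:    option shared-brick-count 1
data_vol1.gf-p-k-01.isec.foobar.com.bricks-brick1-brick.vol:    option shared-brick-count 0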

Is this the expected result, or should it be 1 everywhere (as the quick-fix script from that case sets it)?

Also, on one box (the one we created the volume from, btw) we have shared-brick-count=0 for all remote bricks and 10 for the local bricks.
Is it possible that the bug from 3.4 still exists in 4.1.5, and should we try the filter script that sets shared-brick-count=1 for all bricks?
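
That count of 10 would at least explain the numbers, since the brick size is apparently divided by shared-brick-count when the volume size is calculated: 219T / 10 is pretty much the 22T we see now. If we understand the bug report correctly, the quick-fix filter script is essentially just the following, dropped (executable) into glusterd's filter directory, something like /usr/lib*/glusterfs/4.1.5/filter/ - exact path untested by us:

#!/bin/bash
# glusterd passes the path of each volfile it writes as $1;
# force shared-brick-count back to 1 in all of them
sed -i 's/option shared-brick-count [0-9]*/option shared-brick-count 1/g' "$1"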

The volume is not yet in production, so now would be the time to play around and find the problem.

TIA and regards,

Joachim


Nithya Balachandran
2018-10-16 14:13:30 UTC
Hi,
Post by j***@mail.de
So we did a quick grep shared-brick-count /var/lib/glusterd/vols/data_vol1/*
on all boxes and found that on 5 out of 6 boxes this was
shared-brick-count=0 for all bricks on remote boxes and 1 for local bricks.
Is this the expected result or should we have all 1 everywhere (as the
quick fix script from the case sets it)?
No, this is fine. The shared-brick-count only needs to be 1 for the local
bricks. The value for the remote bricks can be 0.
Post by j***@mail.de
Also on one box (the one where we created the volume from, btw) we have
shared-brick-count=0 for all remote bricks and 10 for the local bricks.
This is a problem. The shared-brick-count should be 1 for the local bricks
here as well.
Post by j***@mail.de
Is it possible that the bug from 3.4 still exists in 4.1.5 and should we
try the filter script which sets shared-brick-count=1 for all bricks?
Can you try
1. restarting glusterd on all the nodes one after another (not at the same time)
2. setting a volume option (say gluster volume set <volname> cluster.min-free-disk 11%)

and see if it fixes the issue?
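
Something along these lines (the option and value are only an example - the point is to get glusterd to regenerate the volfiles):

# on every node, one after another, waiting for 'gluster peer status' to show
# the other peers as connected again before moving to the next one:
systemctl restart glusterd

# then, from any one node, set a harmless option so the volfiles are rewritten:
gluster volume set data_vol1 cluster.min-free-disk 11%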

Regards,
Nithya
j***@mail.de
2018-10-16 14:34:01 UTC
Post by Nithya Balachandran
Can you try
1. restarting glusterd on all the nodes one after another (not at the same time)
2. setting a volume option (say gluster volume set <volname> cluster.min-free-disk 11%)
and see if it fixes the issue?
Hi,

ok, this was a quick fix - volume size is correct again and the shared-brick-count is correct everywhere.
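
For the record, re-checking with the same commands as before now gives the expected values:

df -h /data/                                                  # back to ~219T
grep shared-brick-count /var/lib/glusterd/vols/data_vol1/*    # 1 for local bricks, 0 for remote ones, on every box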

We'll duly note this in our wiki.

Thanks a lot!

Joachim
Nithya Balachandran
2018-10-16 14:46:50 UTC
Post by j***@mail.de
ok, this was a quick fix - volume size is correct again and the
shared-brick-count is correct everywhere.
If there were any directories created on the volume when the sizes were
wrong, the layouts set on them are probably incorrect. You might want to
do a fix-layout on the volume.
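
For your volume that would be something like:

gluster volume rebalance data_vol1 fix-layout start
# progress can be checked with:
gluster volume rebalance data_vol1 status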

Regards,
Nithya
j***@mail.de
2018-10-16 14:59:48 UTC
Post by Nithya Balachandran
If there were any directories created on the volume when the sizes were wrong, the layouts set on them are probably incorrect. You might want to do a fix-layout on the volume.
Hi,

there hasn't been any write activity on the volume lately (we had just done some crash testing while writing to it - e.g. rebooting two nodes: no problem; rebooting three nodes: the volume went offline but healed itself).

I did a gluster volume rebalance data_vol1 fix-layout start for good measure, which finished instantly, and noted it in the wiki.

Thanks again,

Joachim