Discussion: [Gluster-users] design of gluster cluster
Thing
2018-06-12 03:04:14 UTC
Hi,

I would like to understand how gluster works better than I do now, and in
particular the architecture.

So I have a test configuration of 6 desktops, each with 2 x 1TB disks in a
RAID 0 on an eSATA channel.

What I think I would like to do is a

*Distributed-Replicated volume*

a) have 1 and 2 as raid1
b) have 4 and 5 as raid1
c) have 3 and 6 as a raid1
d) join this as concatenation 2+2+2tb

This should in theory give me 6TB as one volume?

I tried to do this and failed, as it kept asking for an arbiter, which the
docs simply don't explain how to set up.

So say we have,

a) have 1 and 2 as raid1 with 3 as the arbiter?
b) have 4 and 5 as raid 1 with 6 as the arbiter
c) 3 and 6 as a raid 1 with 5 as the arbiter
d) join this as concatenation 2+2+2tb


If so, what is the command used to build this?

In particular, I want to bias this to perform as well as possible as a VM store.
Dave Sherohman
2018-06-12 11:10:57 UTC
Post by Thing
What I think I would like to do is a
*Distributed-Replicated volume*
a) have 1 and 2 as raid1
b) have 4 and 5 as raid1
c) have 3 and 6 as a raid1
d) join this as concatenation 2+2+2tb
You probably don't actually want to do that because quorum is handled
separately for each subvolume (bricks 1/2, 4/5, or 3/6), not a single
quorum for the volume as a whole. (Consider if bricks 1 and 2 both went
down. You'd still have 4 of 6 bricks running, so whole-volume quorum
would still be met, but the volume can't continue to run normally since
the first subvolume is completely missing.)

In the specific case of replica 2, gluster treats the first brick in
each subvolume as "slightly more than one", so you'd be able to continue
normally if brick 2, 5, or 6 went down, but, if brick 1, 4, or 3 went
down, all files on that subvolume would become read-only.
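
For reference, that client-side quorum behaviour is controlled by the
cluster.quorum-type volume option; a quick sketch, with "my-volume" just a
placeholder for your volume name:

# gluster volume get my-volume cluster.quorum-type
# gluster volume set my-volume cluster.quorum-type auto

"auto" is what gives the behaviour described above, and I wouldn't change
it without a good reason.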
Post by Thing
I tried to do this and failed, as it kept asking for an arbiter, which the
docs simply don't explain how to set up.
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
Post by Thing
So say we have,
a) have 1 and 2 as raid1 with 3 as the arbiter?
b) have 4 and 5 as raid 1 with 6 as the arbiter
c) 3 and 6 as a raid 1 with 5 as the arbiter
d) join this as concatenation 2+2+2tb
I would recommend finding one or more other servers with small amounts
of unused space and allocating the arbiter bricks there, or carving a
gig or two out of your current bricks for that purpose. Arbiters only
need about 4k of disk space per file in the subvolume, regardless of the
actual file size (the arbiter only stores metadata), so TB-sized
arbiters would be a huge waste of space, especially if you're only
putting a few very large files (such as VM disk images) on the volume.
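
If you do carve a bit of space out of an existing filesystem, the arbiter
brick can simply be a directory on it; as a rough sketch (the path is
purely a placeholder):

# mkdir -p /srv/gluster/arbiter/brick1

and then you'd give host3:/srv/gluster/arbiter/brick1 as the arbiter brick
when creating the volume.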

As a real-world data point, I'm using basically the setup you're aiming
for - six data bricks plus three arbiters, used to store VM disk images.
My data bricks are 11T each, while my arbiters are 98G. Disk usage for
the volume is currently at 19%, but all arbiters are under 1% usage (the
largest has 370M used). Assuming my usage patterns don't change, I
could completely fill my 11T subvolumes and only need about 1.5G in the
corresponding arbiters.
Post by Thing
If so, what is the command used to build this?
# gluster volume create my-volume replica 3 arbiter 1 host1:/path/to/brick host2:/path/to/brick arb-host1:/path/to/brick host4:/path/to/brick host5:/path/to/brick arb-host2:/path/to/brick host3:/path/to/brick host6:/path/to/brick arb-host3:/path/to/brick
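
Assuming that succeeds, the usual next steps are to start the volume,
sanity-check it, and mount it from a client; the volume name and mount
point below are just placeholders continuing the example above:

# gluster volume start my-volume
# gluster volume info my-volume
# mount -t glusterfs host1:/my-volume /mnt/my-volume
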
--
Dave Sherohman
Thing
2018-06-12 23:16:32 UTC
Hi,

I don't have any more hosts available.

I am a bit lost here. Why "replica 3 arbiter 1", i.e. not "replica 2
arbiter 1"? Also, there is no distributed part? Is the distributed flag
automatically assumed? With replica 3 there is a quorum (2 of 3), so no
arbiter is needed? I have this running already like this, so I am
assuming it's robust?

I am still struggling to understand the syntax; I wish the docs / examples
were better.

So on each gluster node I have an unused 120GB data1 partition which is
left over from the OS install, so the arbiter bricks could go there?

in which case?

gluster volume create my-volume replica 2 arbiter 1 \
    host1:/path/to/brick host2:/path/to/brick (arb-)host3:/path/to/brick2 \
    host4:/path/to/brick host5:/path/to/brick (arb-)host6:/path/to/brick2 \
    host3:/path/to/brick host6:/path/to/brick (arb-)host1:/path/to/brick2

is this a sane command?

Otherwise I am beginning to think I might be better off doing 3 x 2TB
separate volumes. Rather interesting trying to understand this stuff...!
Dave Sherohman
2018-06-13 09:10:34 UTC
Post by Thing
I am a bit lost here. Why "replica 3 arbiter 1", i.e. not "replica 2
arbiter 1"?
You'd have to ask the developers about that (I just use gluster, I'm not
a dev). I agree that "replica 2 arbiter 1" seems more intuitive, but I
suppose "replica 3 arbiter 1" could be seen as more technically
accurate, if you look at it as three replicas, but one of the replicas
only stores metadata instead of a full copy.
Post by Thing
Also, there is no distributed part? Is the distributed flag
automatically assumed?
When you create a replicated volume and specify more bricks than the
replica count, "distributed" is implied because that's the only way that
it would make sense for a replica-2 volume to have 6 bricks. (See the
"To create a distributed replicated volume" section in the docs at
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/
if you'd like to confirm that omitting "distributed" is correct.)
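
If you want to double-check after creating the volume, you can look at

# gluster volume info my-volume

which should report the type as Distributed-Replicate; from memory it also
shows the brick count along the lines of "3 x (2 + 1) = 9" for three
replica-2-plus-arbiter subvolumes, though the exact wording may vary by
version.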

Come to think of it, that might also be why arbiter configurations are
considered to be "replica 3" instead of "replica 2" - you're specifying
bricks in groups of three, so it makes the syntax checking a little
easier if you can just say "number of bricks must be a multiple of
replica count" without adding a special case to increment the replica
count if there's an arbiter also specified.
Post by Thing
With replica 3 there is a quorum (2 of 3), so no arbiter is
needed?
Correct. Arbiters are only needed (or allowed) in the specific case of
replica 2+arbiter. Per the docs at
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
"Note: Volumes using the arbiter feature can only be replica 3 arbiter 1"
Post by Thing
I have this running already like this, so I am assuming it's robust?
Yes, it definitely should be! The only disadvantage of replica 3 vs.
replica 2+A is the lower storage capacity.
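
To put numbers on it for your hardware: with six 2TB bricks, plain
replica 3 gives you two subvolumes and roughly 4TB usable, while
replica 2 plus arbiters gives you three subvolumes and roughly 6TB usable,
minus the tiny arbiter bricks.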
Post by Thing
So on each gluster node I have an unused 120GB data1 partition which is
left over from the OS install, so the arbiter bricks could go there?
in which case?
gluster volume create my-volume replica 2 arbiter 1 \
    host1:/path/to/brick host2:/path/to/brick (arb-)host3:/path/to/brick2 \
    host4:/path/to/brick host5:/path/to/brick (arb-)host6:/path/to/brick2 \
    host3:/path/to/brick host6:/path/to/brick (arb-)host1:/path/to/brick2
is this a sane command?
Yep, looks reasonable to me aside from the "replica 2" needing to be
"replica 3".
--
Dave Sherohman