Discussion:
Feedback and Questions on afr+unify
Prabhu Ramachandran
2008-12-18 04:05:30 UTC
Hi,

I just installed and configured a couple of machines with glusterfs
(1.4.0-rc3). It seems to work great. Thanks for the amazing software!
I've been looking for something like this for years.

I have some feedback and questions. My configuration is a bit
complicated. I have two machines, each with two disks, and on each disk
there are two partitions that I wanted to use (i.e. 8 partitions across
4 disks on 2 machines). I wanted to expose a single unified filesystem
spanning the 4 partitions on each machine and have these replicated via
afr across the two machines for high availability. The machines are two
desktops connected on a 10Mbps network.

I exposed the individual partitions, unified them on each server, and
then the client volfile simply afr'd the two unified volumes. The
volfiles are attached for reference. Here is some feedback:

- This one is very minor. It wasn't clear from the docs that to use
unify one needs (a) locking and (b) a namespace. The place this is
mentioned is "Understanding unify translator", which isn't the first
place a user would look. It would be nice if this were mentioned
somewhere more prominent.

- There are a lot of options to choose from, and without Anand's
initial help in person I would be lost trying to choose a scheduler.
It would be great if there were some recommended setups. I understand
the software is growing rapidly, but this would make life easier for
new adopters.

- One of my servers is behind a router with NAT enabled, and this
caused problems. I kept getting errors of this kind when the NATed
client (10.24.1.4) tried to connect to the other machine. The logfile
gave me the following:

2008-12-18 00:25:40 E [addr.c:117:gf_auth] auth/addr: client is bound to
port 59327 which is not privilaged
2008-12-18 00:25:40 E [authenticate.c:193:gf_authenticate] auth: no
authentication module is interested in accepting remote-client
10.24.1.4:59327
2008-12-18 00:25:40 E [server-protocol.c:6842:mop_setvolume] server:
Cannot authenticate client from 10.24.1.4:59327

I worked around this problem by exposing the machine as a DMZ host on
the router, but this is not ideal. Is there something I can do to fix
this?

- The archives of the list are split up by day
(http://gluster.org/pipermail/gluster-users/). This is quite
inconvenient. It would be much more convenient if they were grouped by
month.

In addition I have a few questions:

- What would happen if I changed the scheduler to something else?
Would that hose the data? I haven't moved all my data yet, so I can
still experiment. I am not likely to tinker with the setup later on,
though, since it will contain important data.

- What scheduler should I consider using? Anand suggested keeping it
simple and starting with rr. Will that be fine for my needs, or would
it be better to use hash or alu?

- What would happen if I added another brick, say another disk, to the
existing set on one of the machines? Would it break the round-robin
scheduler that I am using? I see from the FAQ that this should work
with alu, but will it work with rr?

Many thanks once again for the awesome clustered file system!

cheers,
--
Prabhu Ramachandran http://www.aero.iitb.ac.in/~prabhu
Krishna Srinivas
2008-12-18 07:57:15 UTC
On Thu, Dec 18, 2008 at 9:35 AM, Prabhu Ramachandran
Post by Prabhu Ramachandran
Hi,
I just installed and configured a couple of machines with glusterfs
(1.4.0-rc3). It seems to work great. Thanks for the amazing software!
I've been looking for something like this for years.
I have some feedback and questions. My configuration is a bit complicated.
I have two machines each with two disks and each of which with two
partitions that I wanted to use (i.e. 8 partitions across 4 disks on 2
machines). I wanted to expose one single unified filesystem spanning 4
partitions on each machine and have these replicated via afr on each machine
for high availability. The machines are two desktops connected on a 10Mbps
network.
I exposed the individual partitions, unified them and then the client
volfile simply afr'd these two. Attached are the volfiles for reference.
- This one is very minor. It wasn't explicitly clear from the docs that to
use unify one needed (a) locking and (b) the namespace. The place this is
mentioned is in "Understanding unify translator" which isn't the first place
a user would look. Would be nice if this were mentioned somewhere.
Unify needs a namespace; what do you mean by "locking" here?
Post by Prabhu Ramachandran
- There are a lot of options to choose from and without Anand's initial help
in person I would be lost trying to choose a scheduler. It would be great
if there were some recommended solutions. I understand the software is
rapidly growing but this would make life easier for new adopters.
True. We will give "cookbook" recommended setups in our documentation.
Post by Prabhu Ramachandran
- One of my servers is behind a router with NAT enabled and this caused
problems. I kept getting errors of this kind when trying to connect from
the NATed client (10.24.1.4) to the other one. The logfile gave me the
2008-12-18 00:25:40 E [addr.c:117:gf_auth] auth/addr: client is bound to
port 59327 which is not privilaged
2008-12-18 00:25:40 E [authenticate.c:193:gf_authenticate] auth: no
authentication module is interested in accepting remote-client
10.24.1.4:59327
2008-12-18 00:25:40 E [server-protocol.c:6842:mop_setvolume] server: Cannot
authenticate client from 10.24.1.4:59327
I worked around this problem by exposing the machine as a DMZ host from the
router but this is not ideal. Is there something I can do to fix this?
http://www.gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Authenticate_modules

You can use "login"-based authentication to get around this problem.
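
Roughly, that would look like the sketch below (the username and
password are placeholders; double-check the exact auth.login option
names on the page above against your release):

volume server
type protocol/server
option transport-type tcp/server
subvolumes unify
# allow clients presenting this login to attach to "unify"
option auth.login.unify.allow prabhu
option auth.login.prabhu.password secret
end-volume

volume server1
type protocol/client
option transport-type tcp/client
option remote-host 10.24.1.4
option remote-subvolume unify
# credentials matching the server side
option username prabhu
option password secret
end-volume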
Post by Prabhu Ramachandran
- The archives of the list are categorized for each day
(http://gluster.org/pipermail/gluster-users/). This is quite inconvenient.
It would be much more convenient if this were done for each month.
- What would happen if I change the scheduler to something else? Would that
hose the data? I haven't moved all my data yet so I can experiment
currently. I am not likely to tinker with the setup later on though since
this will contain important data.
Changing schedulers will not cause any problem.
Post by Prabhu Ramachandran
- What scheduler should I consider using? Anand suggested keeping it
simple and start with rr. Should this be fine for my needs or would it be
better to use hash or alu?
- What would happen if I added another brick, say another disk to the
existing set on one of the machines? Would it break the round-robin
scheduler that I am using? I see from the FAQ that this should work with
the alu but will it work with rr?
ALU will be useful when you add servers later, i.e. it will look at
free disk space and schedule creation of new files there. RR will just
round-robin.
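
As an illustration, switching the unify volume in the attached server
volfile from rr to alu might look roughly like this (the alu.* option
names follow the 1.3-era scheduler documentation and should be verified
for your release; the threshold is only an example):

volume unify
type cluster/unify
option namespace brick-ns
subvolumes brick1 brick2 brick3 brick4
option scheduler alu
# prefer bricks with the most free space when creating new files
option alu.order disk-usage
option alu.limits.min-free-disk 5%
end-volume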

You can experiment with the new "DHT" translator in place of unify.
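
For example, the unify volume in the attached server volfile could be
swapped for something roughly like this (a sketch only; cluster/dht is
new in 1.4, so check its documentation; note it needs neither a
namespace volume nor a scheduler option):

volume dht
type cluster/dht
subvolumes brick1 brick2 brick3 brick4
end-volume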
You can go with a more standard setup of
unify->afr->client instead of afr->unify->client

When you have afr over unify and one of unify's subvolumes goes down,
afr's self-heal will create the missing files, which you don't want to
happen.
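
A rough sketch of that alternative on the client side, assuming each
server also exports its individual bricks and namespace (not just the
unified volume); all volume names here are illustrative:

# Import the matching brick from each server...
volume s1-brick1
type protocol/client
option transport-type tcp/client
option remote-host 10.24.1.4
option remote-subvolume brick1
end-volume

volume s2-brick1
type protocol/client
option transport-type tcp/client
option remote-host 10.101.5.32
option remote-subvolume brick1
end-volume

# ...and mirror the pair with afr.
volume afr1
type cluster/afr
subvolumes s1-brick1 s2-brick1
end-volume

# afr2, afr3 and afr4 are built the same way from the remaining brick
# pairs, as is an afr'd namespace pair (afr-ns) from the two brick-ns
# volumes.

volume unify
type cluster/unify
option namespace afr-ns
option scheduler rr
subvolumes afr1 afr2 afr3 afr4
end-volume

With this layout a server going down only takes one half of each
replica pair offline, so unify never sees files disappear from a
subvolume.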

Krishna
Post by Prabhu Ramachandran
Many thanks once again for the awesome clustered file system!
cheers,
--
Prabhu Ramachandran http://www.aero.iitb.ac.in/~prabhu
# Unify four partitions.
# Declare the storage directories.
volume posix1
type storage/posix
option directory /export/sda6/export
end-volume
volume posix2
type storage/posix
option directory /export/sda7/export
end-volume
volume posix3
type storage/posix
option directory /export/sdb5/export
end-volume
volume posix4
type storage/posix
option directory /export/sdb6/export
end-volume
# The namespace storage.
volume posix-ns
type storage/posix
option directory /export/sdb6/export-ns
end-volume
# Wrap each storage volume with posix-locks to create the bricks.
volume brick1
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix1
end-volume
volume brick2
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix2
end-volume
volume brick3
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix3
end-volume
volume brick4
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix4
end-volume
volume brick-ns
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix-ns
end-volume
# Now unify the bricks.
volume unify
type cluster/unify
option namespace brick-ns
subvolumes brick1 brick2 brick3 brick4
option scheduler rr
end-volume
# Serve the unified brick on the network.
volume server
type protocol/server
option transport-type tcp/server
subvolumes unify
option auth.addr.unify.allow 10.101.5.32,127.0.0.1,10.24.1.4
end-volume
# Client volfile: the two unified servers, 1 and 2.
volume server1
type protocol/client
option transport-type tcp/client
option remote-host 10.24.1.4
option remote-subvolume unify
end-volume
volume server2
type protocol/client
option transport-type tcp/client
option remote-host 10.101.5.32
option remote-subvolume unify
end-volume
# AFR the two servers.
volume afr
type cluster/afr
subvolumes server1 server2
end-volume
Prabhu Ramachandran
2008-12-18 12:04:54 UTC
Hi,

Thanks for the response.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- This one is very minor. It wasn't explicitly clear from the docs that to
use unify one needed (a) locking and (b) the namespace. The place this is
mentioned is in "Understanding unify translator" which isn't the first place
a user would look. Would be nice if this were mentioned somewhere.
Unify needs namespace, what do you mean by "locking" here?
The fact that I need to turn the posix-locks feature on.

volume brick1
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix1
end-volume

Without it I was running into problems.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- There are a lot of options to choose from and without Anand's initial help
in person I would be lost trying to choose a scheduler. It would be great
if there were some recommended solutions. I understand the software is
rapidly growing but this would make life easier for new adopters.
True. We will give "cookbook" recommended setups in our documentation.
That would be great!
Post by Krishna Srinivas
Post by Prabhu Ramachandran
2008-12-18 00:25:40 E [addr.c:117:gf_auth] auth/addr: client is bound to
port 59327 which is not privilaged
2008-12-18 00:25:40 E [authenticate.c:193:gf_authenticate] auth: no
authentication module is interested in accepting remote-client
10.24.1.4:59327
2008-12-18 00:25:40 E [server-protocol.c:6842:mop_setvolume] server: Cannot
authenticate client from 10.24.1.4:59327
I worked around this problem by exposing the machine as a DMZ host from the
router but this is not ideal. Is there something I can do to fix this?
http://www.gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Authenticate_modules
You can use "login" based authentication to get around this problem.
Thanks, yes, that would work, but for some reason I feel a
username/password is weaker than restricting access to an IP.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- What would happen if I change the scheduler to something else? Would that
hose the data? I haven't moved all my data yet so I can experiment
currently. I am not likely to tinker with the setup later on though since
this will contain important data.
Changing schedulers will not cause any problem.
Interesting, so any existing data on the disks will not be affected?
How does that work? Does this mean I can fill the disks a priori,
before unifying them?
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- What would happen if I added another brick, say another disk to the
existing set on one of the machines? Would it break the round-robin
scheduler that I am using? I see from the FAQ that this should work with
the alu but will it work with rr?
ALU will be useful when you add servers later. i.e it will see free
diskspace and schedule creation of new files there. RR will just
round-robin.
You can experiment with the new "DHT" translator in place of unify.
OK, I can do that. I see dht does not need the extra namespace disk,
so I guess I can clear out that directory.
Post by Krishna Srinivas
You can go with a more standard setup of
unify->afr->client instead of afr->unify->client
Sorry, I am not sure what you mean by the above; the arrow directions
aren't completely clear to me. My understanding of the current setup is
that I have unify->afr->client (I unify 4 partitions on one machine,
afr them across machines and then mount that as a client), which is
what you have mentioned above too. I am confused now that you mention
that I have afr->unify->client instead.
Post by Krishna Srinivas
When you have afr over unify and one of unify subvol goes down, afr's
selfheal will create missing files which you don't want to happen.
OK, so you are saying I need to simply switch to using dht, remove the
namespace directory and continue with the setup, and this problem will
not occur?

cheers,
prabhu
Krishna Srinivas
2008-12-18 14:01:56 UTC
On Thu, Dec 18, 2008 at 5:34 PM, Prabhu Ramachandran
Post by Prabhu Ramachandran
Hi,
Thanks for the response.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- This one is very minor. It wasn't explicitly clear from the docs that to
use unify one needed (a) locking and (b) the namespace. The place this is
mentioned is in "Understanding unify translator" which isn't the first place
a user would look. Would be nice if this were mentioned somewhere.
Unify needs namespace, what do you mean by "locking" here?
The fact that I need to turn the posix-locks feature on.
volume brick1
type features/posix-locks
option mandatory on # enables mandatory locking on all files
subvolumes posix1
end-volume
Without it I was running into problems.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- There are a lot of options to choose from and without Anand's initial help
in person I would be lost trying to choose a scheduler. It would be great
if there were some recommended solutions. I understand the software is
rapidly growing but this would make life easier for new adopters.
True. We will give "cookbook" recommended setups in our documentation.
That would be great!
Post by Krishna Srinivas
Post by Prabhu Ramachandran
2008-12-18 00:25:40 E [addr.c:117:gf_auth] auth/addr: client is bound to
port 59327 which is not privilaged
2008-12-18 00:25:40 E [authenticate.c:193:gf_authenticate] auth: no
authentication module is interested in accepting remote-client
10.24.1.4:59327
2008-12-18 00:25:40 E [server-protocol.c:6842:mop_setvolume] server: Cannot
authenticate client from 10.24.1.4:59327
I worked around this problem by exposing the machine as a DMZ host from the
router but this is not ideal. Is there something I can do to fix this?
http://www.gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Authenticate_modules
You can use "login" based authentication to get around this problem.
Thanks, yes, that would work but for some reason I feel a username/password
is weaker than restricting it to an IP.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- What would happen if I change the scheduler to something else? Would that
hose the data? I haven't moved all my data yet so I can experiment
currently. I am not likely to tinker with the setup later on though since
this will contain important data.
Changing schedulers will not cause any problem.
Interesting, so any existing data on the disks will not be affected? How
does that work? Does this mean I can fill the disks a-priori before
unifying them?
Unify checks with all the subvolumes to find where a file is before
opening it. Yes, you can have pre-filled data before unifying them.
Post by Prabhu Ramachandran
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- What would happen if I added another brick, say another disk to the
existing set on one of the machines? Would it break the round-robin
scheduler that I am using? I see from the FAQ that this should work with
the alu but will it work with rr?
ALU will be useful when you add servers later. i.e it will see free
diskspace and schedule creation of new files there. RR will just
round-robin.
You can experiment with the new "DHT" translator in place of unify.
OK, I can do that, I see dht does not need the extra namespace disk, so I
guess I can clear out that directory.
Note that DHT is a separate cluster xlator, not a scheduler.
Post by Prabhu Ramachandran
Post by Krishna Srinivas
You can go with a more standard setup of
unify->afr->client instead of afr->unify->client
Sorry, I am not sure what you mean by the above, the arrow directions aren't
completely clear to me. My understanding of the setup currently is that I
have unify->afr->client (I unify 4 partitions on one machine, afr them
across machines and then mount that as a client) which is what you have
mentioned too above. I am confused now that you mention that I have
afr->unify->client instead.
I was using a different point of view :) (it did not match your setup
exactly though)

You are using AFR over unify. The problem is that when one of the
servers under unify goes down, AFR will see missing files and recreate
them, so when the downed server comes back up, unify will see those
files on two subvolumes.
Post by Prabhu Ramachandran
Post by Krishna Srinivas
When you have afr over unify and one of unify subvol goes down, afr's
selfheal will create missing files which you don't want to happen.
OK, so you are saying I need to simply switch to using dht, remove the
namespace directory and continue with the setup and this problem will not
occur?
No, DHT is not related to this.
Post by Prabhu Ramachandran
cheers,
prabhu
Prabhu Ramachandran
2008-12-18 17:24:51 UTC
Hi,
Post by Krishna Srinivas
Post by Prabhu Ramachandran
Interesting, so any existing data on the disks will not be affected? How
does that work? Does this mean I can fill the disks a-priori before
unifying them?
Unify checks with all the subvols on where it is before opening.
Yes you can have pre-filled data before unifying them.
OK, thanks for the clarification.
Post by Krishna Srinivas
Post by Prabhu Ramachandran
Post by Krishna Srinivas
Post by Prabhu Ramachandran
- What would happen if I added another brick, say another disk to the
existing set on one of the machines? Would it break the round-robin
scheduler that I am using? I see from the FAQ that this should work with
the alu but will it work with rr?
ALU will be useful when you add servers later. i.e it will see free
diskspace and schedule creation of new files there. RR will just
round-robin.
You can experiment with the new "DHT" translator in place of unify.
OK, I can do that, I see dht does not need the extra namespace disk, so I
guess I can clear out that directory.
Note that DHT is a separate cluster xlator and not scheduler.
Sorry, I understand; I just messed up the terminology.
Post by Krishna Srinivas
You are using AFR over unify, The problem is when one of the servers
on unify goes down AFR will see missing files and recreate them, so
when the downed server comes back up unify will see those files on two
subvols.
OK. So this will happen whether I use dht or unify, right? So what is
the way out of this problem?
Post by Krishna Srinivas
Post by Prabhu Ramachandran
Post by Krishna Srinivas
When you have afr over unify and one of unify subvol goes down, afr's
selfheal will create missing files which you don't want to happen.
OK, so you are saying I need to simply switch to using dht, remove the
namespace directory and continue with the setup and this problem will not
occur?
No, DHT is not related to this.
OK, sorry to be slow, but now I am confused. What should I do to my
setup to avoid these problems?

Thanks.

cheers,
prabhu

Daniel Maher
2008-12-18 10:08:16 UTC
Post by Prabhu Ramachandran
- The archives of the list are categorized for each day
(http://gluster.org/pipermail/gluster-users/). This is quite
inconvenient. It would be much more convenient if this were done for
each month.
This would be great. Also potentially awesome: a search feature. :)
--
Daniel Maher <dma+gluster AT witbe DOT net>
Prabhu Ramachandran
2008-12-18 12:07:09 UTC
Post by Daniel Maher
Post by Prabhu Ramachandran
- The archives of the list are categorized for each day
(http://gluster.org/pipermail/gluster-users/). This is quite
inconvenient. It would be much more convenient if this were done for
each month.
This would be great. Also potentially awesome : a search feature. :)
The easiest solution to both would be to subscribe the list to Gmane:

http://gmane.org

http://gmane.org/subscribe.php

cheers,
prabhu