Discussion:
Clarification on common tasks
Gandalf Corvotempesta
2016-08-11 09:13:34 UTC
Permalink
I would like to ask for some clarification on common tasks needed by
Gluster administrators.

A) Let's assume a disk/brick has failed (or is going to fail) and I
would like to replace it.
What is the proper way to do so with no data loss or downtime?

Looking at the mailing list, the procedure seems to be the following:

1) kill the brick process (how can I tell which is the brick process
to kill?). I have the following on a test cluster (with just one
brick):
# ps ax -o command | grep gluster
/usr/sbin/glusterfsd -s 1.2.3.112 --volfile-id
gv0.1.2.3.112.export-sdb1-brick -p
/var/lib/glusterd/vols/gv0/run/1.2.3.112-export-sdb1-brick.pid -S
/var/run/gluster/27555a68c738d9841879991c725e92e0.socket --brick-name
/export/sdb1/brick -l /var/log/glusterfs/bricks/export-sdb1-brick.log
--xlator-option
*-posix.glusterd-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8
--brick-port 49152 --xlator-option gv0-server.listen-port=49152
/usr/sbin/glusterd -p /var/run/glusterd.pid
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/5f3713389b19487b6c7d6efca6102987.socket
--xlator-option
*replicate*.node-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8

Which one is the "brick process"?

2) unmount the brick, for example:
umount /dev/sdc

3) remove the failed disk

4) insert the new disk
5) create an XFS filesystem on the new disk
6) mount the new disk where the previous one was
7) add the new brick to Gluster. How?
8) run "gluster v start force".

Why should I need step 8? If the volume is already started and
working (remember that I would like to change the disk with no downtime,
thus I can't stop the volume), why should I "start" it again?




B) Let's assume I would like to add a bunch of new bricks to existing
servers. What is the proper procedure to do so?


Ceph has a good documentation page where some common tasks are explained:
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
I've not found anything similar for Gluster.
Lindsay Mathieson
2016-08-11 11:08:13 UTC
Permalink
Post by Gandalf Corvotempesta
1) kill the brick process (how can I tell which is the brick process
to kill?)
glusterfsd is the brick process.

Also "gluster volume status" lists the pid's of all the bricks processes.
Post by Gandalf Corvotempesta
umount /dev/sdc
3) remove the failed disk
4) insert the new disk
5) create an XFS filesystem on the new disk
6) mount the new disk where the previous one was
Yes to all that.
Post by Gandalf Corvotempesta
7) add the new brick to the gluster. How ?
No need. The new brick is mounted where the old one was.
Post by Gandalf Corvotempesta
8) run "gluster v start force".
Yes.
Post by Gandalf Corvotempesta
Why should I need the step 8? If the volume is already started and
working (remember that I would like to change disk with no downtime,
thus i can't stop the volume), why should I "start" it again ?
This forces a restart of the glusterfsd process you killed earlier.

Next you do a:

"gluster volume heal <VOLUME NAME> full"

That causes the files on the other bricks to be healed to the new brick.
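
Putting that together for your gv0 volume, after remounting the new disk it
would be something like:

# gluster volume start gv0 force
# gluster volume heal gv0 full
# gluster volume heal gv0 info

(the last one just to watch the heal progress)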
Post by Gandalf Corvotempesta
B) Let's assume I would like to add a bunch of new bricks to existing
servers. What is the proper procedure to do so?
Different process altogether.
Post by Gandalf Corvotempesta
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
I've not found anything similar for Gluster.
That would be good.
--
Lindsay Mathieson
Gandalf Corvotempesta
2016-08-11 13:43:54 UTC
Permalink
Post by Lindsay Mathieson
Also "gluster volume status" lists the pid's of all the bricks processes.
OK, let's break everything, just to try.

This is a working cluster. I have 3 servers with 1 brick each, in
replica 3; thus, all files are replicated on all hosts.

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 2a36dc0f-1d9b-469c-82de-9d8d98321b83
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 1.2.3.112:/export/sdb1/brick
Brick2: 1.2.3.113:/export/sdb1/brick
Brick3: 1.2.3.114:/export/sdb1/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: off
features.shard-block-size: 10MB
performance.write-behind-window-size: 1GB
performance.cache-size: 1GB


I did this on a client:

# echo 'hello world' > hello
# md5sum hello
6f5902ac237024bdd0c176cb93063dc4 hello

Obviously, on node 1.2.3.112 I have it:

# cat /export/sdb1/brick/hello
hello world
# md5sum /export/sdb1/brick/hello
6f5902ac237024bdd0c176cb93063dc4 /export/sdb1/brick/hello



Let's break everything, this is funny.
I take the brick pid from here:
# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick 49152 0 Y 14315


# kill -9 14315

# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick N/A N/A N N/A

This should be like a degraded cluster, right?

Now I add a new file from the client:
echo "hello world, i'm degraded" > degraded

Obviously, this file is not replicated on node 1.2.3.112

# gluster volume heal gv0 info
Brick 1.2.3.112:/export/sdb1/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 1.2.3.113:/export/sdb1/brick
/degraded
/
Status: Connected
Number of entries: 2

Brick 1.2.3.114:/export/sdb1/brick
/degraded
/
Status: Connected
Number of entries: 2



This means that the "/" dir and the "/degraded" file should be healed from
.113 and .114?

Let's format the disk on .112
# umount /dev/sdb1
# mkfs.xfs /dev/sdb1 -f
meta-data=/dev/sdb1 isize=256 agcount=4, agsize=122094597 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0


Now I mount it again in the old place:

# mount /dev/sdb1 /export/sdb1

it's empty:
# ls /export/sdb1/ -la
total 4
drwxr-xr-x 2 root root 6 Aug 11 15:37 .
drwxr-xr-x 3 root root 4096 Jul 5 17:03 ..

I create the "brick" directory used by gluster:

# mkdir /export/sdb1/brick

Now I run the volume start force:

# gluster volume start gv0 force
volume start: gv0: success

But the brick process is still down:

# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick N/A N/A N N/A



And now?

What I really don't like is the use of "force" in "gluster volume start".
Usually (in all software) force is used when "bad things" are needed.
In this case, the volume start is mandatory, so why do I have to use
force?
If the volume is already started, gluster should be smart enough to
start only the missing processes without force, or, better, another
command should be created, something like "gluster bricks start".
Using force implies running a dangerous operation, not a common
administrative task.
Joe Julian
2016-08-11 14:34:38 UTC
Permalink
start ... force is, indeed, a dangerous command. If your brick failed to mount, gluster will not find the volume-id extended attribute, will recognize that the path is not a brick, and will not start the brick daemon, preventing your root partition from filling up with replicated files.

When you replace a brick, the new filesystem will not have that attribute so "force" is required to override that safety check.
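
If you're curious, you can see that attribute on a healthy brick with
getfattr, e.g. (brick path taken from your earlier post):

# getfattr -n trusted.glusterfs.volume-id -e hex /export/sdb1/brick

A freshly formatted filesystem won't have it, which is exactly what that
safety check catches.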
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Gandalf Corvotempesta
2016-08-11 14:42:05 UTC
Permalink
Post by Joe Julian
When you replace a brick, the new filesystem will not have that attribute so "force" is required to override that safety check.
IMHO, as we are not *starting* the volume (it is already started),
another command should be used.
If you do a "gluster v status" even with a failed brick, the command
runs properly and outputs the volume status, because the volume is already
started.
I think that another command would create less confusion
for someone who is new to Gluster.

my 2 cents.
Anuradha Talur
2016-08-11 12:17:12 UTC
Permalink
----- Original Message -----
Sent: Thursday, August 11, 2016 2:43:34 PM
Subject: [Gluster-users] Clarification on common tasks
I would like to make some clarification on common tasks needed by
gluster administrators.
A) Let's assume a disk/brick is failed (or is going to fail) and I
would like to replace.
Which is the proper way to do so with no data loss or downtime ?
1) kill the brick process (how can I ensure which is the brick process
to kill)? I have the following on a test cluster (with just one
# ps ax -o command | grep gluster
/usr/sbin/glusterfsd -s 1.2.3.112 --volfile-id
gv0.1.2.3.112.export-sdb1-brick -p
/var/lib/glusterd/vols/gv0/run/1.2.3.112-export-sdb1-brick.pid -S
/var/run/gluster/27555a68c738d9841879991c725e92e0.socket --brick-name
/export/sdb1/brick -l /var/log/glusterfs/bricks/export-sdb1-brick.log
--xlator-option
*-posix.glusterd-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8
--brick-port 49152 --xlator-option gv0-server.listen-port=49152
/usr/sbin/glusterd -p /var/run/glusterd.pid
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/5f3713389b19487b6c7d6efca6102987.socket
--xlator-option
*replicate*.node-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8
which is the "brick process" ?
As clarified by Lindsay, you can find the correct brick to kill
by mapping the output of gluster v status to the brick that has failed.
unmount /dev/sdc
3) remove the failed disk
4) insert the new disk
5) create an XFS filesystem on the new disk
6) mount the new disk where the previous one was
7) add the new brick to the gluster. How ?
8) run "gluster v start force".
If this is a replicate volume, then these steps alone are not enough.

If you are okay with the mount point of the new brick being different
from the previous one:

After you mount the new-brick, you will have to run
gluster v replace-brick <volname> old_brick new_brick commit force.

By doing this you would be adding the new brick to the Gluster cluster
and also letting the replicate translator know that
the brick has been replaced and that it needs to be healed.

Once this is done, the self-heal daemon will start the healing process
automatically.

If this step is done, you wouldn't have to run step 8 (gluster v start force),
as the replace-brick command takes care of bringing the new brick up.
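
For example, if the new disk were mounted at a different path such as
/export/sdc1 (just an illustration, adjust to your layout), the commands on
node 1.2.3.112 would be something like:

# mkdir /export/sdc1/brick
# gluster v replace-brick gv0 1.2.3.112:/export/sdb1/brick 1.2.3.112:/export/sdc1/brick commit force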

In case you want to mount the new brick to the same path as the previous one,
then after step 6, I'd suggest you:
a) Create a dummy non-existent dir under '/' of the volume's mount point.
b) Create a dummy non-existent xattr on '/' of the volume's mount point.
The above steps are basically again letting the replicate translator know
that some healing has to be done on the brick that is down. The replace-brick
command would do this for you, but as it doesn't support the same path for old
and new brick, this is a work-around. (Support for replacing bricks with the
same path will be provided in upcoming releases. It is being worked on.)
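
As a rough sketch, assuming the volume is FUSE-mounted at /mnt/gv0 on a
client (your mount point may differ), those two steps would be something
like:

# mkdir /mnt/gv0/some-nonexistent-dir
# rmdir /mnt/gv0/some-nonexistent-dir
# setfattr -n trusted.non-existent-key -v abc /mnt/gv0
# setfattr -x trusted.non-existent-key /mnt/gv0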

Once this is done, run the replace-brick command mentioned above.
This should add the volume UUID to the brick, start the brick, and then
trigger a heal to the new brick.
Why should I need the step 8? If the volume is already started and
working (remember that I would like to change disk with no downtime,
thus i can't stop the volume), why should I "start" it again ?
B) Let's assume I would like to add a bunch of new bricks to existing
servers. What is the proper procedure to do so?
Do you mean increasing the capacity of the volume by adding new bricks?
You can use gluster v add-brick <volname> new-brick(s).

The options provided to add-brick are going to vary based on how you plan to
add these bricks (whether you want to increase replica-count or add a new
replica set etc).
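
For example, to keep the replica count at 3 and add one more replica set of
three bricks to gv0 (brick paths here are just placeholders, created and
mounted beforehand), it would be something like:

# gluster v add-brick gv0 1.2.3.112:/export/sdc1/brick 1.2.3.113:/export/sdc1/brick 1.2.3.114:/export/sdc1/brick

That would turn the volume into a 2 x 3 distributed-replicate volume.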
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
I've not found anything similar for Gluster.
I found this for GlusterFS:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick

Hope this helps.
--
Thanks,
Anuradha.
Gandalf Corvotempesta
2016-08-11 13:54:26 UTC
Permalink
Post by Anuradha Talur
If you are okay with the mount of new and previous brick to be
different-
After you mount the new-brick, you will have to run
gluster v replace-brick <volname> old_brick new_brick commit force.
By doing this you would be adding new brick to the gluster cluster
and also letting the replicate translator know that
the brick has been replaced and that it needs to be healed.
Once this is done, self-heal-daemon will start the healing process
automatically.
If this step is done, you wouldn't have to run step 8 - gluster v start force.
As replace-brick command takes care of bringing the new brick up.
This seems to be the easier way to replace a failed brick.
But somewhere (I don't remember exactly where, but in the official docs)
I've seen a brick naming convention like this:

/srv/export/sdb1/brick

where "sdb" is part of the mount point. With this naming convention, it is
almost impossible to use the replace-brick method, as the new brick would
get the old brick's path (if I *replace* "sdb", the new disk is still "sdb").

The current "brick naming convention" is different, like mine: the disk
name is not used anymore in the mount point, thus replace-brick would be OK.
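
For example, one way around that is to mount the replacement disk at a new
path that doesn't contain the device name (device and path here are just an
example), so the new brick path differs from the old one and replace-brick
can be used:

# mkfs.xfs -f /dev/sdb1
# mkdir -p /export/brick1
# mount /dev/sdb1 /export/brick1
# mkdir /export/brick1/brick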
Gandalf Corvotempesta
2016-08-11 14:05:01 UTC
Permalink
Post by Anuradha Talur
After you mount the new-brick, you will have to run
gluster v replace-brick <volname> old_brick new_brick commit force.
Tried this. It worked immediately: the brick was replaced and healing is running.

Any way to show the heal status without dumping the whole file list?
I'm using shards as a test, I have thousands of shard files, and the output
is very, very long and almost unusable.

Is something shorter available? A progress indicator or similar?
Lindsay Mathieson
2016-08-11 14:09:16 UTC
Permalink
Post by Gandalf Corvotempesta
Any way to show the heal status without dumping the whole file list?
I'm using shards as a test, I have thousands of shard files, and the output
is very, very long and almost unusable.
Is something shorter available? A progress indicator or similar?
gluster volume heal <volume name> statistics heal-count



Much quicker.
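
For your gv0 volume that would be:

# gluster volume heal gv0 statistics heal-count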
--
Lindsay Mathieson
Gandalf Corvotempesta
2016-08-11 14:13:55 UTC
Permalink
Post by Lindsay Mathieson
gluster volume heal <volume name> statistics heal-count
much quicker.
Perfect. Thanks.

By running "gluster volume heal gv0 statistics" I can see some failed entries:

Starting time of crawl: Thu Aug 11 15:57:44 2016

Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 209
No. of entries in split-brain: 0
No. of heal failed entries: 139


Should I troubleshoot this or is it normal?

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.
Gandalf Corvotempesta
2016-08-11 21:18:37 UTC
Permalink
2016-08-11 16:13 GMT+02:00 Gandalf Corvotempesta
Post by Gandalf Corvotempesta
# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.
Healing is now complete:

# gluster volume heal gv0 info
Brick 1.2.3.112:/export/brick1/brick
Status: Connected
Number of entries: 0

Brick 1.2.3.113:/export/sdb1/brick
Status: Connected
Number of entries: 0

Brick 1.2.3.114:/export/sdb1/brick
Status: Connected
Number of entries: 0

I remember that there were some errors during healing, so I've tried to run this:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

but all bricks are up and running:

# gluster volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 1.2.3.112:/export/brick1/brick 49153 0 Y 11222
Brick 1.2.3.113:/export/sdb1/brick 49152 0 Y 5786
Brick 1.2.3.114:/export/sdb1/brick 49152 0 Y 920
Self-heal Daemon on localhost N/A N/A Y 11227
Self-heal Daemon on 1.2.3.113 N/A N/A Y 31559
Self-heal Daemon on 1.2.3.114 N/A N/A Y 26173

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks



Why am I unable to see what happened during healing? I can't use either
the "healed" or "heal-failed" arguments.
"split-brain" works.

Any suggestions? I'm curious to see why some files triggered a heal failure.
Anuradha Talur
2016-08-12 07:11:14 UTC
Permalink
----- Original Message -----
Sent: Friday, August 12, 2016 2:48:37 AM
Subject: Re: [Gluster-users] Clarification on common tasks
2016-08-11 16:13 GMT+02:00 Gandalf Corvotempesta
Post by Gandalf Corvotempesta
# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.
# gluster volume heal gv0 info
Brick 1.2.3.112:/export/brick1/brick
Status: Connected
Number of entries: 0
Brick 1.2.3.113:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick 1.2.3.114:/export/sdb1/brick
Status: Connected
Number of entries: 0
# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.
Actually, "info healed" and "info heal-failed" are deprecated. Recently some changes
were made due to which a wrong error message is being given to users.
A bug has been raised for this and it will be worked on for the next releases.
# gluster volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 1.2.3.112:/export/brick1/brick 49153 0 Y 11222
Brick 1.2.3.113:/export/sdb1/brick 49152 0 Y 5786
Brick 1.2.3.114:/export/sdb1/brick 49152 0 Y 920
Self-heal Daemon on localhost N/A N/A Y 11227
Self-heal Daemon on 1.2.3.113 N/A N/A Y 31559
Self-heal Daemon on 1.2.3.114 N/A N/A Y 26173
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
Why am I unable to see what happened during healing? I can't use either
the "healed" or "heal-failed" arguments.
"split-brain" works.
Any suggestions? I'm curious to see why some files triggered a heal failure.
You can get glustershd logs to see why heals failed. If you need help with it,
please give us the log files.
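
The self-heal daemon log is at /var/log/glusterfs/glustershd.log on each node
(the same path that appears in the glustershd command line you pasted
earlier). Something like this (just a grep sketch, adjust as needed) should
surface the relevant entries:

# grep -iE "heal|error|failed" /var/log/glusterfs/glustershd.log | tail -n 50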
--
Thanks,
Anuradha.
Gandalf Corvotempesta
2016-08-12 10:44:24 UTC
Permalink
Post by Anuradha Talur
You can get glustershd logs to see why heals failed. If you need help with it,
please give us the log files.
I don't see anything in the glustershd logs regarding healing (I'm looking
at the logs on the node being healed).

Anuradha Talur
2016-08-11 14:13:15 UTC
Permalink
----- Original Message -----
In case you want to mount the new brick to the same path as the previous one,
a) Create a dummy-non-existent-dir under '/' of the volume's mount point.
b) create a dummy-non-existent-xattr on '/' of the volume's mount point.
The above steps are basically again letting the replicate translator know
that some healing has to be done on the brick that is down. replace-brick
command would do this for you but as it doesn't support same path for old
and new brick, this is a work-around. (Support for replacing bricks with
same path will be provided in upcoming releases. It is being worked on.)
Sorry, there was a mistake in this mail.
As I said, replace-brick can't be used when the old and new paths are the same.
And I mistakenly suggested replace-brick after all the steps again!

There was a document that I'm not able to locate right now.
The first step after mounting the brick was to set the volume ID using
setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brickpath>.
I think there were more steps; I will update once I find the doc.
Once all the required xattrs are set, gluster v start force was supposed to be done.

start force needs to be done here as the volume is already in the started state, but the
management daemon, glusterd, is not aware that the failed brick has been fixed
with a new disk. start force is a way of letting glusterd know that there is
a brick that is down but needs to be started. This will be done without
affecting the existing up bricks.
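
A rough sketch for your gv0 example, assuming (I'd have to double-check
against the doc) that the value is the volume UUID shown by "gluster v info",
in hex with the dashes removed:

# setfattr -n trusted.glusterfs.volume-id -v 0x2a36dc0f1d9b469c82de9d8d98321b83 /export/sdb1/brick
# gluster v start gv0 force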
--
Thanks,
Anuradha.
Gandalf Corvotempesta
2016-08-11 14:16:37 UTC
Permalink
Post by Anuradha Talur
There was a document that I'm not able to locate right now.
The first step after mounting the brick was to set volume ID using
setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brickpath>.
I think there were more steps, I will update once I find the doc.
Once all the required xattrs are set, gluster v start force was supposed to be done.
This is a test environment so I can break everything every time.
I did (only once) a "replace-brick" using a new mount point, without
setting any xattrs, and the brick process was brought up immediately
and healing is going on.

Is setfattr mandatory?
Anuradha Talur
2016-08-11 14:25:06 UTC
Permalink
----- Original Message -----
Sent: Thursday, August 11, 2016 7:46:37 PM
Subject: Re: [Gluster-users] Clarification on common tasks
Post by Anuradha Talur
There was a document that I'm not able to locate right now.
The first step after mounting the brick was to set volume ID using
setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brickpath>.
I think there were more steps, I will update once I find the doc.
Once all the required xattrs are set, gluster v start force was supposed to be done.
This is a test environment so I can break everything every time.
I did (only once) a "replace-brick" using a new mount point, without
setting any xattrs, and the brick process was brought up immediately
and healing is going on.
Is setfattr mandatory?
The replace-brick you did, mentioned in the previous mails, was correct and fine.
You said you have different names for the old and new brick, so it works.
setfattr is *not* required in this case.

In the case quoted above, I'm talking about when brick names have to be the same (if you have that
requirement). There were 2 setfattrs that were needed. Anyway, sorry about the confusion.
As the brick names are different, it doesn't affect your test case.
--
Thanks,
Anuradha.
Gandalf Corvotempesta
2016-08-11 14:28:40 UTC
Permalink
Post by Anuradha Talur
The replace-brick you did, mentioned in the previous mails was correct and fine.
You said you have different names for old and new brick, so it works.
setfattr is *not* required in this case.
In the above case that you have quoted, I'm talking when brick names have to be the same (if you have that
requirement). There were 2 setfattrs that were needed. Anyway, sorry about the confusion.
As brick names are different, it doesn't affect your testcase.
Anyway, the suggested docs page is wrong. It has many bad commands, for example:

"gluster volume heal info failed"

This doesn't exist. The right command should be
"gluster volume heal <volume> info heal-failed".

Also, the replace-brick procedure for a replicated volume makes use of
"setfattr" and also asks for a directory creation/removal (step 4):

mkdir /mnt/r2/<name-of-nonexistent-dir>
rmdir /mnt/r2/<name-of-nonexistent-dir>
setfattr -n trusted.non-existent-key -v abc /mnt/r2
setfattr -x trusted.non-existent-key /mnt/r2

I didn't do this and it seems to work fine.

I think that this doc page should be rewritten; there is much confusion
and there are unneeded steps.
Anuradha Talur
2016-08-11 14:35:38 UTC
Permalink
You are correct. Needs to be changed. Will edit it in the next few days.

--
Thanks,
Anuradha.