Discussion:
Change underlying brick on node
David Gossage
2016-08-08 20:39:47 UTC
Permalink
So now that I have my cluster on 3.7.14 and sharded and working I am of
course looking for what to break next.

Currently each of the 3 nodes is on a 6-disk (WD Red 1TB) raidz6 (ZIL on
mirrored SSD), which I am thinking is more protection than I may need with
a 3-way replica. I was going to change them one by one to basically RAID10,
letting it heal in between.

Is the best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?

If /etc/glusterfs and /var/lib/glusterd are unchanged, will doing a heal
full after a reboot or restarting glusterd take care of everything if I
recreate the expected brick path first?

Are the improvements in 3.8 for sharding significant enough that I should
first look at updating to 3.8.2 when it is released in a few days?


*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
Lindsay Mathieson
2016-08-08 21:06:28 UTC
Permalink
Post by David Gossage
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need
with a 3 way replica. I was going to one by one change them to
basically raid10 letting it heal in between.
Wouldn't RAID10 be more protection than Raidz6? Not that there is
anything wrong with that; all my bricks are on top of a RAIDZ10 pool, as
much for the improved IOPS as the redundancy, though it does ease the
maintenance of bricks quite a bit. I've had two drive failures where I
just hot-swapped the drive, 0 downtime.

As a matter of curiosity, what SSDs are you using for the ZIL and what
size are they?

Do you have compression enabled? lz4?
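(For example, something along the lines of:

zfs get compression tank
zfs set compression=lz4 tank

where "tank" is just a placeholder for whatever your pool is called.)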
Post by David Gossage
Is best way to do that a systemctl stop glusterd, should I just kill
the brick process to simulate a brick dying, or is there an actual
brick maintenance command?
There is a gluster replace-brick command:

gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> commit force

One annoyance is that the new brick mount can't be the same as the old one.
If you can, I'd set up a test volume and try it out first.
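For example, something like this (the volume name gv0 and the brick paths
are just placeholders; as noted, the new path has to differ from the old one):

gluster volume replace-brick gv0 node1:/bricks/gv0-old node1:/bricks/gv0-new commit force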
--
Lindsay Mathieson
David Gossage
2016-08-08 21:34:59 UTC
Permalink
On Mon, Aug 8, 2016 at 4:06 PM, Lindsay Mathieson <
Post by David Gossage
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need with
a 3 way replica. I was going to one by one change them to basically raid10
letting it heal in between.
Wouldn't RAID10 be more protection than Raidz6? not that there is anything
wrong with that, all my bricks are on top of a RAIDZ10 pool, as much for
the improved IOPS as the redundancy, though it does ease the maintenance of
bricks quite a bit. Have had two drive failures where I just hotswapped the
drive, 0 downtime.
With RAID10 you can lose one drive per mirror set, as long as no two failures
land in the same mirror set. With Raidz6/raid6 you can lose any 2 drives and
still stay up regardless of position, so there's less crossing of fingers if
multiple drives fail back to back. However, performance is better for
RAID10. So I am basically accepting a slightly increased chance of one
brick/node dropping, if 2 drives that happened to be in the same mirror set
died, in order to squeeze a little more performance out of the setup.
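(A rough back-of-the-envelope, assuming 6 drives split into 3 mirror pairs:
once one drive has failed, only 1 of the remaining 5 drives shares its
mirror, so a second random failure drops the brick with probability 1/5,
about 20%, whereas raid6/raidz2 survives any second failure.)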
As a matter of curiosity what SSD's are you using for the ZIL and what
size are they?
Samsung 850 Pros. Small LVM partitions mirrored for the ZIL, with the other 2
larger partitions as L2ARC. I'm seeing the same as you though, with a poor
hit ratio, and may just drop their use.
Do you have compression enabled? lz4?
No, I wasn't that concerned with space usage. WD Reds are fairly cheap,
and I have 12-14 drive bays free in the 4U servers in use if I want to
expand storage.
Post by David Gossage
Is best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?
volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force}
One annoyance is the new brick mount can't be the same as the old one. If
you can I'd setup a test volume and try it out first.
That's what I used when replacing the server with a bad NIC a short while
ago, but I wasn't certain whether it would just heal the whole brick, since
the gluster config and directories would still consider it part of the
volume, just with no data in the folder.

My single-server dev box could likely test it. I'd guess I'd kill the brick
process, delete the whole brick directory to remove all files and
directories, recreate the brick path, then restart gluster or the server and
see what happens: whether a heal kicks off, or whether I need to give it a
new directory path and do a replace-brick on it.
--
Lindsay Mathieson
Joe Julian
2016-08-08 21:23:20 UTC
Permalink
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I am
of course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need
with a 3 way replica. I was going to one by one change them to
basically raid10 letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just kill
the brick process to simulate a brick dying, or is there an actual
brick maintenance command?
Just kill (-15) the brick process. That'll close the TCP connections and
the clients will just go right on functioning off the remaining replica.
When you format and recreate your filesystem, it'll be missing the
volume-id extended attributes, so to start it you'll need to force it:

gluster volume start $volname force
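For example, a minimal sketch (the volume name gv0, the brick path
/bricks/gv0 and the PID are placeholders):

gluster volume status gv0            # note the PID of this node's brick
kill -15 <brick-pid>                 # stops just that glusterfsd; glusterd keeps running
# ... rebuild the array, recreate the filesystem, remount it at /bricks/gv0 ...
mkdir -p /bricks/gv0
gluster volume start gv0 force       # brings the now-empty brick back online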
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged will
doing a heal full after reboot or restarting glusterd take care of
everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
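That is, something like (gv0 again being a placeholder volume name):

gluster volume heal gv0 full
gluster volume heal gv0 info         # watch what is still pending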
Post by David Gossage
Are the improvements in 3.8 for sharding significant enough I should
first look at updating to 3.8.2 when released in few days?
Yes.
David Gossage
2016-08-08 21:37:52 UTC
Permalink
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I am of
course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need with
a 3 way replica. I was going to one by one change them to basically raid10
letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?
Just kill (-15) the brick process. That'll close the TCP connections and
the clients will just go right on functioning off the remaining replica.
When you format and recreate your filesystem, it'll be missing the
gluster volume start $volname force
If I left the volume started when the brick process is killed, and clients
are still (in theory) connected to the volume, wouldn't that just give me an
error that the volume is already started?


Likely I would shut down the volume and take downtime for this anyway though,
letting heals go on with the VMs off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged will
doing a heal full after reboot or restarting glusterd take care of
everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
Are the improvements in 3.8 for sharding significant enough I should first
look at updating to 3.8.2 when released in few days?
Yes.
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
David Gossage
2016-08-08 21:56:04 UTC
Permalink
Post by David Gossage
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I am of
course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need with
a 3 way replica. I was going to one by one change them to basically raid10
letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?
Just kill (-15) the brick process. That'll close the TCP connections and
the clients will just go right on functioning off the remaining replica.
When you format and recreate your filesystem, it'll be missing the
Also, could I just do this from a different node?

getfattr -n trusted.glusterfs.volume-id /srv/.bricks/www

Then, on the node with the new raid10-backed disks:

setfattr -n trusted.glusterfs.volume-id -v 'value_from_other_brick'
/srv/.bricks/www
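A sketch of that, reading the value in hex to make it easy to copy between
nodes (the brick path is just the example path above):

getfattr -n trusted.glusterfs.volume-id -e hex /srv/.bricks/www
# prints something like trusted.glusterfs.volume-id=0x<hex-value>
setfattr -n trusted.glusterfs.volume-id -v 0x<hex-value> /srv/.bricks/www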
Post by David Gossage
Post by David Gossage
gluster volume start $volname force
If I left volume started when brick process is killed and clients are
still (in theory) connected to volume wouldn't that just give me an error
that volume is already started?
Likely I would shut down the volume and do downtime for this anyway though
letting heals go on with VM's off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged will
doing a heal full after reboot or restarting glusterd take care of
everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
Are the improvements in 3.8 for sharding significant enough I should
first look at updating to 3.8.2 when released in few days?
Yes.
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
Joe Julian
2016-08-08 22:24:03 UTC
Permalink
On Mon, Aug 8, 2016 at 4:37 PM, David Gossage
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and
working I am of course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6
(zil on mirrored ssd), which I am thinking is more protection
than I may need with a 3 way replica. I was going to one by
one change them to basically raid10 letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I
just kill the brick process to simulate a brick dying, or is
there an actual brick maintenance command?
Just kill (-15) the brick process. That'll close the TCP
connections and the clients will just go right on functioning
off the remaining replica. When you format and recreate your
filesystem, it'll be missing the volume-id extended attributes
Also could I just do this from different node?
getfattr -n trusted.glusterfs.volume-id /srv/.bricks/www
Then on node with new raid10 backed disks
setfattr -n trusted.glusterfs.volume-id -v 'value_from_other_brick'
/srv/.bricks/www
Sure, but that's a lot more keystrokes and a lot more potential for
human error.
gluster volume start $volname force
If I left volume started when brick process is killed and clients
are still (in theory) connected to volume wouldn't that just give
me an error that volume is already started?
Likely I would shut down the volume and do downtime for this
anyway though letting heals go on with VM's off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is
unchanged will doing a heal full after reboot or restarting
glusterd take care of everything if I recreate the expected
brick path first?
Once started, perform a full heal to re-replicate.
Post by David Gossage
Are the improvements in 3.8 for sharding significant enough I
should first look at updating to 3.8.2 when released in few days?
Yes.
David Gossage
2016-08-08 22:58:14 UTC
Permalink
Post by David Gossage
Post by David Gossage
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I am of
course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need with
a 3 way replica. I was going to one by one change them to basically raid10
letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?
Just kill (-15) the brick process. That'll close the TCP connections and
the clients will just go right on functioning off the remaining replica.
When you format and recreate your filesystem, it'll be missing the
Also could I just do this from different node?
getfattr -n trusted.glusterfs.volume-id /srv/.bricks/www
Then on node with new raid10 backed disks
setfattr -n trusted.glusterfs.volume-id -v 'value_from_other_brick'
/srv/.bricks/www
Sure, but that's a lot more keystrokes and a lot more potential for human
error.
This is true.
Post by David Gossage
Post by David Gossage
Post by David Gossage
gluster volume start $volname force
If I left volume started when brick process is killed and clients are
still (in theory) connected to volume wouldn't that just give me an error
that volume is already started?
Likely I would shut down the volume and do downtime for this anyway
though letting heals go on with VM's off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged will
doing a heal full after reboot or restarting glusterd take care of
everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
Are the improvements in 3.8 for sharding significant enough I should
first look at updating to 3.8.2 when released in few days?
Yes.
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
David Gossage
2016-08-09 14:34:05 UTC
Permalink
Post by David Gossage
Post by David Gossage
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I am of
course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil on
mirrored ssd), which I am thinking is more protection than I may need with
a 3 way replica. I was going to one by one change them to basically raid10
letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just kill the
brick process to simulate a brick dying, or is there an actual brick
maintenance command?
Just kill (-15) the brick process. That'll close the TCP connections and
the clients will just go right on functioning off the remaining replica.
When you format and recreate your filesystem, it'll be missing the
Also could I just do this from different node?
getfattr -n trusted.glusterfs.volume-id /srv/.bricks/www
Then on node with new raid10 backed disks
setfattr -n trusted.glusterfs.volume-id -v 'value_from_other_brick'
/srv/.bricks/www
Sure, but that's a lot more keystrokes and a lot more potential for human
error.
So far it's working on the test machine. My one VM is still plodding away,
no packet loss.
Post by David Gossage
Post by David Gossage
Post by David Gossage
gluster volume start $volname force
If I left volume started when brick process is killed and clients are
still (in theory) connected to volume wouldn't that just give me an error
that volume is already started?
Likely I would shut down the volume and do downtime for this anyway
though letting heals go on with VM's off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged will
doing a heal full after reboot or restarting glusterd take care of
everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
Are the improvements in 3.8 for sharding significant enough I should
first look at updating to 3.8.2 when released in few days?
Yes.
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
Joe Julian
2016-08-08 22:23:25 UTC
Permalink
Post by Joe Julian
Post by David Gossage
So now that I have my cluster on 3.7.14 and sharded and working I
am of course looking for what to break next.
Currently each of 3 nodes is on a 6 disk (WD Red 1TB) raidz6 (zil
on mirrored ssd), which I am thinking is more protection than I
may need with a 3 way replica. I was going to one by one change
them to basically raid10 letting it heal in between.
Is best way to do that a systemctl stop glusterd, should I just
kill the brick process to simulate a brick dying, or is there an
actual brick maintenance command?
Just kill (-15) the brick process. That'll close the TCP
connections and the clients will just go right on functioning off
the remaining replica. When you format and recreate your
filesystem, it'll be missing the volume-id extended attributes so
gluster volume start $volname force
If I left volume started when brick process is killed and clients are
still (in theory) connected to volume wouldn't that just give me an
error that volume is already started?
No, it will just force-start the missing brick.
Post by Joe Julian
Likely I would shut down the volume and do downtime for this anyway
though letting heals go on with VM's off.
Post by David Gossage
If /etc/glusterfs is unchanged and /var/lib/glusterd is unchanged
will doing a heal full after reboot or restarting glusterd take
care of everything if I recreate the expected brick path first?
Once started, perform a full heal to re-replicate.
Post by David Gossage
Are the improvements in 3.8 for sharding significant enough I
should first look at updating to 3.8.2 when released in few days?
Yes.
Lindsay Mathieson
2016-08-09 02:15:59 UTC
Permalink
Just kill (-15) the brick process. That'll close the TCP connections and the
clients will just go right on functioning off the remaining replica. When
you format and recreate your filesystem, it'll be missing the volume-id
gluster volume start $volname force
Just to clarify that I'm interpreting this correctly, to replace a brick
and preserve its mount point you can:

1. kill the brick process (glusterfsd)

2. Do your disk maintenance. Eventually you have a clean (erased) brick mount

3. Force the brick process to start. This will recreate all the metadata
and start a full heal that will replicate all the data from the
other bricks.

Looks like the easiest way to replace a brick to me :)
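Roughly, the whole sequence might look like this (gv0 and /bricks/gv0 are
placeholders, and on some versions you may still need to kick off the heal
yourself as Joe mentioned):

gluster volume status gv0            # find the local brick's PID
kill -15 <brick-pid>
# ... rebuild the storage and remount it at /bricks/gv0 ...
mkdir -p /bricks/gv0
gluster volume start gv0 force
gluster volume heal gv0 full
gluster volume heal gv0 info         # monitor until the heal finishes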



thanks,
--
Lindsay
David Gossage
2016-08-09 02:23:45 UTC
Permalink
On Mon, Aug 8, 2016 at 9:15 PM, Lindsay Mathieson <
Post by Lindsay Mathieson
Post by Joe Julian
Just kill (-15) the brick process. That'll close the TCP connections and
the
Post by Joe Julian
clients will just go right on functioning off the remaining replica. When
you format and recreate your filesystem, it'll be missing the volume-id
gluster volume start $volname force
Just to clarify I'm interpreting this correctly, to replace a brick
1. kill the brick process (glusterfsd)
2. Do your disk maintenance. Eventually you have a clean (erased) brick mount
3. Force the bricks process start. This will recreate all the meta
data and start a full heal that will replicate all the data from the
other bricks.
Looks like the easiest way to replace a brick to me :)
Since my dev is now on 3.8 and has granular enabled, I'm feeling too lazy
to roll back, so I will just wait till 3.8.2 is released in a few days,
which fixes the bugs mentioned to me, and then test this a few times on my
dev.

It would be nice if I could get to a point where I could have one brick dead
and doing a full heal without every VM pausing while shards heal, but I may
be asking too much when dealing with a rather heavy recovery.
Post by Lindsay Mathieson
thanks,
--
Lindsay
Lindsay Mathieson
2016-08-09 07:18:22 UTC
Permalink
Since my dev is now on 3.8 and has granular enabled I'm feeling too lazy to
roll back so will just wait till 3.8.2 is released in few days that fixes
the bugs mentioned to me and then test this few times on my dev.
Would be nice if I could get to a point where I could have one brick dead
and doing a full heal and not have every VM pause while shards heal, but I
may be asking too much when dealing with a rather heavy recovery.
I tested Joe's method with my dev cluster; it worked perfectly. The *one*
running VM on it didn't seem too badly affected, but it was already
very slow :)

Thanks Joe.
--
Lindsay
David Gossage
2016-08-09 08:17:33 UTC
Permalink
On Tue, Aug 9, 2016 at 2:18 AM, Lindsay Mathieson <
Post by Lindsay Mathieson
Post by Lindsay Mathieson
Since my dev is now on 3.8 and has granular enabled I'm feeling too lazy
to
Post by Lindsay Mathieson
roll back so will just wait till 3.8.2 is released in few days that fixes
the bugs mentioned to me and then test this few times on my dev.
Would be nice if I could get to a point where I could have one brick dead
and doing a full heal and not have every VM pause while shards heal, but
I
Post by Lindsay Mathieson
may be asking too much when dealing with a rather heavy recovery.
I tested Joe's method with my dev cluster, worked perfectly. The *one*
running VM on it didn't seem too badly affected, but it was already
very slow :)
3.7 or 3.8? And if 3.8, did you have that new granular-entry-heal feature
enabled?
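(For reference, I believe that's the cluster.granular-entry-heal volume
option, e.g.

gluster volume get gv0 cluster.granular-entry-heal
gluster volume set gv0 cluster.granular-entry-heal on

with gv0 as a placeholder volume name.)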
Post by Lindsay Mathieson
Thanks Joe.
--
Lindsay