Discussion:
VM disk corruption on 3.7.11
Kevin Lemonnier
2016-05-12 14:03:02 UTC
Hi,

I had a problem some time ago with 3.7.6 freezing during heals,
and several people advised me to use 3.7.11 instead. Indeed, with that
version the freeze problem is fixed, it works like a dream ! You can
hardly tell that a node is down or healing, everything keeps working
except for a little freeze right after a node goes down and, I assume,
hasn't timed out yet, but that's fine.

Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are Proxmox
VMs with qcow2 disks stored on the gluster volume.
Here is the config:

Volume Name: gluster
Type: Replicate
Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ipvr2.client:/mnt/storage/gluster
Brick2: ipvr3.client:/mnt/storage/gluster
Brick3: ipvr50.client:/mnt/storage/gluster
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
features.shard: on
features.shard-block-size: 64MB
cluster.data-self-heal-algorithm: full
performance.readdir-ahead: on
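
For reference, a rough sketch of how options like these are set by hand
with the gluster CLI ("gluster" being the volume name above):

gluster volume set gluster features.shard on
gluster volume set gluster features.shard-block-size 64MB
gluster volume set gluster cluster.data-self-heal-algorithm full
gluster volume set gluster network.remote-dio enable
# and so on for the performance.* options; then verify what took effect:
gluster volume info gluster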


As mentioned, I rebooted one of the nodes to test the freezing issue I had
on previous versions, and apart from the initial timeout, nothing: the website
hosted on the VMs kept working like a charm even during the heal.
Since it's testing there isn't any load on it, though. I just tried to refresh
the database by importing the production one on the two MySQL VMs, and both of
them started throwing I/O errors. I tried shutting them down and powering them
on again, but same thing; even starting full heals by hand doesn't solve the
problem, the disks are corrupted. They still work, but sometimes they remount
their partitions read-only ..

I believe there are already a few people using 3.7.11; has no one noticed
corruption problems? Anyone using Proxmox? As already mentioned in multiple
other threads on this mailing list by other users, I also pretty much always
have shards in heal info, but nothing "stuck" there; they always go away
within a few seconds, replaced by other shards.
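
For reference, watching them is just a matter of polling heal info (a rough
sketch, using the volume name above):

# list files/shards currently pending heal, per brick
gluster volume heal gluster info
# or poll it to watch shards appear and disappear
watch -n 5 "gluster volume heal gluster info"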

Thanks
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Kevin Lemonnier
2016-05-12 14:24:14 UTC
As requested on IRC, here are the logs on the 3 nodes.
Kevin Lemonnier
2016-05-12 15:19:50 UTC
As discussed, the missing ipvr50 log file.
Lindsay Mathieson
2016-05-12 14:56:49 UTC
Post by Kevin Lemonnier
I just tried to refresh
the database by importing the production one on the two MySQL VMs, and both of them
started doing I/O errors.
Sorry, I don't quite understand what you did - you migrated 1 or 2 VMs
onto the test gluster volume?
--
Lindsay Mathieson
Kevin Lemonnier
2016-05-18 13:15:15 UTC
Hi,

Some news on this.
Over the weekend the RAID card of node ipvr2 died, and I thought
that maybe that was the problem all along. The RAID card was changed
and yesterday I reinstalled everything.
Same problem just now.

My test is simple: while using the website hosted on the VMs the whole time,
I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
ipvr2 then reboot it, wait for the heal to complete, then migrate all
the VMs off ipvr3 and reboot it.
Every time, the first database VM (which is the only one really using the
disk during the heal) starts showing I/O errors on its disk.

Am I really the only one with that problem ?
Maybe one of the drives is dying too, who knows, but SMART isn't saying anything ..
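
For reference, the kind of by-hand SMART check I mean (a sketch; /dev/sdX is
a placeholder, and drives behind the RAID card may need something like
-d megaraid,N):

smartctl -H /dev/sdX        # overall health verdict
smartctl -A /dev/sdX        # error / reallocation counters
smartctl -t long /dev/sdX   # start a long self-test, read the result later with -a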
Krutika Dhananjay
2016-05-18 13:41:08 UTC
Hi,

I will try to recreate this issue tomorrow on my machines with the steps
that Lindsay provided in this thread. I will let you know the result soon
after that.

-Krutika
Kevin Lemonnier
2016-05-18 14:06:44 UTC
Some additional details if it helps: there is no cache on the disk,
it's virtio with iothread=1. The file is qcow2, and qemu-img check
says it's not corrupted, but when the VM is running I get I/O errors.
As you can see in the config, performance.stat-prefetch is off, but being
on a Debian system I don't have the virt group, so I just tried to replicate
its settings by hand. Maybe I forgot something.
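
Concretely, the check was along these lines (the image path here is a
placeholder for wherever Proxmox mounts the storage):

qemu-img check /mnt/pve/gluster/images/100/vm-100-disk-1.qcow2
# reports "No errors were found", yet the running VM still throws I/O errors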

Thanks !
Lindsay Mathieson
2016-05-18 14:17:19 UTC
Post by Krutika Dhananjay
I will try to recreate this issue tomorrow on my machines with the
steps that Lindsay provided in this thread. I will let you know the
result soon after that.
Thanks Krutika, I've been trying to get the shard stats you wanted, but
by the time the heal info completed, the shards in question had been
healed ... The node in question is the last node on the list :)


I'll swap them around and try tomorrow.


One thought - since the VMs are active while the brick is
removed/re-added, could it be the shards that are written while the
brick is added that are the reverse-healing shards?
Lindsay Mathieson
2016-05-19 13:25:34 UTC
I tested by:

- removing brick 3
- erasing brick 3
- closing down all VMs
- adding new brick 3
- waiting until the heal count reached its max and started decreasing

There were no reverse heals.

- Started the VMs back up. No real issues there, though one showed I/O
errors, presumably due to shards being locked as they were healed.
- VMs started ok, no reverse heals were noted, and eventually brick 3
was fully healed. The VMs do not appear to be corrupted.


So it would appear the problem is adding a brick while the volume is
being written to.
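
For reference, that cycle in gluster CLI terms (a rough sketch; volume name
and brick path borrowed from the config earlier in the thread):

# drop brick 3, going down to replica 2
gluster volume remove-brick gluster replica 2 ipvr50.client:/mnt/storage/gluster force
# wipe the old brick directory, including the hidden .glusterfs metadata
rm -rf /mnt/storage/gluster && mkdir /mnt/storage/gluster
# re-add it as a fresh replica-3 brick and let self-heal repopulate it
gluster volume add-brick gluster replica 3 ipvr50.client:/mnt/storage/gluster
gluster volume heal gluster full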


Cheers,
Kevin Lemonnier
2016-05-19 13:58:52 UTC
That's a different problem then; I have corruption without removing or adding
bricks, as mentioned. Might be two separate issues.
Alastair Neil
2016-05-19 20:04:49 UTC
I am slightly confused: you say you have image file corruption, but then you
say qemu-img check reports no corruption. If what you mean is
that you see I/O errors during a heal, this is likely due to I/O
starvation, something that is a well-known issue.

There is work happening to improve this in version 3.8:

https://bugzilla.redhat.com/show_bug.cgi?id=1269461
Kevin Lemonnier
2016-05-20 00:25:53 UTC
The I/O errors are happening after the heal, not during it.
As described, I just rebooted a node, waited for the heal to finish,
rebooted another, waited for the heal to finish, then rebooted the third.
From that point on, the VM just shows a lot of I/O errors whenever I
use the disk heavily (importing big MySQL dumps). The VM "screen" on the
console tab of Proxmox just spams I/O errors from that point, which it didn't
before rebooting the gluster nodes. I tried powering off the VM and forcing
full heals, but I didn't find a way to fix the problem short of deleting the
VM disk and restoring it from a backup.

I have 3 other servers on 3.7.6 where that problem isn't happening, so it
might be a 3.7.11 bug, but since the RAID card failed recently on one of the
nodes I'm not really sure some other piece of hardware isn't at fault ..
Unfortunately I don't have the hardware to test that.
The only way to be sure would be to upgrade the 3.7.6 nodes to 3.7.11 and
repeat the same tests, but those nodes are in production, and the VM freezes
during heals last month already caused huge problems for our clients. We
really can't afford any other problems there, so testing on them isn't an
option.

To sum up: I have 3 nodes on 3.7.6 with no corruption but huge freezes during
heals, and 3 other nodes on 3.7.11 with no freezes during heals but
corruption. qemu-img doesn't see the corruption; it only shows on the VM's
screen and seems mostly harmless, but sometimes the VM does switch to
read-only mode saying it had too many I/O errors.

Would the bitrot detection daemon detect a hardware problem ? I did enable it
but it didn't detect anything, although I don't know how to force a check on
it; no idea if it ran a scrub since the corruption happened.
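
The scrub commands I found look like this (a sketch; I'd double-check the
exact sub-commands against what 3.7.11 actually ships):

gluster volume bitrot gluster enable             # already done in this case
gluster volume bitrot gluster scrub-frequency daily
gluster volume bitrot gluster scrub status       # did a scrub run, any bad files?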
David Gossage
2016-05-20 00:54:33 UTC
David Gossage
Carousel Checks Inc. | System Administrator
Office 708.613.2284
Are the 3.7.11 nodes in production? Could they be downgraded to 3.7.6
to see if the problem still occurs?
Kevin Lemonnier
2016-05-23 11:54:30 UTC
Hi,

I didn't specify it, but I use "localhost" to add the storage in Proxmox.
My thinking is that every Proxmox node is also a GlusterFS node, so that
should work fine.

I don't want to use the "normal" way of setting a regular address in there,
because you can't change it afterwards in Proxmox. But could that be the
source of the problem? Maybe during live migration there are writes coming
from two different servers at the same time?
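
For clarity, the storage definition in /etc/pve/storage.cfg looks roughly
like this ("gluster-vm" is just an example storage ID):

glusterfs: gluster-vm
        server localhost
        volume gluster
        content images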
Kevin Lemonnier
2016-05-24 09:33:28 UTC
Hi,

Some news on this.
I actually don't need to trigger a heal to get corruption, so the problem
is not the healing. Live migrating the VM seems to trigger corruption every
time, and even without that, just doing a database import, rebooting, then
doing another import seems to corrupt as well.

To check, I created local storage on each node on the same partition as the
gluster bricks (on XFS), moved the VM disk onto each local storage, and
tested the same procedure one by one: no corruption. It seems to happen only
on GlusterFS, so I'm not so sure it's hardware anymore; if it were, local
storage would corrupt too, right?
Could I be missing some critical configuration for VM storage on my gluster
volume?
Kevin Lemonnier
2016-05-24 10:24:44 UTC
So the VMs were configured with cache set to none; I just tried
cache=directsync and it seems to fix the issue. I still need to run
more tests, but I did a couple already with that option and saw no I/O
errors.

Never had to do this before, is it known? I found the clue in some old mail
from this mailing list; did I miss some doc saying you should be using
directsync with GlusterFS?
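
For the record, switching the cache mode is just re-declaring the disk
(a sketch; the VM ID, disk slot and volume are placeholders):

qm set 100 --virtio0 gluster-vm:100/vm-100-disk-1.qcow2,cache=directsync,iothread=1
qm config 100 | grep virtio0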
Lindsay Mathieson
2016-05-24 10:54:24 UTC
Interesting, I remember seeing some issues with cache=none on the
Proxmox mailing list. I use writeback or default, which might be why I
haven't encountered these issues. I suspect you would find writethrough
works as well.


From the Proxmox wiki:

"This mode causes qemu-kvm to interact with the disk image file or
block device with O_DIRECT semantics, so the host page cache is bypassed
and I/O happens directly between the qemu-kvm userspace buffers and the
storage device. Because the actual storage device may report a write as
completed when placed in its write queue only, the guest's virtual storage
adapter is informed that there is a writeback cache, so the guest would be
expected to send down flush commands as needed to manage data integrity.
Equivalent to direct access to your hosts' disk, performance wise."
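
As I understand it (worth double-checking against the qemu documentation),
the cache modes map onto open flags roughly like this:

# mode          host page cache   open flags          guest told writeback cache?
# writeback     used              (none)              yes
# writethrough  used              O_DSYNC             no
# none          bypassed          O_DIRECT            yes
# directsync    bypassed          O_DIRECT|O_DSYNC    no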


I'll restore a test vm and try cache=none myself.
Nicolas Ecarnot
2016-05-24 11:12:13 UTC
Hi,

Is there any risk this could also apply to oVirt VMs stored on glusterFS?
I see no place I could specify this cache setting in an oVirt+gluster setup.
--
Nicolas ECARNOT
Kevin Lemonnier
2016-05-25 07:36:03 UTC
Nope, not solved !
Looks like directsync just delays the problem; this morning the VM had
thrown a bunch of I/O errors again. I tried writethrough and it seems to
behave exactly like cache=none, the errors appear within a few minutes.
I'm trying again with directsync and there are no errors for now, so it
looks like directsync is better than nothing, but it still doesn't solve
the problem.

I really can't use this in production; the VM goes read-only after a few
days because it saw too many I/O errors. I must be missing something.
Lindsay Mathieson
2016-05-25 07:46:18 UTC
Bummer :(


What's the underlying filesystem under the bricks?
--
Lindsay Mathieson
Kevin Lemonnier
2016-05-25 07:58:43 UTC
Permalink
Post by Lindsay Mathieson
What's the underlying filesystem under the bricks?
I use XFS, I read that was recommended. What are you using?
Since yours seems to work, I'm not opposed to changing!
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Krutika Dhananjay
2016-05-25 09:08:00 UTC
Permalink
Hi Kevin,


If you actually ran into a 'read-only filesystem' issue, then it could
possibly be because of a bug in AFR that Pranith recently fixed.
To confirm that this is indeed the case, could you tell me if you saw the
pause after a (single) brick was down while IO was going on?

-Krutika
Kevin Lemonnier
2016-05-25 09:18:04 UTC
Permalink
Hi,

Not that I know of, no. It doesn't look like the bricks have trouble
communicating, but is there a simple way to check that in GlusterFS, some
sort of brick uptime? Who knows, maybe the bricks are flickering without
me noticing; that's entirely possible.

As mentioned, the problem occurs on its own. I can trigger it faster by
using the disk a lot (doing a database import), but it occurred last night,
for example, while I wasn't using the machine at all. I googled a bit and
found quite a lot of threads on the Proxmox forum about this, but for older
versions of GlusterFS.

I usually use qcow2; I just tried with raw and got the same problem. I just
mounted the volume over NFS, and I'm currently moving the disk onto it to
see whether the problem is libgfapi-only or whether it happens with NFS too.
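Something like this is what I had in mind for checking on the bricks, in
case it is the right approach (the volume name 'gluster' matches the config
above; the NFS mount point is just an example):

# Shows each brick's PID, port and online status; a flickering brick
# should show up as offline or with a fresh PID.
gluster volume status gluster

# Gluster's built-in NFS server only speaks NFSv3, hence vers=3.
mount -t nfs -o vers=3,mountproto=tcp ipvr2.client:/gluster /mnt/nfs-test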
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Krutika Dhananjay
2016-05-25 09:18:27 UTC
Permalink
Also, it seems Lindsay knows a way to get the gluster client logs when
using Proxmox and libgfapi.
Would it be possible for you to get that sorted out with Lindsay's help
before recreating this issue next time, and to share the glusterfs client
logs from all the nodes when you do hit the issue?
They are critical for some of the debugging we do. :)
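In the meantime, raising the client log level should make those logs more
useful. A minimal sketch, assuming the volume name from your config:

gluster volume set gluster diagnostics.client-log-level DEBUG
# ... reproduce the corruption and collect the logs ...
gluster volume set gluster diagnostics.client-log-level INFO

DEBUG is quite verbose, so it is worth turning it back down afterwards.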

-Krutika
Kevin Lemonnier
2016-05-25 14:10:02 UTC
Permalink
Just did that; below is the output.
It didn't seem to move after the boot, and there were no new lines when the
I/O errors appeared.
Also, as mentioned, I tried moving the disk to NFS and had the exact same
errors, so it doesn't look like a libgfapi-only problem.
I should probably re-create the VM; maybe last night's errors corrupted the
disk and I'm now getting errors unrelated to the original issue.

Let me re-create the VM from scratch and try to reproduce the problem with
the logs enabled; maybe it'll be more informative than this!


[2016-05-25 13:56:30.851493] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 6e733339-3635-3033-2e69-702d34362d31 (0) coming up
[2016-05-25 13:56:30.851553] I [MSGID: 114020] [client.c:2106:notify] 0-gluster-client-0: parent translators are ready, attempting connect on transport
[2016-05-25 13:56:30.852130] I [MSGID: 114020] [client.c:2106:notify] 0-gluster-client-1: parent translators are ready, attempting connect on transport
[2016-05-25 13:56:30.852650] I [MSGID: 114020] [client.c:2106:notify] 0-gluster-client-2: parent translators are ready, attempting connect on transport
[2016-05-25 13:56:30.852909] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-gluster-client-0: changing port to 49152 (from 0)
[2016-05-25 13:56:30.853434] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-gluster-client-1: changing port to 49152 (from 0)
[2016-05-25 13:56:30.853484] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-gluster-client-2: changing port to 49152 (from 0)
[2016-05-25 13:56:30.854182] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-05-25 13:56:30.854398] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-05-25 13:56:30.854441] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gluster-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-05-25 13:56:30.861931] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gluster-client-2: Connected to gluster-client-2, attached to remote volume '/mnt/storage/gluster'.
[2016-05-25 13:56:30.861965] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gluster-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-05-25 13:56:30.862073] I [MSGID: 108005] [afr-common.c:4007:afr_notify] 0-gluster-replicate-0: Subvolume 'gluster-client-2' came back up; going online.
[2016-05-25 13:56:30.862139] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gluster-client-2: Server lk version = 1
[2016-05-25 13:56:30.865451] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gluster-client-1: Connected to gluster-client-1, attached to remote volume '/mnt/storage/gluster'.
[2016-05-25 13:56:30.865485] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gluster-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-05-25 13:56:30.865757] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gluster-client-1: Server lk version = 1
[2016-05-25 13:56:30.865826] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gluster-client-0: Connected to gluster-client-0, attached to remote volume '/mnt/storage/gluster'.
[2016-05-25 13:56:30.865841] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-05-25 13:56:30.888604] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gluster-client-0: Server lk version = 1
[2016-05-25 13:56:30.890388] I [MSGID: 108031] [afr-common.c:1900:afr_local_discovery_cbk] 0-gluster-replicate-0: selecting local read_child gluster-client-2
[2016-05-25 13:56:30.890731] I [MSGID: 104041] [glfs-resolve.c:869:__glfs_active_subvol] 0-gluster: switched to graph 6e733339-3635-3033-2e69-702d34362d31 (0)
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Kevin Lemonnier
2016-05-25 15:58:47 UTC
Permalink
There, I re-created the VM from scratch and still got the same errors.
Attached are the logs. I created the VM on node 50 and it worked fine. I
tried rebooting it and starting my import again, and it still worked fine.
I then powered off the VM, started it again on node 2, rebooted it a bunch,
and got the usual error. I also attached a screenshot of the VM's console,
which might help.

I can see that every time the VM powers down, GlusterFS complains about an
inode still being active; could that be the problem?

Thanks for the help!
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Lindsay Mathieson
2016-05-27 11:35:00 UTC
Permalink
Post by Kevin Lemonnier
There, I re-created the VM from scratch and still got the same errors.
Just a thought - do you have bitrot detection enabled? (I don't)
--
Lindsay Mathieson
Kevin Lemonnier
2016-05-27 11:56:18 UTC
Permalink
Post by Lindsay Mathieson
Just a thought - do you have bitrot detection enabled? (I don't)
Yes, I configured it to do a daily scrub when I reinstalled last time,
back when I was wondering if maybe it was hardware. It doesn't seem to
have detected anything.
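For reference, this is roughly how I have been checking the scrub results,
assuming the 3.7 bitrot CLI works as documented (volume name as in the
config above):

gluster volume bitrot gluster scrub status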
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Lindsay Mathieson
2016-05-27 12:22:01 UTC
Permalink
Post by Kevin Lemonnier
Yes, I configured it to do a daily scrub when I reinstalled last time,
back when I was wondering if maybe it was hardware. It doesn't seem to
have detected anything.
I was wondering if the scrub was interfering with things.
--
Lindsay Mathieson
Gandalf Corvotempesta
2016-06-13 21:21:30 UTC
Permalink
Post by Kevin Lemonnier
Yes, I configured it to do a daily scrub when I reinstalled last time,
back when I was wondering if maybe it was hardware. It doesn't seem to
have detected anything.
Kevin, did you solve this issue? Any updates?
Kevin Lemonnier
2016-06-13 22:00:07 UTC
Permalink
Post by Gandalf Corvotempesta
Kevin, did you solve this issue? Any updates?
Oh yeah, we discussed it on IRC and it's apparently a known bug that is
fixed in the next version. I tested a patched build and it does seem to
work, so I've been waiting for 3.7.12 since then to do some proper testing
and confirm that it's been solved! It should be out in the next few days,
last I heard.
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Lindsay Mathieson
2016-05-25 10:45:00 UTC
Permalink
Post by Kevin Lemonnier
I use XFS, I read that was recommended. What are you using?
Since yours seems to work, I'm not opposed to changing!
ZFS

- RAID10 (4 * WD Red 3TB)

- 8GB RAM dedicated to ZFS

- SSD for log and cache (10GB and 100GB partitions respectively)

* compression=lz4
* atime=off
* xattr=sa
* sync=standard
* acltype=posixacl
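
For completeness, those are just ordinary dataset properties; a minimal
sketch of setting them with zfs(8), assuming the bricks live on a dataset
called tank/gluster (the name is only an example):

zfs set compression=lz4 tank/gluster
zfs set atime=off tank/gluster
zfs set xattr=sa tank/gluster
zfs set sync=standard tank/gluster
zfs set acltype=posixacl tank/gluster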


What sort of I/O load are you seeing? Mine varies between 0.6% and 5%,
with occasional spikes to 30% (updates etc.).


I have had several Windows VMs lock up on me in the past 4 weeks -
maybe it's related.
--
Lindsay Mathieson