Discussion:
[Gluster-users] Problem with self-heal
Miloš Kozák
2014-07-01 20:58:55 UTC
Permalink
Hi,
I am running some tests on top of v3.5.1 in a two-node configuration with
one disk each and replica 2 mode.

I have two servers connected by a direct cable, over which glusterd
communicates. I start dd to create a relatively large file. In the middle
of the writing process I disconnect the cable, so when writing is finished
I can see all the data on one server (node1) and only a part of the file
on the other (node2).. no surprise so far.

Then I put the cable back. After a while the peers are discovered and the
self-healing daemons start to communicate, so I can see:

gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

But no data is moving over the network, which I verify with df (the used
space on node2's brick does not grow)..

Any help? In my opinion the nodes should get synchronized after a while,
but after 20 minutes of waiting there is still nothing (the file was 2 GB).

Thanks Milos
Ravishankar N
2014-07-02 05:38:32 UTC
Permalink
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion?
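For example, something like this on node1 should show them (the path is the
usual default, and the exact wording of the completion message varies between
versions, hence the deliberately loose pattern):
grep -iE "heal.*complet|complet.*heal" /var/log/glusterfs/glustershd.log | tail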
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
Post by Miloš Kozák
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-02 12:45:11 UTC
Permalink
Hi,

I am going to replicate the problem on a clean gluster configuration
later today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
Yes, I mounted the volume on both servers as follows:
localhost:vg0 /mnt
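(i.e. roughly the equivalent of the following; just a sketch, any additional
mount options are omitted)
mount -t glusterfs localhost:vg0 /mnt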
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no such lines in the log, which is why I eventually wrote this
email.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.

Milos
Ravishankar N
2014-07-02 15:38:17 UTC
Permalink
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Hi Milos,
If you are able to reproduce this issue, could you please file a bug
[1] and attach the gluster logs from both machines to the bug report?
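For example, something along these lines on each node should capture all the
relevant logs (the path is the usual default; adjust it if your logs live
elsewhere):
tar czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs/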

[1]
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&component=replicate

Thanks,
Ravi
Post by Milos Kozak
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-02 16:04:53 UTC
Permalink
Sure, I will do that.. I was going to enclose it in my answer, but this is
better, I guess.
Post by Ravishankar N
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Hi Milos,
If you are able to reproduce this issue, could you please file a bug
[1] and attach the gluster logs from both machines to the bug report?
[1]
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&component=replicate
Thanks ,
Ravi
Post by Milos Kozak
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Vijay Bellur
2014-07-02 15:40:07 UTC
Permalink
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might also
help.
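For instance (just a sketch; pgrep is one way to locate the daemon's pid):
# on each node; let it run for a minute or two while the heal should be
# happening, then stop it and attach the output file
strace -f -p $(pgrep -f glustershd) -o /tmp/shd-strace-$(hostname).out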

Thanks,
Vijay
Miloš Kozák
2014-07-03 03:37:04 UTC
Permalink
Submitted: 1115748

Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
Miloš Kozák
2014-07-13 15:35:19 UTC
Permalink
Hi, I would like to ask about the progress. Nothing new has been added to
the ticket..

Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-14 02:23:04 UTC
Permalink
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs or reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-15 12:17:47 UTC
Permalink
Hi,
Yesterday I was going to replicate the error, but I didn't manage to do
it, so I started to wonder whether it wasn't a bad call..

I read the following links, so I would like to ask :D Does it mean that
this bug is caused by a very fast recovery of the connection? Or are there
other things that come into play? I am running 3.5.1 on production
servers for less important stuff, and one of those servers went down this
weekend. Afterwards the heal process was completely fine. That server
takes nearly 5 minutes to boot, though. Does that mean this is why I
didn't experience this bug?


When can we expect Gluster 3.5.2 to be released?

Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-15 12:34:44 UTC
Permalink
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-15 13:09:32 UTC
Permalink
I read your answer, but I don't know how to create my RPM files, because I
don't want to install it straight onto the system.. Is there any manual?
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-15 14:23:26 UTC
Permalink
Post by Milos Kozak
I read your answer, but I dont know how how to create my RPM files,
because I dont want to install it right to the system.. Is there any manual?
http://www.gluster.org/community/documentation/index.php/CompilingRPMS
Compile the release-3.5 branch.
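Roughly (a sketch following that page; the full dependency list and any extra
configure flags are on the wiki):
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git checkout release-3.5
./autogen.sh && ./configure
cd extras/LinuxRPM && make glusterrpms   # the RPMs end up under this directory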
Post by Milos Kozak
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can
see
just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can
you
check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Miloš Kozák
2014-07-20 18:33:08 UTC
Permalink
Hi, I tried my best, but I could not replicate the error, not even on 3.5.1.

Sorry, I can't test it. It is kinda weird :D
Post by Ravishankar N
Post by Milos Kozak
I read your answer, but I dont know how how to create my RPM files,
because I dont want to install it right to the system.. Is there any manual?
http://www.gluster.org/community/documentation/index.php/CompilingRPMS
Compile the release-3.5 branch.
Post by Milos Kozak
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster
configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I
can see
just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file.
Can you
check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-02 13:09:30 UTC
Permalink
Hi there

Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed, but the heal
never finishes even though the checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause, but I could
not do any further tests and the files are still in the same state (self-healing).

Interestingly, there are other threads describing this sort of problem, but
nothing has come of them so far.

Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 05:01:43 UTC
Permalink
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give the getfattr -d -m. -e hex
<file-that-gives-this-problem-on-backend> output from both bricks of
the replica pair, so we can see what the problem is?

Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-03 13:38:11 UTC
Permalink
Hi Pranith
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give getfattr -d -m. -e hex <file-that-gives-this-problem-on-backend>
outputs on both the bricks of the replica pair to see what the problem is.
Ok, I picked one of the volumes which are permanently listed in the heal info:

node-01 ~ # gluster vol heal virtualization info | grep db98
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

node-01 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

getfattr: Removing leading '/' from absolute path names
# file:
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec

node-02 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

getfattr: Removing leading '/' from absolute path names
# file:
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec

Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?

Thanks in advance,
Tiziano
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 15:16:27 UTC
Permalink
Post by Tiziano Müller
Hi Pranith
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give getfattr -d -m. -e hex <file-that-gives-this-problem-on-backend>
outputs on both the bricks of the replica pair to see what the problem is.
node-01 ~ # gluster vol heal virtualization info | grep db98
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
node-01 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
getfattr: Removing leading '/' from absolute path names
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
node-02 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
getfattr: Removing leading '/' from absolute path names
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
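In short (summarizing that document; please double-check it against your
version): each trusted.afr.<volume>-client-N attribute records how many
operations are still marked pending for brick N, and its value is three
32-bit big-endian counters:
#   hex digits  1-8  : pending DATA operations
#   hex digits  9-16 : pending METADATA operations
#   hex digits 17-24 : pending ENTRY operations
# e.g. 0x 00000002 00000000 00000000 in your output means 2 pending data
# operations, 0 metadata, 0 entry, recorded identically on both bricks.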

Is I/O happening on those files? I think yes, because they are VM files.
There was a problem of false positives with releases earlier than 3.5.1:
those releases did not have the capability to distinguish between ongoing
I/O and an actual need for self-heal, so even while only I/O is happening
the files will be shown among the files that need self-heal.

Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-03 15:26:45 UTC
Permalink
Hi Pranith

Am 03.07.2014 17:16, schrieb Pranith Kumar Karampuri:
[...]
Post by Pranith Kumar Karampuri
Post by Tiziano Müller
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
Thanks.
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM files. There
was this problem of false +ves with releases earlier than 3.5.1. Releases
earlier than 3.5.1 did not have capability to distinguish between on-going I/O
and requirement of self-heal. So even if I/O is happening they will be shown
under files that need self-heal.
Ok, that explains why some of the files are suddenly listed and then vanish again.

The problem is that when we shut down all VMs (which were using gfapi) last
week, some images were still listed as needing self-heal even though no I/O
was happening. Even after a gluster vol stop/start and a reboot, the same
files were listed and nothing changed. After comparing the checksums of the
files on the two bricks we resumed operation.

Any ideas?

Best,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 15:54:33 UTC
Permalink
Post by Tiziano Müller
Hi Pranith
[...]
Post by Pranith Kumar Karampuri
Post by Tiziano Müller
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
Thanks.
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM files. There
was this problem of false +ves with releases earlier than 3.5.1. Releases
earlier than 3.5.1 did not have capability to distinguish between on-going I/O
and requirement of self-heal. So even if I/O is happening they will be shown
under files that need self-heal.
Ok, that explains why some of the files are suddenly listed and then vanish again.
The problem is that when we shut down all VMs (which were using gfapi) last
week, some images were listed as to be self-healed, but no I/O happened.
Also after a gluster vol stop/start and a reboot, the same files were listed and
nothing changed. After comparing the checksums of the files on the 2 bricks we
resumed operation.
It would be helpful if you could provide the getfattr output when such
things happen, so that we can try to see why it is happening that way.
These are AFR changelog "smells" I have developed over time working on AFR;
they are correct most of the time, but not always. Once I see the getfattr
output on both bricks:
1) If the files have equal numbers and are undergoing changes, most
probably it is just normal I/O and no heal is required.
2) If the files have unequal numbers, differing by a lot, and are
undergoing changes, then most probably heal is required while I/O is going on.
3) If the files have unequal numbers and are not undergoing changes, then
heal is required.
4) If the files have equal numbers and are not undergoing changes, then the
mount must have crashed, or the volume was stopped, while I/O was in progress.

Again, these are just the most probable guesses, not accurate.
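A quick way to sample whether a file is "undergoing changes" (just a sketch;
substitute the full brick-side path of the file in question):
F=<full-path-to-file-on-the-brick>   # placeholder
getfattr -d -m. -e hex "$F"; sleep 60; getfattr -d -m. -e hex "$F"
# if the counters differ between the two samples, the file is seeing I/O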

Pranith
Post by Tiziano Müller
Any ideas?
Best,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Joe Julian
2014-07-03 15:52:18 UTC
Permalink
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM
files. There was this problem of false +ves with releases earlier than
3.5.1. Releases earlier than 3.5.1 did not have capability to
distinguish between on-going I/O and requirement of self-heal. So even
if I/O is happening they will be shown under files that need self-heal.
Has that capability been backported to 3.4?
Pranith Kumar Karampuri
2014-07-03 16:36:07 UTC
Permalink
Post by Joe Julian
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM
files. There was this problem of false +ves with releases earlier
than 3.5.1. Releases earlier than 3.5.1 did not have capability to
distinguish between on-going I/O and requirement of self-heal. So
even if I/O is happening they will be shown under files that need
self-heal.
Has that capability been backported to 3.4?
No.
In total there are 6 patches:
https://bugzilla.redhat.com/show_bug.cgi?id=1039544
The patches are in comments 69 through 75.

Pranith
Pranith Kumar Karampuri
2014-07-03 05:00:07 UTC
Permalink
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Could you execute "gluster volume statedump vg0" twice, 2 minutes apart,
and attach the files from /var/run/gluster to the bug you raised? We need
to verify whether it is running into the bug fixed by
http://review.gluster.com/8187 for 3.5.2.
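i.e. something like this (the dump files are written to /var/run/gluster by
default):
gluster volume statedump vg0
sleep 120
gluster volume statedump vg0
ls -lt /var/run/gluster/   # attach the dump files that appear here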

Pranith
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users