Discussion:
[Gluster-users] Problem with self-heal
Miloš Kozák
2014-07-01 20:58:55 UTC
Permalink
Hi,
I am running some tests on top of v3.5.1 in a two-node configuration with
one disk each and replica 2 mode.

I have two servers connected by a direct cable, over which glusterd
communicates. I start dd to create a relatively large file. In the middle
of the writing process I disconnect the cable, so when writing is finished
I can see all the data on one server (node1) and only a part of the file
on the other (node2).. no surprise so far.

Then I put the cable back. After a while the peers are discovered and the
self-healing daemons start to communicate, so I can see:

gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

But no data is moving over the network, which I verify with df (the used
space on node2's brick does not grow)..

Any help? In my opinion the nodes should get synchronized after a while,
but after 20 minutes of waiting there is still nothing (the file was 2 GB).

Thanks Milos
Ravishankar N
2014-07-02 05:38:32 UTC
Permalink
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion?
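For example, something like this on node1 should show them (the path is the
usual default, and the exact wording of the completion message varies between
versions, hence the deliberately loose pattern):
grep -iE "heal.*complet|complet.*heal" /var/log/glusterfs/glustershd.log | tail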
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
Post by Miloš Kozák
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-02 12:45:11 UTC
Permalink
Hi,

I am going to replicate the problem on a clean gluster configuration
later today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
Yes, I mounted the volume on both servers as follows:
localhost:vg0 /mnt
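(i.e. roughly the equivalent of the following; just a sketch, any additional
mount options are omitted)
mount -t glusterfs localhost:vg0 /mnt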
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no such lines in the log, which is why I eventually wrote this
email.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.

Milos
Ravishankar N
2014-07-02 15:38:17 UTC
Permalink
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Hi Milos,
If you are able to reproduce this issue, could you please file a bug
[1] and attach the gluster logs from both machines to the bug report?
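For example, something along these lines on each node should capture all the
relevant logs (the path is the usual default; adjust it if your logs live
elsewhere):
tar czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs/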

[1]
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&component=replicate

Thanks,
Ravi
Post by Milos Kozak
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-02 16:04:53 UTC
Permalink
Sure, I will do that.. I was going to enclose it in my answer, but this is
better, I guess.
Post by Ravishankar N
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Hi Milos,
If you are able to reproduce this issue, could you please file a bug
[1] and attach the gluster logs from both machines to the bug report?
[1]
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&component=replicate
Thanks ,
Ravi
Post by Milos Kozak
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Vijay Bellur
2014-07-02 15:40:07 UTC
Permalink
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might also
help.
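For instance (just a sketch; pgrep is one way to locate the daemon's pid):
# on each node; let it run for a minute or two while the heal should be
# happening, then stop it and attach the output file
strace -f -p $(pgrep -f glustershd) -o /tmp/shd-strace-$(hostname).out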

Thanks,
Vijay
Miloš Kozák
2014-07-03 03:37:04 UTC
Permalink
Submitted: 1115748

Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
Miloš Kozák
2014-07-13 15:35:19 UTC
Permalink
Hi, I would like to ask about the progress. Nothing new has been added to
the ticket..

Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-14 02:23:04 UTC
Permalink
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs or reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-15 12:17:47 UTC
Permalink
Hi,
Yesterday I was going to replicate the error, but I didn't manage to do
it, so I started to wonder whether it wasn't a bad call..

I read the following links, so I would like to ask :D Does it mean that
this bug is caused by a very fast recovery of the connection? Or are there
other things that come into play? I am running 3.5.1 on production
servers for less important stuff, and one of those servers went down this
weekend. Afterwards the heal process was completely fine. That server
takes nearly 5 minutes to boot, though. Does that mean this is why I
didn't experience this bug?


When can we expect Gluster 3.5.2 to be released?

Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-15 12:34:44 UTC
Permalink
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Milos Kozak
2014-07-15 13:09:32 UTC
Permalink
I read your answer, but I don't know how to create my RPM files, because I
don't want to install it straight onto the system.. Is there any manual?
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2014-07-15 14:23:26 UTC
Permalink
Post by Milos Kozak
I read your answer, but I dont know how how to create my RPM files,
because I dont want to install it right to the system.. Is there any manual?
http://www.gluster.org/community/documentation/index.php/CompilingRPMS
Compile the release-3.5 branch.
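Roughly (a sketch following that page; the full dependency list and any extra
configure flags are on the wiki):
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git checkout release-3.5
./autogen.sh && ./configure
cd extras/LinuxRPM && make glusterrpms   # the RPMs end up under this directory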
Post by Milos Kozak
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes
configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can
see
just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can
you
check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Miloš Kozák
2014-07-20 18:33:08 UTC
Permalink
Hi, I tried my best, but I could not replicate the error, not even on 3.5.1.

Sorry, I can't test it. It is kinda weird :D
Post by Ravishankar N
Post by Milos Kozak
I read your answer, but I dont know how how to create my RPM files,
because I dont want to install it right to the system.. Is there any manual?
http://www.gluster.org/community/documentation/index.php/CompilingRPMS
Compile the release-3.5 branch.
Post by Milos Kozak
Post by Ravishankar N
Post by Milos Kozak
Hi,
Yesterday I was gonna to replicate the error, but I didnt managed to
do it, so I started to wonder whether it wasnt bad call..
I read the following links, so I would like to ask :D Does it mean,
that this bug is caused by very fast recovery of connection? Or are
there other things that come to the game? I am running 3.5.1 on
production servers for less important stuff, and there one server came
down this weekend. After all the heal process was totally fine. As
long as the real server boots nearly 5minuts. Does it mean that this
was the reason why I didnt experienced this bug?
Yes, it happened when the client quickly reconnected before the server
had a chance to discard the stale inode and fd tables. Hope you got a
chance to look at my comment in the BZ [1]
Thanks,
Ravi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
Post by Milos Kozak
When we can expect Gluster 3.5.2 to be released?
Thanks Milos
Post by Ravishankar N
Post by Miloš Kozák
Hi, I would like to ask about the progress. On the ticket there is
nothing new added..
I haven't had a chance to look at the logs/ reproduce the bug. Will get
to it in a couple of days.
Thanks,
Ravi
Post by Miloš Kozák
Thanks, Milos
Post by Miloš Kozák
Submitted: 1115748
Milos
Post by Vijay Bellur
Post by Milos Kozak
Hi,
I am going to replicate the problem on clean gluster
configuration
latter today. So far my answers are below.
Post by Ravishankar N
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I
can see
just
a split of the file when writing is finished
Does this mean your client (mount point) is also on node 1?
localhost:vg0 /mnt
Post by Ravishankar N
Post by Miloš Kozák
.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file.
Can you
check
if there are messages in glustershd.log of node1 about self-heal
completion ?
There are no lines in log, that is the reason why I wrote this email
eventually.
Post by Ravishankar N
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Does gluster volume status show all processes being online?
All processes are running.
Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
Thanks,
Vijay
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-02 13:09:30 UTC
Permalink
Hi there

Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed, but the heal
never finishes even though the checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause, but I could
not do any further tests and the files are still in the same state (self-healing).

Interestingly, there are other threads describing this sort of problem, but
nothing has come of them so far.

Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 05:01:43 UTC
Permalink
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give the getfattr -d -m. -e hex
<file-that-gives-this-problem-on-backend> output from both bricks of
the replica pair, so we can see what the problem is?

Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-03 13:38:11 UTC
Permalink
Hi Pranith
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give getfattr -d -m. -e hex <file-that-gives-this-problem-on-backend>
outputs on both the bricks of the replica pair to see what the problem is.
Ok, I picked one of the volumes which are permanently listed in the heal info:

node-01 ~ # gluster vol heal virtualization info | grep db98
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

node-01 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

getfattr: Removing leading '/' from absolute path names
# file:
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec

node-02 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2

getfattr: Removing leading '/' from absolute path names
# file:
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec

Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?

Thanks in advance,
Tiziano
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 15:16:27 UTC
Permalink
Post by Tiziano Müller
Hi Pranith
Post by Tiziano Müller
Hi there
Not sure whether this is related, but we see the same problem with
glusterfs-3.4(.2). Several files are listed as being healed but they never
finish and checksums are identical.
We had some problems with NTP, meaning that the clocks on the nodes diverged by
a couple of seconds. I suspect this may be the root cause for it, but I could
not do any further tests and the files are still in the same state (self-healing).
Interestingly there are other threads describing this sort of problem, but
nothing came out so far.
Could you give getfattr -d -m. -e hex <file-that-gives-this-problem-on-backend>
outputs on both the bricks of the replica pair to see what the problem is.
node-01 ~ # gluster vol heal virtualization info | grep db98
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
node-01 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
getfattr: Removing leading '/' from absolute path names
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
node-02 ~ # getfattr -d -m. -e hex
/var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
getfattr: Removing leading '/' from absolute path names
var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
trusted.afr.virtualization-client-0=0x000000020000000000000000
trusted.afr.virtualization-client-1=0x000000020000000000000000
trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
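In short (summarizing that document; please double-check it against your
version): each trusted.afr.<volume>-client-N attribute records how many
operations are still marked pending for brick N, and its value is three
32-bit big-endian counters:
#   hex digits  1-8  : pending DATA operations
#   hex digits  9-16 : pending METADATA operations
#   hex digits 17-24 : pending ENTRY operations
# e.g. 0x 00000002 00000000 00000000 in your output means 2 pending data
# operations, 0 metadata, 0 entry, recorded identically on both bricks.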

Is I/O happening on those files? I think yes, because they are VM files.
There was a problem of false positives with releases earlier than 3.5.1:
those releases did not have the capability to distinguish between ongoing
I/O and an actual need for self-heal, so even while only I/O is happening
the files will be shown among the files that need self-heal.

Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Tiziano Müller
2014-07-03 15:26:45 UTC
Permalink
Hi Pranith

Am 03.07.2014 17:16, schrieb Pranith Kumar Karampuri:
[...]
Post by Pranith Kumar Karampuri
Post by Tiziano Müller
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
Thanks.
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM files. There
was this problem of false +ves with releases earlier than 3.5.1. Releases
earlier than 3.5.1 did not have capability to distinguish between on-going I/O
and requirement of self-heal. So even if I/O is happening they will be shown
under files that need self-heal.
Ok, that explains why some of the files are suddenly listed and then vanish again.

The problem is that when we shut down all VMs (which were using gfapi) last
week, some images were still listed as needing self-heal even though no I/O
was happening. Even after a gluster vol stop/start and a reboot, the same
files were listed and nothing changed. After comparing the checksums of the
files on the two bricks we resumed operation.

Any ideas?

Best,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
Pranith Kumar Karampuri
2014-07-03 15:54:33 UTC
Permalink
Post by Tiziano Müller
Hi Pranith
[...]
Post by Pranith Kumar Karampuri
Post by Tiziano Müller
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
Thanks.
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM files. There
was this problem of false +ves with releases earlier than 3.5.1. Releases
earlier than 3.5.1 did not have capability to distinguish between on-going I/O
and requirement of self-heal. So even if I/O is happening they will be shown
under files that need self-heal.
Ok, that explains why some of the files are suddenly listed and then vanish again.
The problem is that when we shut down all VMs (which were using gfapi) last
week, some images were listed as to be self-healed, but no I/O happened.
Also after a gluster vol stop/start and a reboot, the same files were listed and
nothing changed. After comparing the checksums of the files on the 2 bricks we
resumed operation.
It would be helpful if you could provide the getfattr output when such
things happen, so that we can try to see why it is happening that way.
These are AFR changelog "smells" I have developed over time working on AFR;
they are correct most of the time, but not always. Once I see the getfattr
output on both bricks:
1) If the files have equal numbers and are undergoing changes, most
probably it is just normal I/O and no heal is required.
2) If the files have unequal numbers, differing by a lot, and are
undergoing changes, then most probably heal is required while I/O is going on.
3) If the files have unequal numbers and are not undergoing changes, then
heal is required.
4) If the files have equal numbers and are not undergoing changes, then the
mount must have crashed, or the volume was stopped, while I/O was in progress.

Again, these are just the most probable guesses, not accurate.
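A quick way to sample whether a file is "undergoing changes" (just a sketch;
substitute the full brick-side path of the file in question):
F=<full-path-to-file-on-the-brick>   # placeholder
getfattr -d -m. -e hex "$F"; sleep 60; getfattr -d -m. -e hex "$F"
# if the counters differ between the two samples, the file is seeing I/O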

Pranith
Post by Tiziano Müller
Any ideas?
Best,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Thanks in advance,
Tiziano
Post by Pranith Kumar Karampuri
Pranith
Post by Tiziano Müller
Best,
Tiziano
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration with one
disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let glusterd
communicate. I start dd to create a relatively large file. In the middle of
writing process I disconnect the cable, so on one server (node1) I can see all
data and on the other one (node2) I can see just a split of the file when
writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered, self-healing
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Any help? In my opinion after a while I should get my nodes synchronized, but
after 20minuts of waiting still nothing (the file was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Joe Julian
2014-07-03 15:52:18 UTC
Permalink
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM
files. There was this problem of false +ves with releases earlier than
3.5.1. Releases earlier than 3.5.1 did not have capability to
distinguish between on-going I/O and requirement of self-heal. So even
if I/O is happening they will be shown under files that need self-heal.
Has that capability been backported to 3.4?
Pranith Kumar Karampuri
2014-07-03 16:36:07 UTC
Permalink
Post by Joe Julian
Post by Pranith Kumar Karampuri
Is I/O happening on those files? I think yes because they are VM
files. There was this problem of false +ves with releases earlier
than 3.5.1. Releases earlier than 3.5.1 did not have capability to
distinguish between on-going I/O and requirement of self-heal. So
even if I/O is happening they will be shown under files that need
self-heal.
Has that capability been backported to 3.4?
No.
In total there are 6 patches:
https://bugzilla.redhat.com/show_bug.cgi?id=1039544
The patches are in comments 69 through 75.

Pranith
Pranith Kumar Karampuri
2014-07-03 05:00:07 UTC
Permalink
Post by Miloš Kozák
Hi,
I am running some test on top of v3.5.1 in my 2 nodes configuration
with one disk each and replica 2 mode.
I have two servers connected by a cable. Through this cable I let
glusterd communicate. I start dd to create a relatively large file. In
the middle of writing process I disconnect the cable, so on one server
(node1) I can see all data and on the other one (node2) I can see just
a split of the file when writing is finished.. no surprise so far.
Then I put the cable back. After a while peers are discovered,
gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
But on the network there are no data moving, which I verify by df..
Could you execute "gluster volume statedump vg0" twice, 2 minutes apart,
and attach the files from /var/run/gluster to the bug you raised? We need
to verify whether it is running into the bug fixed by
http://review.gluster.com/8187 for 3.5.2.
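i.e. something like this (the dump files are written to /var/run/gluster by
default):
gluster volume statedump vg0
sleep 120
gluster volume statedump vg0
ls -lt /var/run/gluster/   # attach the dump files that appear here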

Pranith
Post by Miloš Kozák
Any help? In my opinion after a while I should get my nodes
synchronized, but after 20minuts of waiting still nothing (the file
was 2G big)
Thanks Milos
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users