Discussion:
timestamps getting updated during self-heal after primary brick rebuild
Todd Stansell
2013-02-28 23:23:34 UTC
Permalink
We're looking at using glusterfs to provide a shared filesystem between two
nodes, using just local disk. They are both gluster servers as well as
clients. This is on CentOS 5.9 64-bit. The bricks are simply ext3
filesystems on top of LVM:

/dev/mapper/VolGroup00-LogVol0 on /gfs0 type ext3 (rw,user_xattr)

We set up a test volume with:

host14# gluster volume create gv0 replica 2 transport tcp host14:/gfs0 host13:/gfs0
host14# gluster volume set gv0 nfs.disable on
host14# gluster volume start gv0

This works just fine. The issue comes when we simulate a hardware failure where
we need to rebuild an entire node. In that case we kickstart the server, which
creates all-new filesystems. We have a kickstart postinstall script that sets
the glusterd UUID of the server so that it never changes. It then probes the
other server, looks for existing volumes, sets up fstab entries for them (so
the node also acts as a client), and installs an init script that forces a full
heal every time the server boots, just to ensure all data is replicated to both
nodes. All of this works great when I'm rebuilding the second brick.
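For context, the postinstall logic boils down to something like the sketch
below. This is not the exact script; the saved UUID is a placeholder, and the
hostnames, volume name, and mount point are just the ones from this setup:

#!/bin/sh
# kickstart %post sketch -- illustrative only, not the real script
# SAVED_UUID is the glusterd UUID this host had before the rebuild

# pin the old peer identity before glusterd starts for the first time
mkdir -p /var/lib/glusterd
echo "UUID=${SAVED_UUID}" > /var/lib/glusterd/glusterd.info
service glusterd start

# re-join the cluster; existing volume definitions sync over from the peer
gluster peer probe host13

# mount the volume locally so this node is also a client
echo "host14:/gv0 /data glusterfs defaults,_netdev 0 0" >> /etc/fstab
mkdir -p /data
mount /data

# the boot-time init script essentially just runs a full heal:
gluster volume heal gv0 full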

The issue I have is when we rebuild the server that hosts the primary brick
(host14:/gfs0). It will come online and start copying data from host13:/gfs0,
but as it does so, it sets the timestamps of the files on host13:/gfs0 to the
time it healed the data on host14:/gfs0. As a result, all files in the
filesystem end up with timestamps of when the first brick was healed.

I enabled client debug logs and the following indicates that it *thinks* it is
doing the right thing:

after rebuilding gv0-client-1:
[2013-02-28 00:01:37.264018] D [afr-self-heal-metadata.c:329:afr_sh_metadata_sync] 0-gv0-replicate-0: self-healing metadata of /data/bin/sync-data from gv0-client-0 to gv0-client-1

after rebuilding gv0-client-0:
[2013-02-28 00:17:03.578377] D [afr-self-heal-metadata.c:329:afr_sh_metadata_sync] 0-gv0-replicate-0: self-healing metadata of /data/bin/sync-data from gv0-client-1 to gv0-client-0

Unfortunately, in the second case, the timestamps of the files changed; for example, from:

-r-xr-xr-x 1 root root 2717 Feb 27 23:32 /data/data/bin/sync-data*

to:

-r-xr-xr-x 1 root root 2717 Feb 28 00:17 /data/data/bin/sync-data*

And remember, there's nothing accessing any data in this volume so there's no
"client" access going on anywhere. No changes happening on the filesystem,
other than self-heal screwing things up.

The only thing I could find in any logs that would indicate a problem was this
in the brick log:

[2013-02-28 00:17:03.583063] D [posix.c:323:posix_do_utimes] 0-gv0-posix: /gfs0/data/bin/sync-data (Function not implemented)

I've also now built a CentOS 6 host and verified that the same behavior
happens there, though I get a slightly different brick debug log message (which
makes me think this log message has nothing to do with what I'm seeing):

[2013-02-28 23:07:41.879440] D [posix.c:262:posix_do_chmod] 0-gv0-posix: /gfs0/data/bin/sync-data (Function not implemented)
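In case it helps anyone reproduce this, the comparison I keep doing is just
stat against the brick path and against the fuse mount (the paths below match
the listings in this mail; the client mount point here is /data):

# on each server, look at the brick copy directly
stat -c '%n  size=%s  mtime=%y' /gfs0/data/bin/sync-data

# and compare with what the fuse client mount reports
stat -c '%n  size=%s  mtime=%y' /data/data/bin/sync-data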

Here's some basic info that might help folks see what's going on:

# rpm -qa | grep gluster
glusterfs-server-3.3.1-1.el5
glusterfs-3.3.1-1.el5
glusterfs-fuse-3.3.1-1.el5

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 7cec2ba3-f69c-409a-a259-0d055792b11a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: host14:/gfs0
Brick2: host13:/gfs0
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
nfs.disable: on

Todd
Todd Stansell
2013-03-06 07:43:36 UTC
Permalink
In the interest of pinging the community for *any* sort of feedback, I'd like
to note that we rebuilt things on CentOS 6 with btrfs as the brick filesystem,
to use something entirely different. We see the same behavior. After rebuilding the
first brick in the 2-brick replicate cluster, all file timestamps get updated
to the time self-heal copies the data back to that brick.

This is obviously a bug in 3.3.1. We basically did what's described here:

http://gluster.org/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server

and timestamps get updated on all files. Can someone acknowledge that this
sounds like a bug? Does anyone care?
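To head off the obvious question about whether the restore itself was done
correctly: before triggering the heal we sanity-check that the rebuilt node
came back with its old identity, roughly like this (volume and host names as
in my first mail):

# on the rebuilt server: the UUID should match what the host had before
cat /var/lib/glusterd/glusterd.info

# on the surviving server: the rebuilt peer should show up as connected
gluster peer status

# then watch what self-heal thinks still needs doing
gluster volume heal gv0 info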

Being relatively new to glusterfs, it's painful to watch the mailing list and
even the IRC channel and see many folks ask questions with nothing but
silence. I honestly wasn't sure if glusterfs was actively being supported
anymore. Given the recent flurry of mail about lack of documentation I see
that's not really true. Unfortunately, given that what I'm seeing is a form
of data corruption (yes, timestamps do matter), I'm surprised nobody's
interested in helping figure out what's going wrong. Hopefully it's something
about the way I've built out our cluster (though it seems less and less likely
given we are able to replicate the problem so easily).

Todd
John Mark Walker
2013-03-06 08:28:39 UTC
Permalink
Hi Todd,

A general note here: when someone posts a question and no one responds, it's generally because either no one has seen that particular behavior and they don't know how to respond, or they didn't understand what you were saying. In this case, I'd say it is the former.

----- Original Message -----
Post by Todd Stansell
something entirely different. We see the same behavior. After rebuilding the
first brick in the 2-brick replicate cluster, all file timestamps get updated
to the time self-heal copies the data back to that brick.
This is obviously a bug in 3.3.1. We basically did what's described here:
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server
and timestamps get updated on all files. Can someone acknowledge that this
sounds like a bug? Does anyone care?
Please file a bug and include the relevant information at
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

- after searching for any similar bugs, of course.
Post by Todd Stansell
Being relatively new to glusterfs, it's painful to watch the mailing list and
even the IRC channel and see many folks ask questions with nothing but
silence. I honestly wasn't sure if glusterfs was actively being supported
??? Our IRC channel is one of the most active in the open source world. I'm honestly not sure what mailing lists or IRC channels you've been watching.
Post by Todd Stansell
anymore. Given the recent flurry of mail about lack of documentation I see
that's not really true. Unfortunately, given that what I'm seeing is a form
of data corruption (yes, timestamps do matter), I'm surprised nobody's
interested in helping figure out what's going wrong. Hopefully it's something
about the way I've built out our cluster (though it seems less and less likely
given we are able to replicate the problem so easily).
I can understand your frustration. I would be, also. However, given that I haven't heard of this problem before, I don't know how you were able to reproduce it. The best I can offer is that we'll investigate your bug report.

Thanks,
JM
Whit Blauvelt
2013-03-06 14:28:35 UTC
Permalink
A small question: We know that one or two members of the dev team read these
emails. One said just yesterday he's more likely to see emails than bug
reports. Now, sometimes the response to an email showing an obvious bug is
"File a bug report please." But for something like this - yes timestamps are
data so this is a serious bug - it would be a healthy thing if someone on
the dev team would make a point of both acknowledging that it's a bug, and
taking responsibility for being sure the bug report is filed and assigned to
the right people, whether or not the email writer has taken that step.

If the team's too small to follow through like this, is someone advocating
with Red Hat for more staff? They've made a large investment in Gluster,
which they might want to protect by fully staffing it. It's the fault of the
firm, not current project staff, if the current staffing is too thin.

Apologies if these reflections are out of place in a community discussion.
But it's in the community's interest that Red Hat succeeds and profits from
its Gluster purchase.

Best,
Whit
Joe Julian
2013-03-06 19:26:46 UTC
Permalink
If that was said (not saying it wasn't, I just didn't notice it), then my
guess would be just the opposite. The team's large enough that the PM
assigns the bugs to the right person who manages their portion of the
program. Someone not seeing a bug would probably be because they're not
part of that assignment.

<IMHO>
Someone complaining to a *user* mailing list about a problem is not a
bug report nor should it be. If followup information is needed, it's
important that the bug manager, the person reporting the bug, and anyone
else tracking that bug be informed. Bug trackers are a critical tool
that keeps development organized.

If, in a community organization (don't forget, gluster.org is the
upstream community organization that's ultimately in charge of this
development), someone feels that it's important to file bugs based on
emails to a user list, then a community member should take that role on.
</IMHO>

JMW, by the way, is the community's advocate to Red Hat.
Todd Stansell
2013-03-06 17:26:17 UTC
Permalink
I filed the bug last night, fyi:

https://bugzilla.redhat.com/show_bug.cgi?id=918437

Todd
Todd Stansell
2013-03-07 16:03:03 UTC
Permalink
This is in the bug, but thought folks on this thread might be interested in my
latest, simplified test results ... simulating brick1 going offline briefly.
This is a 3-brick replica all on a single host for easy testing.

1. kill the glusterfs process for brick1
2. update the file in /gv3
US-CA1 host13 (root) /:$ cp -p /etc/init.d/glusterfsd /gv3/glusterfs-heal
3. see the discrepancy:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 root root 468 2013-03-07 15:49:19.845462000 +0000 /gfs0/brick1/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick3/glusterfs-heal
4. restart glusterd to bring brick1 back online
US-CA1 host13 (root) /:$ service glusterd stop
Stopping glusterd: [ OK ]
US-CA1 host13 (root) /:$ service glusterfsd stop
Stopping glusterfsd: [ OK ]
US-CA1 host13 (root) /:$ service glusterd start
Starting glusterd: [ OK ]
5. check out self-heal results:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick1/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick3/glusterfs-heal

It used the newly updated data (good), but updated all of the timestamps
(bad). Everything works fine as long as it's not brick1 that is offline.
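
For anyone digging into this, the per-file AFR bookkeeping (the trusted.afr.*
changelog xattrs) on each brick's copy can be dumped with getfattr; for the
file above that would be:

getfattr -m . -d -e hex /gfs0/brick1/glusterfs-heal
getfattr -m . -d -e hex /gfs0/brick2/glusterfs-heal
getfattr -m . -d -e hex /gfs0/brick3/glusterfs-heal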

Todd