Discussion:
[Gluster-users] Reconstructing files from shards
Jamie Lawrence
2018-04-20 19:44:48 UTC
Hello,

So I have a volume on a gluster install (3.12.5) on which sharding was enabled at some point recently. (Don't know how it happened, it may have been an accidental run of an old script.) So it has been happily sharding behind our backs and it shouldn't have.

I'd like to turn sharding off and reverse the files back to normal. Some of these are sparse files, so I need to account for holes. There are more than enough that I need to write a tool to do it.

I saw notes ca. 3.7 saying the only way to do it was to read the files off on the client side, blow away the volume and start over. This would be extremely disruptive for us, and the language I've seen reading tickets and old messages to this list makes me think that isn't needed anymore, but confirmation of that would be good.

The only discussion I can find are these videos[1]: http://opensource-storage.blogspot.com/2016/07/de-mystifying-gluster-shards.html , and some hints[2] that are old enough that I don't trust them without confirmation that nothing's changed. The videos don't acknowledge the existence of file holes. Also, the hint in [2] mentions using trusted.glusterfs.shard.file-size to get the size of a partly filled shard; that value looks like base64, but when I attempt to decode it, base64 complains about invalid input.
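(One guess: the value may simply be a raw binary blob rather than base64, which would explain the decode error. Dumping it as hex at least shows the bytes; whether the first 8 bytes are the file size as a big-endian 64-bit integer is purely an assumption on my part -- I haven't found the layout documented anywhere.)

# Dump the xattr as hex instead of trying to base64-decode it
getfattr -n trusted.glusterfs.shard.file-size -e hex /path/to/base/file/on/brick

# Hypothetical output, for illustration only:
#   trusted.glusterfs.shard.file-size=0x00000000075a000000000000...
# If the first 16 hex digits after the 0x really are the size, convert with:
printf '%d\n' 0x00000000075a0000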

In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?

Thanks,

-j


[1] Why one would choose to annoy the crap out of their fellow gluster users by using video to convey about 80 bytes of ASCII-encoded information, I have no idea.
[2] http://lists.gluster.org/pipermail/gluster-devel/2017-March/052212.html
Alessandro Briosi
2018-04-22 08:39:57 UTC
Post by Jamie Lawrence
Hello,
So I have a volume on a gluster install (3.12.5) on which sharding was enabled at some point recently. (Don't know how it happened, it may have been an accidental run of an old script.) So it has been happily sharding behind our backs and it shouldn't have.
I'd like to turn sharding off and reverse the files back to normal. Some of these are sparse files, so I need to account for holes. There are more than enough that I need to write a tool to do it.
I saw notes ca. 3.7 saying the only way to do it was to read the files off on the client side, blow away the volume and start over. This would be extremely disruptive for us, and the language I've seen reading tickets and old messages to this list makes me think that isn't needed anymore, but confirmation of that would be good.
The only discussion I can find are these videos[1]: http://opensource-storage.blogspot.com/2016/07/de-mystifying-gluster-shards.html , and some hints[2] that are old enough that I don't trust them without confirmation that nothing's changed. The videos don't acknowledge the existence of file holes. Also, the hint in [2] mentions using trusted.glusterfs.shard.file-size to get the size of a partly filled shard; that value looks like base64, but when I attempt to decode it, base64 complains about invalid input.
In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?
Imho the easiest path would be to turn off sharding on the volume and
simply do a copy of the files (to a different directory, or rename and
then copy i.e.)

This should simply store the files without sharding.

my 2 cents.

Alessandro
Gandalf Corvotempesta
2018-04-22 09:39:20 UTC
Post by Alessandro Briosi
Imho the easiest path would be to turn off sharding on the volume and
simply do a copy of the files (to a different directory, or rename and
then copy i.e.)
This should simply store the files without sharding.
If you turn off sharding on a sharded volume with data in it, all sharded
files would be unreadable
Jim Kinney
2018-04-22 13:10:41 UTC
So a stock ovirt with gluster install that uses sharding
A. Can't safely have sharding turned off once files are in use
B. Can't be expanded with additional bricks

Ouch.
--
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
Gandalf Corvotempesta
2018-04-23 08:11:35 UTC
Post by Jim Kinney
So a stock ovirt with gluster install that uses sharding
A. Can't safely have sharding turned off once files are in use
B. Can't be expanded with additional bricks
If the expansion bug is still unresolved, yes :-)
Alessandro Briosi
2018-04-23 07:34:07 UTC
Post by Alessandro Briosi
Imho the easiest path would be to turn off sharding on the volume and
simply do a copy of the files (to a different directory, or rename and
then copy i.e.)
This should simply store the files without sharding.
If you turn off sharding on a sharded volume with data in it, all
sharded files would be unreadable
Is that really so?

I thought that sharding was an extended attribute on the files created
when sharding is enabled.

Turning off sharding on the volume would not turn off sharding on
existing files, only on newly created ones ...

Anyway, if that is so, the simplest path would be to create a new volume
and move/copy files over.

Alessandro
Gandalf Corvotempesta
2018-04-23 08:11:06 UTC
Post by Alessandro Briosi
Is that really so?
Yes, I've opened a bug asking developers to block removal of sharding
when a volume has data on it, or to write a huge warning message
saying that data loss will happen.
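(For reference, the volume option in question is features.shard. A minimal illustration, with "myvol" as a placeholder volume name -- and, per the above, the last command is exactly the thing that must never be run on a volume that already holds sharded data.)

# Check whether sharding is on and what block size is configured
gluster volume get myvol features.shard
gluster volume get myvol features.shard-block-size

# Disabling it -- this is what makes existing sharded files unreadable:
# gluster volume set myvol features.shard off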
Post by Alessandro Briosi
I thought that sharding was an extended attribute on the files created
when sharding is enabled.
Turning off sharding on the volume would not turn off sharding on
existing files, only on newly created ones ...
No, because sharded files are reconstructed on-the-fly based on the
volume's sharding property.
If you disable sharding, gluster knows nothing about the previous
shard configuration, thus it won't be able to read
all the shards for each file. It will only return the first shard,
resulting in data loss or corruption.
Jim Kinney
2018-04-23 11:37:35 UTC
Are there any plans to create an unsharding tool?
--
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
WK
2018-04-23 17:49:31 UTC
From an old May 2017 email exchange, I asked the following:

"From the docs, I see you can identify the shards by the GFID

# getfattr -d -m. -e hex path_to_file

# ls /bricks/*/.shard -lh | grep GFID

Is there a gluster tool/script that will recreate the file?

or can you just sort them properly and then simply cat/copy
them back together?

cat shardGFID.1 .. shardGFID.X > thefile "

The response from RedHat was:

"Yes, this should work, but you would need to include the base file (the
0th shard, if you will) first in the list of files that you're stitching
up.  In the happy case, you can test it by comparing the md5sum of the
file from the mount to that of your stitched file."


We tested it with some VM files and it indeed worked fine. That was
probably on 3.10.1 at the time.
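For anyone wanting to try that today, it amounts to something like the sketch below. Treat it as a sketch only: the GFID, brick path and output name are placeholders, it assumes the usual .glusterfs/<aa>/<bb>/<gfid> hard-link layout on the brick, and it assumes the file has no holes (a missing shard number would silently shift everything after it).

GFID="01234567-89ab-cdef-0123-456789abcdef"   # placeholder
BRICK="/bricks/brick1"                        # placeholder
OUT="/tmp/reassembled.img"                    # placeholder

# The base file (the 0th shard) comes first; reach it via its .glusterfs hard link.
cat "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}" > "$OUT"

# Then shards 1..N from .shard, in numeric order of their suffix.
for n in $(ls "$BRICK/.shard/" | grep "^${GFID}\." | sed "s/^${GFID}\.//" | sort -n); do
    cat "$BRICK/.shard/${GFID}.${n}" >> "$OUT"
done

# Compare against a copy taken from the mount, if one is available.
md5sum "$OUT" /mnt/gluster/path/to/the/file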


-wk
Jamie Lawrence
2018-04-23 18:46:14 UTC
Post by WK
"From the docs, I see you can identify the shards by the GFID
# getfattr -d -m. -e hex path_to_file
# ls /bricks/*/.shard -lh | grep GFID
Is there a gluster tool/script that will recreate the file?
or can you just sort them properly and then simply cat/copy them back together?
cat shardGFID.1 .. shardGFID.X > thefile "
"Yes, this should work, but you would need to include the base file (the 0th shard, if you will) first in the list of files that you're stitching up. In the happy case, you can test it by comparing the md5sum of the file from the mount to that of your stitched file."
We tested it with some VM files and it indeed worked fine. That was probably on 3.10.1 at the time.
Thanks for that, WK.

Do you know if those images were sparse files? My understanding is that this will not work with files with holes.

Quoting from: http://lists.gluster.org/pipermail/gluster-devel/2017-March/052212.html

- - snip

1. A non-existent/missing shard anywhere between offset $SHARD_BLOCK_SIZE
through ceiling ($FILE_SIZE/$SHARD_BLOCK_SIZE)
indicates a hole. When you reconstruct data from a sharded file of this
nature, you need to take care to retain this property.

2. The above is also true for partially filled shards between offset
$SHARD_BLOCK_SIZE through ceiling ($FILE_SIZE/$SHARD_BLOCK_SIZE).
What do I mean by partially filled shards? Shards whose sizes are not equal
to $SHARD_BLOCK_SIZE.

In the above, $FILE_SIZE can be gotten from the
'trusted.glusterfs.shard.file-size' extended attribute on the base file
(the 0th block).

- - snip

So it sounds like (although I am not sure, which is why I was writing in the first place) one would need to use `dd` or similar to read out ( ${trusted.glusterfs.shard.file-size} - ($SHARD_BLOCK_SIZE * count) ) bytes from the partial shard.

Although I also just realized the above quote fails to explain how, if a file has a hole smaller than $SHARD_BLOCK_SIZE, we know which shard(s) are holey; so I'm back to thinking reconstruction is undocumented and unsupported, except for reading the files off on a client, blowing away the volume and reconstructing. Which is a problem.
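If the behavior described in [2] still holds, one way to sidestep the hole problem might be to never concatenate at all: create the output file, truncate it to the size from trusted.glusterfs.shard.file-size, and then write only the shards that actually exist into place at offset N * $SHARD_BLOCK_SIZE, so missing shards simply remain holes. A rough, untested sketch -- every value below is a placeholder, and the block size must match whatever features.shard-block-size the volume actually uses:

FILESIZE=123456789                            # from trusted.glusterfs.shard.file-size (placeholder)
BLOCK=$((64 * 1024 * 1024))                   # placeholder: the volume's shard block size
GFID="01234567-89ab-cdef-0123-456789abcdef"   # placeholder
BRICK="/bricks/brick1"                        # placeholder
OUT="/tmp/reassembled.img"                    # placeholder

# The 0th shard is the base file itself; copy it sparsely, then size the output.
cp --sparse=always "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}" "$OUT"
truncate -s "$FILESIZE" "$OUT"

# Write only the shards that exist; absent shard numbers stay as holes.
for shard in "$BRICK/.shard/${GFID}".*; do
    [ -e "$shard" ] || continue
    n=${shard##*.}
    dd if="$shard" of="$OUT" bs="$BLOCK" seek="$n" conv=notrunc,sparse status=none
done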

-j
WK
2018-04-23 19:57:23 UTC
Post by Jamie Lawrence
Thanks for that, WK.
Do you know if those images were sparse files? My understanding is that this will not work with files with holes.
We typically use qcow2 images (compat 1.1) with metadata preallocated
(so yes, sparse), so we may be vulnerable to that whole problem. Our
Gluster setups are always simple replication, either rep3 or rep2+arb.

Since you brought it up, I recall being somewhat aware of  the 'sparse'
issue at the time.

Given the way qcow2 images expand, and/or the fact that the images are
typically fully grown after being in production for a while, we may have
gotten a false positive from the md5 check. We literally yanked
a live gluster brick node full of mature VM images and used that to do
an offline reconstruction of the sharded images. All passed the md5sum
and/or booted cleanly. We did have one or two where the md5sum was off
but the file was otherwise fine and passed an fsck.

Upon reflection, that may have been the sparse file issue trying to get
our attention.

In the end, we have always been able to mount the volume (perhaps by
killing quorum if you are down to a single brick node) and copy the VM
files cleanly, so the 'sparse may be a problem' issue was soon forgotten
as mounting is easier and cleaner than shard reconstruction.

I'm glad you brought the issue forward again because "the conventional
wisdom" here is that shard reconstruction worked fine if we ever really
needed it <grin>.

I suppose preallocation=full or preallocation=falloc would be a logical
workaround to cover that issue, but we would have to think through the
cost/benefit of fully allocated images just because there is a very
slight chance we would have to reconstruct the shards instead of simply
copying off the image from a mount.

If I can ever find some spare time, I would like to redo the test with
brand new qcow2 files that are preallocated with lots of room to grow.
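Concretely, that would presumably just mean creating the test images with preallocation turned on, something like the commands below (image name and size are placeholders) -- falloc reserves the space without writing it, full actually writes the whole image out:

# Reserve all space up front so the image never develops holes (placeholder name/size)
qemu-img create -f qcow2 -o preallocation=falloc vm-disk.qcow2 100G

# Slower, but every byte is actually written:
qemu-img create -f qcow2 -o preallocation=full vm-disk.qcow2 100G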

-wk
Krutika Dhananjay
2018-04-27 04:00:15 UTC
The short answer is: no, there currently exists no script that can piece
the shards together into a single file.

Long answer:
IMO the safest way to convert from sharded to a single file _is_, at the
moment, by copying the data out into a new volume.
Picking up the files from the individual bricks directly and joining them,
although fast, is a strict no-no for many reasons - for example, with a
replicated volume the good copy needs to be carefully selected and must
remain a good copy through the course of the copying process. There could
be other consistency issues with file attributes changing while they are
being copied. None of this can be guaranteed unless you're open to taking
the volume down.

The other option is to have the gluster client (perhaps in the shard
translator itself) do the conversion in the background within the gluster
translator stack, which is safer but would require that the shard
translator lock the file until the copying is complete. And until then no
IO can happen on this file.
(I haven't found the time to work on this, as there exists a workaround and
I've been busy with other tasks. If anyone wants to volunteer to get this
done, I'll be happy to help.)

But anyway, why is copying the data into a new unsharded volume disruptive for you?

-Krutika
Jim Kinney
2018-04-27 11:45:09 UTC
For me, copying out the drive file from oVirt is a tedious, very manual process. Each VM has a single drive file with tens of thousands of shards. Typical VM size is 100G for me, and it's all mostly sparse. So, yes, a copy out from the gluster share is best.

Did the outstanding bug where adding bricks to a sharded domain causes data loss get fixed in release 3.12?
--
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
Jamie Lawrence
2018-04-27 22:34:12 UTC
Post by Krutika Dhananjay
But anyway, why is copying the data into a new unsharded volume disruptive for you?
The copy itself isn't; blowing away the existing volume and recreating it is.

That is for the usual reasons - storage on the cluster machines is not infinite, the cluster serves a purpose that humans rely on, downtime is expensive.

-j
