Discussion:
Adding storage to an existing server.
Todd Green
2010-04-05 15:44:55 UTC
I am testing Gluster Storage Platform in a fully virtualized CentOS 5.4 guest. I assigned two drives to the guest: one 10GB drive for the Gluster install and one 100GB drive to use as storage. Although I can see the status of the second drive in the Resource Monitor, I am unable to format it and use it to create volumes.



This has occurred with both the initial server and a storage server.



Is this possible to do, or does all the presented storage have to appear on the first storage device?



________________________________

Todd A. Green | Systems Engineer | 3400 Players Club Parkway, Suite 300, Memphis, TN 38125 | 901.271.1863 Direct | 901.755.0110 Main | 901.339.7977 Fax | ***@irondata.com
Bala.JA
2010-04-05 16:13:51 UTC
Hi,

Format-disk functionality is available in Server Manager; please check it there. If the 100GB disk was added after the Gluster Platform installation, you have to partition the disk by hand (this is a bug in the 3.0.x releases). In the system console, you can run

parted -s <disk> mklabel gpt
parted -s <disk> mkpart primary ext3 0 100%

This will make the disk appear in Server Manager.
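
For example, if the 100GB disk shows up inside the guest as /dev/sdb (the device name here is only an assumption; check with 'fdisk -l' or 'cat /proc/partitions' first), the commands would be:

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary ext3 0 100%

After that the disk should be listed in Server Manager and can be formatted from there.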

Thanks,

Regards,
Bala
Post by Todd Green
I am testing Gluster Storage Platform in a fully virtualized CentOS 5.4
guest. I assigned two drives to the guest one 10GB for the gluster install
and one 100GB to use as storage. Although I can see the status of the second
volume (drive) in the Resource Monitor I am unable to format it and use it to
create volumes.
This has occurred with the initial server and a storage server.
Is this possible to do or does all the presented storage have to appear in
the first storage device?
________________________________
Todd A. Green │ Systems Engineer │ 3400 Players Club Parkway Suite 300
Memphis, TN 38125 │ 901.271.1863 Direct │ 901.755.0110 Main| 901. 339.7977
Kali Hernandez
2010-04-06 04:07:53 UTC
Hi all,

We are running glusterfs 3.0.3, installed from the RHEL RPMs, on 30 nodes
(not virtual machines). Our config pairs the machines two by two under the
replicate translator as mirrors, and on top of that aggregates the 15
resulting mirrors under the stripe translator. Before this we were using
distribute instead, but we had the same problem.
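
For reference, the client-side volume file for such a layout looks roughly like the sketch below (host names, volume names and export names are placeholders, and only one of the 15 mirror pairs is written out in full):

volume node10
  type protocol/client
  option transport-type tcp
  option remote-host node10
  option remote-subvolume brick
end-volume

volume node20
  type protocol/client
  option transport-type tcp
  option remote-host node20
  option remote-subvolume brick
end-volume

volume mirror-01
  type cluster/replicate
  subvolumes node10 node20
end-volume

# mirror-02 .. mirror-15 are defined the same way from the other node pairs

volume stripe-0
  type cluster/stripe
  subvolumes mirror-01 mirror-02
end-volume

(As Amar points out later in the thread, this replicate-under-stripe combination turns out not to be supported in 3.0.x.)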

We are copying (using cp) a lot of files which reside under the same
directory, and I have been monitoring the whole copy process to check
where the failure starts.

In the middle of the copy process we get this error:

cp: cannot create regular file
`/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/1758650-3AF69C6B7FDAC0A40D85EABA8C85490D.mswmm': No
space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/179183-A018B5FBE6DCCF04A3BB99C814CD9EAB.wmv':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/626249-7B7FFFE0B9C56E9BE5733409CB73BCDF_300.jpg':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi':
No space left on device

(hundreds of times)

When I look at the storage distribution, I can see this:

node 10 37G 14G 23G 38% /glusterfs_storage
node 11 37G 14G 23G 37% /glusterfs_storage
node 12 37G 14G 23G 37% /glusterfs_storage
node 13 37G 14G 23G 37% /glusterfs_storage
node 14 37G 13G 24G 36% /glusterfs_storage
node 15 37G 13G 24G 36% /glusterfs_storage
node 16 37G 13G 24G 35% /glusterfs_storage
node 17 49G 12G 36G 26% /glusterfs_storage
node 18 37G 12G 25G 33% /glusterfs_storage
node 19 37G 12G 25G 33% /glusterfs_storage
node 20 37G 14G 23G 38% /glusterfs_storage
node 21 37G 14G 23G 37% /glusterfs_storage
node 22 37G 14G 23G 37% /glusterfs_storage
node 23 37G 14G 23G 37% /glusterfs_storage
node 24 37G 13G 24G 36% /glusterfs_storage
node 25 37G 13G 24G 36% /glusterfs_storage
node 26 37G 13G 24G 35% /glusterfs_storage
node 27 49G 12G 36G 26% /glusterfs_storage
node 28 37G 12G 25G 33% /glusterfs_storage
node 29 37G 12G 25G 33% /glusterfs_storage
node 35 40G 40G 0 100% /glusterfs_storage
node 36 40G 22G 18G 56% /glusterfs_storage
node 37 40G 18G 22G 45% /glusterfs_storage
node 38 40G 16G 24G 40% /glusterfs_storage
node 39 40G 15G 25G 37% /glusterfs_storage
node 45 40G 40G 0 100% /glusterfs_storage
node 46 40G 22G 18G 56% /glusterfs_storage
node 47 40G 18G 22G 45% /glusterfs_storage
node 48 40G 16G 24G 40% /glusterfs_storage
node 49 40G 15G 25G 37% /glusterfs_storage

(node mirror pairings are 10-19 paired to 20-29, and 35-39 to 45-49)


As you can see, the distribution of space over the cluster is more or less
even across most of the nodes, except for node pair 35/45, which has run
out of space. Thus, every time I try to copy more data onto the cluster,
I run into the mentioned "no space left on device" error.
Kali Hernandez
2010-04-06 05:56:50 UTC
Hi,

In this same environment, when I try to create a new directory on the
mount point (client side), I get this error:

profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Software caused
connection abort
profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Transport endpoint
is not connected
profile3:/mnt # mount


If I check the log file, I can see:

[2010-04-06 07:58:26] W [fuse-bridge.c:477:fuse_entry_cbk]
glusterfs-fuse: 4373613: MKDIR() /newdir returning inode 0
pending frames:
frame : type(1) op(MKDIR)
frame : type(1) op(MKDIR)

patchset: v3.0.2-41-g029062c
signal received: 11
time of crash: 2010-04-06 07:58:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.3
/lib64/libc.so.6[0x7f49e1c7f6e0]
/usr/lib64/libglusterfs.so.0(inode_link+0x23)[0x7f49e23e73b3]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8a43]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8f92]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7f49e123024a]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(notify+0x212)[0x7f49e12376c2]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x7f49e23d93e3]
/usr/lib64/glusterfs/3.0.3/transport/socket.so(socket_event_handler+0xd3)[0x7f49dfda6173]
/usr/lib64/libglusterfs.so.0[0x7f49e23f3045]
/usr/sbin/glusterfs(main+0xa28)[0x404268]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f49e1c6b586]
/usr/sbin/glusterfs[0x402749]
---------


Again, I am totally clueless...
Post by Kali Hernandez
Hi all,
We are running glusterfs 3.0.3, installed from RHEL rpm's, over 30
nodes (not virtual machines). Our config pairs each 2 machines under
replicate translator as mirrors, and over that aggregates the 15
resulting mirrors under stripe translator. Before we were using
distribute instead, but we had the same problem.
We are copying (using cp) a lot of files which reside under the same
directory, and I have been monitoring the whole copy process to check
where the failure starts.
cp: cannot create regular file
`/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg': No
space left on device
cp: cannot create regular file
No space left on device
cp: cannot create regular file
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv': No
space left on device
cp: cannot create regular file
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi': No
space left on device
(hundreds of times)
node 10 37G 14G 23G 38% /glusterfs_storage
node 11 37G 14G 23G 37% /glusterfs_storage
node 12 37G 14G 23G 37% /glusterfs_storage
node 13 37G 14G 23G 37% /glusterfs_storage
node 14 37G 13G 24G 36% /glusterfs_storage
node 15 37G 13G 24G 36% /glusterfs_storage
node 16 37G 13G 24G 35% /glusterfs_storage
node 17 49G 12G 36G 26% /glusterfs_storage
node 18 37G 12G 25G 33% /glusterfs_storage
node 19 37G 12G 25G 33% /glusterfs_storage
node 20 37G 14G 23G 38% /glusterfs_storage
node 21 37G 14G 23G 37% /glusterfs_storage
node 22 37G 14G 23G 37% /glusterfs_storage
node 23 37G 14G 23G 37% /glusterfs_storage
node 24 37G 13G 24G 36% /glusterfs_storage
node 25 37G 13G 24G 36% /glusterfs_storage
node 26 37G 13G 24G 35% /glusterfs_storage
node 27 49G 12G 36G 26% /glusterfs_storage
node 28 37G 12G 25G 33% /glusterfs_storage
node 29 37G 12G 25G 33% /glusterfs_storage
node 35 40G 40G 0 100% /glusterfs_storage
node 36 40G 22G 18G 56% /glusterfs_storage
node 37 40G 18G 22G 45% /glusterfs_storage
node 38 40G 16G 24G 40% /glusterfs_storage
node 39 40G 15G 25G 37% /glusterfs_storage
node 45 40G 40G 0 100% /glusterfs_storage
node 46 40G 22G 18G 56% /glusterfs_storage
node 47 40G 18G 22G 45% /glusterfs_storage
node 48 40G 16G 24G 40% /glusterfs_storage
node 49 40G 15G 25G 37% /glusterfs_storage
(node mirror pairings are 10-19 paired to 20-29, and 35-39 to 45-49)
As you can see, distribution of space over the cluster is more or less
rational over most of the nodes, except for node pair 35/45, which run
out of space. Thus, every time I try to copy more data onto the
cluster, I run into the mentioned "no space left on device"
Amar Tumballi
2010-04-06 06:32:11 UTC
Post by Kali Hernandez
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
I can see that you have used 'cluster/replicate' as a subvolume of
'cluster/stripe' in the volume file. This setup is not supported by
GlusterFS as of now (neither RAID 10 nor RAID 01 is supported).

Please use volume files generated by 'glusterfs-volgen' tool.
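
For example, a simple mirrored setup over two exports could be generated with something like the following (the host names, volume name and export paths are just placeholders):

glusterfs-volgen --name store1 --raid 1 node10:/glusterfs_storage node20:/glusterfs_storage

This writes the server-side export volfiles and a matching client volfile that can then be mounted with glusterfs.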

Regards,
Amar
Kali Hernandez
2010-04-06 06:54:42 UTC
Post by Kali Hernandez
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
I can see that you have used 'cluster/replicate' as subvolume of
'cluster/stripe' in the volume file. This setup is not supported with
GlusterFS as of now. (Both RAID 10, or RAID 01 are not supported).
Please use volume files generated by 'glusterfs-volgen' tool.
Regards,
Amar
Well, for me it _does_ work, meaning I can see that the paired and mirrored
nodes hold identical copies of the files and show the same disk usage. If
one node goes down for some reason, I can recover the data from its
mirrored pair.

The reason for this error seems to be related to what Krzysztof replied
in this same thread.

-Kali-
Brandon Ooi
2010-04-06 07:22:01 UTC
Actually, this is my biggest qualm about Gluster, and it is both its greatest
weakness and its greatest strength. It's great as cluster storage where storage
requirements don't change, but it doesn't work well as general storage. In
real-life usage, storage requirements change, and it's necessary to
add/remove heterogeneous storage nodes depending on business needs. There
currently isn't a working configuration of Gluster for this after the unify
translator was removed. The unify translator isn't scalable at the high end
anyway.

It may just be a general design decision that prevents this because there's
no master node that can manage replication/rebalancing. If anybody knows a
good open-source solution please pipe up!

Brandon
Post by Kali Hernandez
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
I can see that you have used 'cluster/replicate' as subvolume of
'cluster/stripe' in the volume file. This setup is not supported with
GlusterFS as of now. (Both RAID 10, or RAID 01 are not supported).
Please use volume files generated by 'glusterfs-volgen' tool.
Regards,
Amar
Well, to me it _does_ work, meaning I can see the paired and mirrored nodes
are having all the same copies of the files, and the same disk usage. If one
node goes down for some reason I can recover the info from the mirrored
pair.
The reason for this error seems to be related to what Krzysztof replied in
this same thread.
-Kali-
Krzysztof Strasburger
2010-04-06 06:36:36 UTC
Post by Kali Hernandez
cp: cannot create regular file
No space left on device
[...]
Post by Kali Hernandez
node 10 37G 14G 23G 38% /glusterfs_storage
node 11 37G 14G 23G 37% /glusterfs_storage
node 12 37G 14G 23G 37% /glusterfs_storage
node 13 37G 14G 23G 37% /glusterfs_storage
node 14 37G 13G 24G 36% /glusterfs_storage
node 15 37G 13G 24G 36% /glusterfs_storage
node 16 37G 13G 24G 35% /glusterfs_storage
node 17 49G 12G 36G 26% /glusterfs_storage
node 18 37G 12G 25G 33% /glusterfs_storage
node 19 37G 12G 25G 33% /glusterfs_storage
node 20 37G 14G 23G 38% /glusterfs_storage
node 21 37G 14G 23G 37% /glusterfs_storage
node 22 37G 14G 23G 37% /glusterfs_storage
node 23 37G 14G 23G 37% /glusterfs_storage
node 24 37G 13G 24G 36% /glusterfs_storage
node 25 37G 13G 24G 36% /glusterfs_storage
node 26 37G 13G 24G 35% /glusterfs_storage
node 27 49G 12G 36G 26% /glusterfs_storage
node 28 37G 12G 25G 33% /glusterfs_storage
node 29 37G 12G 25G 33% /glusterfs_storage
node 35 40G 40G 0 100% /glusterfs_storage
node 36 40G 22G 18G 56% /glusterfs_storage
node 37 40G 18G 22G 45% /glusterfs_storage
node 38 40G 16G 24G 40% /glusterfs_storage
node 39 40G 15G 25G 37% /glusterfs_storage
node 45 40G 40G 0 100% /glusterfs_storage
node 46 40G 22G 18G 56% /glusterfs_storage
node 47 40G 18G 22G 45% /glusterfs_storage
node 48 40G 16G 24G 40% /glusterfs_storage
node 49 40G 15G 25G 37% /glusterfs_storage
(node mirror pairings are 10-19 paired to 20-29, and 35-39 to 45-49)
[...]
Post by Kali Hernandez
So basically, I get out of space messages when there is around 340 Gb
free on the cluster.
I tried using distribute translator instead of stripe, in fact that was
our first setup, but we thought maybe we are starting to copy a big file
(usually we store really big .tar.gz backups here) and it runs out of
space in the meanwhile, so we thought about using stripe, because
theoretically glusterfs would in that case move and copy the next block
of the file into another node. But in both cases (distribute and stripe)
we run into the same problems.
So I am wondering if this is a problem of a maximum number of files in a
same directory or filesystem or what?
Any ideas on this issue?
As you see, nodes 35 and 45 are full. Go back to 2.0.9 and use the unify
translator with load balancing.
Stripe needs free space on each subvolume, because every file is split into
blocks spread across all subvolumes, so writes fail as soon as any one
subvolume fills up. DHT (distribute) has the weak point that it may decide
to put a file on a full subvolume, because the target is chosen from a hash
of the filename. Unify was much better in such situations, but unfortunately
it is no longer supported in 3.x. You may find it under the "legacy"
directory tree and it even compiles, but it does not work.
Krzysztof
Kali Hernandez
2010-04-06 06:51:49 UTC
Post by Krzysztof Strasburger
As you see, nodes 35 and 45 are full. Go back to 2.0.9 and use the unify
translator with load balancing.
Stripe needs free space on each subvolume. DHT (distribute) has the weak
point that it may decide to put a file on a full subvolume, because of
the filename's hash function value. Unify was much better in such situations,
but unfortunately it is no longer supported in 3.x. You may find it under the
"legacy" directory tree and it even compiles, but does not work.
Krzysztof
So basically this means no solution is really good as of glusterfs 3.0?

I mean, whichever translator I use, I will eventually run into this
situation where I try to write a file and gluster reports "no free space"
while there _is_ free space, either because it needs to write a piece of
every file on every node, or because the hash function used to pick the
node cannot fall back to another node when that one is full?

To me, this makes the whole glusterfs totally useless, as I will randomly
get "no free space" errors even when there is space, so what's the point
of it?

Does the Unify translator work properly in 2.0.x?


Thank you very much for your detailed explanation.

-kali-
Krzysztof Strasburger
2010-04-06 08:32:32 UTC
Post by Kali Hernandez
So basically this means no solution is really good as for glusterfs 3.0?
As of now, there probably is not. IMHO it would be useful to add an
option to DHT, to use load balancing approach instead of hash function.
Combined with no-hashed-lookup, this would effectively restore the
functionality of unify, at a cost of stat'ing each filesystem before file
creation. I understand that this approach does not scale, but the additional
cost is acceptable for a small number of subvolumes.
Post by Kali Hernandez
To me, this makes the whole glusterfs totally useless, as I will
randomly get no free space errors even if there is space, so what's the
point on it?
Does the Unify translator work properly in 2.0.x?
Seems to work, I'm using it ;).
Krzysztof
Kali Hernandez
2010-04-06 08:46:10 UTC
Post by Krzysztof Strasburger
Post by Kali Hernandez
So basically this means no solution is really good as for glusterfs 3.0?
As for now, there is (probably) not. IMHO it would be useful to add an
option to DHT, to use load balancing approach instead of hash function.
Combined with no-hashed-lookup, this would effectively restore the
functionality of unify, at a cost of stat'ing each filesystem before file
creation. I understand that this approach does not scale, but the additional
cost is acceptable for a small number of subvolumes.
I'm not really sure what the best option would be. However, IMHO too,
this limitation defeats the whole purpose of glusterfs. What use do I
have for a distributed filesystem which is (eventually) unable to store
a file when it actually has free space to allocate it? In an environment
where a lot of small files are stored mixed with a few (not so many) huge
ones, you will most probably run into a situation where the cluster
reports no free space even though there is some.
Post by Krzysztof Strasburger
Post by Kali Hernandez
Does the Unify translator work properly in 2.0.x?
Seems to work, I'm using it ;).
I have just downgraded back to 2.0 and I'm trying Unify right now.
However, copying all the data back into the cluster (500+ GB) over the
net is a real pain and will take a lot of time given the read/write
performance (I have all the data on another glusterfs volume, and
reading from one + copying to the new one results in ~2.5 Mb/s effective
speed).

The worst point of using Unify, for me, is the need for the namespace
child. As I can't risk having a SPOF there, I had to take 2 nodes out
to make the namespace node, thus losing ~40 GB of effective storage
size. Any better config suggestion is more than welcome :-)

-kali-
Krzysztof Strasburger
2010-04-06 09:14:25 UTC
Post by Kali Hernandez
Post by Krzysztof Strasburger
Post by Kali Hernandez
Does the Unify translator work properly in 2.0.x?
Seems to work, I'm using it ;).
The worst point on using Unify, for me, is the need of the namespace
child. As I can't risk on having a SPOF there, I had to take 2 nodes out
for making the namespace node, thus loosing ~ 40 Gb of effective storage
size. Any better config suggestion is more than welcome :-)
You don't need to dedicate nodes to the namespace. It contains only the
directory tree and zero-sized files. I simply made a replicated namespace on
3 nodes that also carry the real data, in separate directories. This way,
there is no SPOF with unify and no space is wasted. Make sure that the
underlying filesystems have enough inode entries (this is usually not
a problem, unless you create many small files).
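
A rough sketch of that layout (all names and paths below are placeholders): on the server side each such node exports a normal data brick plus a small namespace brick from a separate directory,

volume brick
  type storage/posix
  option directory /glusterfs_storage/data
end-volume

volume brick-ns
  type storage/posix
  option directory /glusterfs_storage/ns
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  option auth.addr.brick-ns.allow *
  subvolumes brick brick-ns
end-volume

and on the client side the namespace bricks from those nodes are wrapped in a replicate volume which is handed to unify as its namespace:

volume ns
  type cluster/replicate
  subvolumes node1-ns node2-ns node3-ns
end-volume

volume unify
  type cluster/unify
  option scheduler alu
  # further alu.* tuning options can be set here
  option namespace ns
  subvolumes mirror-01 mirror-02
end-volume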
Krzysztof
Kali Hernandez
2010-04-06 10:47:43 UTC
Well, let's say I changed my mind about the namespace node(s) after I
had started moving a lot of data. In my case, I have dedicated 2 full
nodes (which is quite useless) as namespace nodes.

After realizing I did this wrong, I want to delete the namespace data from
those nodes and add their storage space to the cluster. I would delete the
zero-sized files stored on those nodes, then add them in the client config
as 2 mirrored nodes, later aggregated into the unify config.

As I will still need a namespace config for unify, I add another brick-ns
volume on the server side on those 2 nodes, and I set the client volume to
use the new brick-ns volumes (replicated) as the namespace for my unify.
But this new namespace is empty, so Unify will not be able to find any
files.

Is there any way so I can re-create all the namespace info?

Would it work if I just move (when both client and servers are down) the
info from the current brick to the newly created brick-ns storage folder?

Can I freely add new nodes to the Unify'ed gluster? I am using ALU
scheduler.

Like I said earlier, moving all my data files into the new gluster takes
LONG, as much as 4 days (with such a poor transfer speed)...


Thanks in advance!

-kali-
Post by Krzysztof Strasburger
You don't need to dedicate nodes for the namespace. It contains only
directory entries with zero sizes. I simply made replicated namespace on
3 nodes carrying also the real data, in separate directories. This way,
there is no SPOF with unify and no space is wasted. Make sure that the
underlying filesystems have enough inode entries (this is usually not
a problem, unless you create many small files).
Krzysztof
Krzysztof Strasburger
2010-04-06 11:11:10 UTC
Post by Kali Hernandez
Well, let's say I changed my mind about the namespace node(s) after I
had started moving a lot of data. In my case, I have dedicated 2 full
nodes (which is quite useless) as namespace nodes.
After realizing I did wrong, I want delete my namespace data from those
nodes, and add their storage space to the cluster. I would delete the
zero-sized files stored on those nodes, and then add them in the client
config as 2 nodes mirrored, later aggregated in the unify config.
As I will need to have some namespace node config for unify, I add
another brick-ns config on the server side on those 2 nodes, and I set
the client volume to use the new brick-ns's (replicated) as namespace on
my unify. But the info here is empty, so Unify will not be able to find
any file.
Is there any way so I can re-create all the namespace info?
Would it work if I just move (when both client and servers are down) the
info from the current brick to the newly created brick-ns storage folder?
IMHO it will work, but do not change the file modification times in the
namespace. Try to unmount the unify volume, mount the old and the new
namespace (as replicated volumes) separately, and then cp -a (safer, as you
can return to the old namespace if something goes wrong) or mv should do the
rest. Then change the namespace volume in the config files, mount the
unified volume and see whether it works.
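Roughly, the copy step could look like this (the volfile names and mount points are made up, and each volfile should expose just the old or the new namespace volume as a replicated client volume):

umount /mnt/gluster_new
glusterfs -f /etc/glusterfs/old-namespace.vol /mnt/ns-old
glusterfs -f /etc/glusterfs/new-namespace.vol /mnt/ns-new
cp -a /mnt/ns-old/. /mnt/ns-new/    # -a preserves modification times, owners and modes
umount /mnt/ns-old /mnt/ns-new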
Post by Kali Hernandez
Can I freely add new nodes to the Unify'ed gluster? I am using ALU
scheduler.
I didn't try that, but nothing prevents you. Do not forget to stop glusterfs
first and update the client config files on all nodes, before you start
to use the new setup. Unify has some nice features, which DHT in its present
form doesn't have.
Krzysztof
Kali Hernandez
2010-04-07 03:40:59 UTC
Hi,

First of all, excuse my imperfect English, and take the following as a
user story about my personal experience. The results and problems shown
here may or may not apply to your particular environment and
requirements, but if you are thinking of using glusterfs or are already
having problems, this might be useful:

As you may have read in my previous messages, I am facing a situation
where the Distribute / Stripe translators run into "no free space"
situations even when the overall cluster shows gigabytes of free space
left. So after some good advice I am falling back to the Unify translator.

As I have a previous glusterfs setup (as distribute) and I have to move
everything from there (and I don't have an intermediate mount point where
I could gather all the data), I am forced to mount both the old and the
new gluster on the same machine and then move the data from the old one
to the new one.

My first try was creating the new gluster on glusterfs 3.0.3 using the
stripe translator. But then I found that I would also fall into the "no
free space" situation, and I had to look for another solution, which
ended in rolling back to 2.0.4 and using the Unify translator with the
ALU scheduler.

Moving the data became extremely (and painfully) slow: reading from one
networked gluster and writing to another! When using 3.0.3 Stripe I was
getting a usable transfer speed, but when I switched back to 2.0.4 Unify I
got an overall transfer speed of 1.2~2.0 Mb/s. With almost 600 GB of data
that would take forever.

So what I did was stop the old (distributed) gluster, log in on the
storage nodes, and rsync all the content over ssh into the mount point of
the new gluster. This improved the transfer speed significantly, reaching
almost 20 Mbps.
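
On each (stopped) old storage node that boiled down to something like the following, with the target host and paths being placeholders; rsync goes over ssh by default, so every node pushes its local brick contents straight into the new mount:

rsync -av /glusterfs_storage/ profile3:/mnt/gluster_new/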

Attached to this email is a graph (generated with Graphite) showing how
the new gluster filled up over time. The green line shows the size of the
old gluster (the original data) while the blue one shows the filling of
the new one. There you can see the first slope, which was filling the
3.0.3 Stripe, and the second one, which belongs to the 2.0.4 Unify.

Mark 1 shows when we hit the false "disk full" situation. Before that
you see the speed of copying from one gluster mount point to another
directly.
Mark 2 shows the incredibly slow slope of copying directly from gluster to
gluster when using 2.0 and Unify as the target. Note the striking
difference compared to the 3.0 direct copy. Both copies were performed
with a single "cp -r" on the machine mounting both glusters.
Mark 3 shows the slope after I started to copy *simultaneously* using
"cp" and "rsync" from the storage nodes. It is still quite a bit slower
than the 3.0 results.
Mark 4 shows the speed with the original gluster stopped and the data
being copied only with rsync from the storage nodes over ssh. In this
case you can see much better performance than with 3.0.


Another important piece of feedback about Unify: I misunderstood the
storage schema and at first dedicated 2 full storage nodes to replicating
the namespace, thus losing 40 GB of overall storage capacity. Then
Krzysztof suggested reclaiming that storage space and moving the namespace
volume to a separate brick defined on the same machines, so that 2 machines
hold both storage and namespace info. That raised the question of having to
re-create the whole namespace data, either somehow or by starting the copy
from scratch (again), but I just tried moving the files locally on the
nodes to another folder (using "mv" and with the glusterfsd daemon stopped)
and it worked fine!

This way I recovered the whole capacity and functionality of the
storage cluster.

For us, this storage cluster is basically backup space, frequently
written to and very rarely read from. Also, we did not want to get into a
more complicated clustering schema, and I personally wanted to avoid
Lustre or GFS (our next alternatives) because both need kernel modules
and LVM for storage, and I find it more useful for us to always be able
to access the data locally on the cluster nodes in case the service goes
down. Simplifying the general structure and keeping everything in user
space was worth it for us.

However, it is quite disappointing to find out that both of the currently
offered glusterfs approaches do not work properly in a production
environment, as stated before, because I will eventually be unable to use
the whole cluster's disk space. Unify on 2.0 seems to be slower in
transfer speed, but at least it works. I can't understand why the only
fully working solution has been deprecated and can't be used with the
latest version, leaving glusterfs 3.0 as little more than a theoretical
solution for this purpose.


Another very annoying point in the whole process was the near-complete
absence of online documentation. The official wiki is vague and
incomplete, and the only suggestion I found was "use the volume
generator script", but I can hardly find out how the translators work
internally. It is very sad that the only well-documented translator on
the whole wiki is the deprecated Unify, dating back to the 1.4 versions.

Thanks to all the people on the list who helped me find the problems and
solutions, and who answered the key questions I couldn't find covered in
the official "documentation".


Hope this becomes useful to people coming after me, and as always,
suggestions, comments and corrections are more than welcome.
