Discussion:
[Gluster-users] Files losing permissions
Justin Dossey
2013-08-01 19:25:34 UTC
Hi all,

I have a relatively-new GlusterFS 3.3.2 4-node cluster in
distributed-replicated mode running in a production environment.

After adding bricks from nodes 3 and 4 (which changed the cluster type from
simple replicated-2 to distributed-replicated-2), I've discovered that
files are randomly losing their permissions. These are files that aren't
being accessed by our clients-- some of them haven't been touched for years.

When I say "losing their permissions", I mean that regular files are going
from 0644 to 0000 or 1000.

Since this is a real production issue, I run a parallel find process to
correct them every ten minutes. It has corrected approximately 40,000
files in the past 18 hours.

Is anyone else seeing this kind of issue? My searches have turned up
nothing so far.
--
Justin Dossey
CTO, PodOmatic
Joel Young
2013-08-01 19:32:23 UTC
I am not seeing exactly that, but I am experiencing the permission for
the root directory of a gluster volume reverting from a particular
user.user to root.root ownership. I have to periodically do a "cd
/share; chown user.user . "
Justin Dossey
2013-08-01 19:57:02 UTC
Do you know whether it's acceptable to modify permissions on the brick
itself (as opposed to over NFS or via the fuse client)? It seems that as
long as I don't modify the xattrs, the permissions I set on files on the
bricks are passed through.
Justin Dossey
2013-08-01 21:25:28 UTC
One thing I do see with the issue we're having is that the files which have
lost their permissions have "bad" versions on multiple bricks. Since the
replica count is 2 for any given file, there should be only two copies of
each, no?

For example, the file below has zero-length, zero-permission versions on
uds-06/brick2 and uds-07/brick2, but good versions on uds-05/brick1 and
uds-06/brick1.

FILE is /09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
uds-05 -rw-r--r-- 2 apache apache 2233 Jul 6 2008 /export/brick1/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
uds-06 -rw-r--r-- 2 apache apache 2233 Jul 6 2008 /export/brick1/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
uds-06 ---------T 2 apache apache 0 Jul 23 03:11 /export/brick2/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
uds-07 ---------T 2 apache apache 0 Jul 23 03:11 /export/brick2/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump

Is it acceptable for me to just delete the zero-length copies?
Anand Avati
2013-08-02 04:11:10 UTC
Justin,
What you are seeing are internal DHT linkfiles. They are zero byte files
with mode 01000. Changing their mode forcefully in the backend to something
else WILL render your files inaccessible from the mount point. I am
assuming that you have seen these files only in the backend and not from
the mount point. And accessing/modifying files like this directly from the
backend is very dangerous for your data, as explained in this very example.

Avati
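
A minimal sketch of how one might recognise the files described above while inspecting a brick read-only. It assumes Python 3 on Linux, run on the brick host; the helper name `looks_like_dht_linkfile` is hypothetical and not part of GlusterFS. The sketch treats the `trusted.glusterfs.dht.linkto` extended attribute as the marker DHT places on its linkfiles, and it only reads metadata, never changes it.

```python
import os
import stat

# Extended attribute DHT sets on linkfiles; it names the subvolume holding the data.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_dht_linkfile(path, check_xattr=True):
    """Return True if path is a zero-byte regular file with mode 01000.

    With check_xattr=True the linkto xattr must also be present; reading
    trusted.* attributes normally requires root on the brick host.
    """
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_size != 0 or stat.S_IMODE(st.st_mode) != 0o1000:
        return False
    if not check_xattr:
        return True
    try:
        os.getxattr(path, LINKTO_XATTR, follow_symlinks=False)
    except OSError:
        # Attribute missing, unsupported, or not readable: do not claim a linkfile.
        return False
    return True
```

Called on a brick path such as the `---------T` entries listed earlier in the thread, a True result suggests the file is a linkfile to leave alone rather than a damaged copy.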
Maik Kulbe
2013-08-02 11:54:09 UTC
Hi,

I've just had a problem removing a directory with test files. I had an inaccessible folder which I could neither delete nor read on the client (both the NFS and FUSE clients). On the backend, the folder had completely zeroed permissions, and the files showed zeroed permissions with the sticky bit set. I can't remove the folder on the client (it fails with 'directory not empty'), but if I delete the empty files on the backend, it's gone. Is there any explanation for this?

I also found that this only happens if I remove the folder recursively over NFS. When I remove the files in the folder first, there are no 0-size files on the backend and I can delete the directory with rmdir without any problem.
Justin Dossey
2013-08-02 18:10:39 UTC
It sounds like this is related to NFS.

Anand, thank you for the response. I was under the impression that DHT
linkfiles are only in the .glusterfs subdirectory on the brick; in my case,
these files are outside that directory. Furthermore, they aren't named
like DHT linkfiles (using that hash key)-- they are named the same as the
actual files. Finally, once I removed the bad files and their DHT
linkfiles, the issue went away and the files remained accessible. I had to
remove around 100,000 of these bad 000/1000 zero-length files (and their
DHT linkfiles) last night; only 324 additional files were detected.

On my volumes, I use a similar hashing scheme to the DHT one for regular
files-- the top level at the volume only has directories 00 through ff,
etc, etc. Perhaps this caused some confusion for you?

For transparency, here is the command I run from the client to detect and
correct bad file permissions:

find ./?? -type f -perm 000 -ls -exec chmod -v 644 {} \; -o -type f -perm 1000 -ls -exec chmod -v 644 {} \;
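
For anyone who would rather count the affected files before changing anything, here is a report-only sketch of the same check. It assumes Python 3 and is meant to be pointed at the client mount; the function name `find_suspect_files` is hypothetical. Like `-perm 000` and `-perm 1000` in the find command, it matches the mode exactly, but it walks the whole tree under the given root instead of only the `./??` directories and it never calls chmod.

```python
import os
import stat

# Exact modes the find command looks for: no bits at all, or the sticky bit only.
SUSPECT_MODES = {0o000, 0o1000}


def find_suspect_files(root):
    """Yield (path, mode) for regular files under root whose mode is exactly 0000 or 1000."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or cannot be examined; skip it
            mode = stat.S_IMODE(st.st_mode)
            if stat.S_ISREG(st.st_mode) and mode in SUSPECT_MODES:
                yield path, mode
```

Logging `len(list(find_suspect_files(mount_point)))` from each run gives the running count that the next paragraph relies on.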

If this number does not grow, I will conclude that I just missed these 324
files. If the number gets larger, I can only conclude that GlusterFS is
somehow introducing this corruption. If that is the case, I'll dig some
more.

Maik, I may have experienced the same thing. I used rsync over NFS without
--inplace to load my data into the GlusterFS volume, and I wound up with
all those bad files on the wrong bricks (i.e. a file should be only on
server1-brick1 and server2-brick1, but "bad" versions (1000, zero-length)
were also on server3-brick1 and server4-brick1, leading to confusing
results on the clients). Since then, I've switched to using the native
client for data loads and also the --inplace flag to rsync.

Other factors which may have caused the issue I had:

1. During a large rebalance, one GlusterFS node exceeded its system max
open files limit, and was rebooted. The rebalance did not stop while this
took place.
2. Three times during the same rebalance, the Gluster NFS daemon used an
excessive amount of memory and was killed by the kernel oom-killer. The
system in question has 8 GB of memory, was the rebalance master, and is not
running any significant software besides GlusterFS. Each time, I
restarted glusterfs and the NFS server daemon started serving files again.
The rebalance was not interrupted.
Justin Dossey
2013-08-06 16:14:58 UTC
Update on this issue:

After I deleted the zero-length files which were located on the wrong
bricks, the issue of files losing permissions is mostly resolved. I left
my find running every 10 minutes for the last five days, though, and the
problem continues to recur with a few hundred files every day or two. This
leads me to believe there is some bug in GlusterFS which causes this to
happen.

My script recorded 1232 files which lost their permissions on August 2 and
870 files on August 6. As I noted earlier, these files were created years
ago. One notable fact is that the mtime as reported by GlusterFS is July
28th.

The rebalance log sheds a bit of light on this, but I'm not sure what to
conclude. Here is the log for one of the affected files:

UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.010359] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-UDS8-dht: migrate data called on /6f/83/ca/rrrivera25/media
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.068909] I [dht-rebalance.c:647:dht_migrate_file] 0-UDS8-dht: /6f/83/ca/rrrivera25/media/3301856.jpg: attempting to move from UDS8-replicate-0 to UDS8-replicate-2
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.068949] I [dht-rebalance.c:647:dht_migrate_file] 0-UDS8-dht: /6f/83/ca/rrrivera25/media/3301856.jpg: attempting to move from UDS8-replicate-0 to UDS8-replicate-2
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.122885] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-UDS8-client-0: remote operation failed: No such file or directory. Path: /6f/83/ca/rrrivera25/media/3301856.jpg (00000000-0000-0000-0000-000000000000). Key: (null)
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.123274] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-UDS8-client-1: remote operation failed: No such file or directory. Path: /6f/83/ca/rrrivera25/media/3301856.jpg (00000000-0000-0000-0000-000000000000). Key: (null)
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.123330] W [dht-rebalance.c:739:dht_migrate_file] 0-UDS8-dht: /6f/83/ca/rrrivera25/media/3301856.jpg: failed to get xattr from UDS8-replicate-0 (No such file or directory)
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.123380] W [dht-rebalance.c:745:dht_migrate_file] 0-UDS8-dht: /6f/83/ca/rrrivera25/media/3301856.jpg: failed to set xattr on UDS8-replicate-2 (Invalid argument)
UDS8-rebalance.log.3.gz:[2013-07-28 13:52:08.134502] I [dht-rebalance.c:856:dht_migrate_file] 0-UDS8-dht: completed migration of /6f/83/ca/rrrivera25/media/3301856.jpg from subvolume UDS8-replicate-0 to UDS8-replicate-2
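
The pattern in this excerpt, a migration reported as completed right after xattr warnings for the same path, could be searched for across the whole rebalance log. Below is a rough sketch, assuming Python 3 and that each log entry sits on a single line (the mail client wrapped the ones quoted here). The regular expressions are derived only from the entries shown above, so other message formats, and paths containing spaces, are not handled; the function name is hypothetical.

```python
import re

# Matches entries such as:
# [2013-07-28 13:52:08.123330] W [dht-rebalance.c:739:dht_migrate_file] 0-UDS8-dht: <message>
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<msg>.*)"
)
WARNED_RE = re.compile(r"^(?P<path>/\S+): failed to (?:get|set) xattr")
COMPLETED_RE = re.compile(r"^completed migration of (?P<path>/\S+) from")


def migrations_completed_after_warning(lines):
    """Return paths whose migration was logged as completed after an xattr warning."""
    warned = set()
    suspicious = []
    for line in lines:
        entry = ENTRY_RE.search(line)  # search() tolerates a leading "file:" grep prefix
        if not entry:
            continue
        msg = entry.group("msg")
        warning = WARNED_RE.match(msg)
        if warning and entry.group("level") == "W":
            warned.add(warning.group("path"))
            continue
        completed = COMPLETED_RE.match(msg)
        if completed and completed.group("path") in warned:
            suspicious.append(completed.group("path"))
    return suspicious
```

Comparing the returned paths with the list of files that lost their permissions would show whether the two sets overlap, which is the question the log raises.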