Discussion:
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
David Gossage
2016-08-11 11:52:14 UTC
Permalink
Figured I would repost here as well. One client out of 3 is complaining of
stale file handles on a few new VMs I migrated over. No errors on the storage
nodes, just the client. Maybe just put that one in maintenance and restart the
gluster mount?
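
(For reference, a rough sketch of that approach; the mount point and server
below are placeholders, not taken from this setup:)

# after putting the host into maintenance in the oVirt engine UI
umount <mountpoint>                       # placeholder for the volume's mount path
mount -t glusterfs <server>:/GLUSTER1 <mountpoint>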

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

---------- Forwarded message ----------
From: David Gossage <***@carouselchecks.com>
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
To: users <***@ovirt.org>


In a 3-node cluster running oVirt 3.6.6.2-1.el7.centos on top of a replica 3
gluster 3.7.14 volume, starting a VM I just copied in produces the following
errors on one node of the 3. On the other 2 the VM starts fine. All oVirt and
gluster hosts are CentOS 7 based. When the VM starts on the one node it tries
to default to on its own accord, it is immediately put into a paused state for
an unknown reason. Telling it to start on a different node works fine. The node
with the issue already has 5 VMs running fine on it from the same gluster
storage, plus the hosted engine on a different volume.

The gluster storage nodes' logs did not have any errors for the volume;
the node's own gluster (fuse mount) log had this:

dfb8777a-7e8c-40ff-8faa-252beabba5f8: couldn't find this gfid under .glusterfs,
.shard, or images/

7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive of
the VM
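
(For reference, a rough way to check whether a gfid exists on a brick, assuming
the standard layout where regular files get a hard link at
.glusterfs/<first 2 hex chars>/<next 2 chars>/<gfid>; the brick path below is
only an example:)

GFID=dfb8777a-7e8c-40ff-8faa-252beabba5f8
BRICK=/gluster1/BRICK1/1
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# any shards belonging to that base file would be named <gfid>.<index>
ls "$BRICK/.shard/" | grep "$GFID"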

[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1
gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1
gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]

I attached vdsm.log starting from when I spun up the VM on the offending node.
Dan Lavu
2016-08-12 21:25:06 UTC
Permalink
David,

I'm seeing similar behavior in my lab, but it has been caused by healing
files in the gluster cluster, though I attribute my problems to issues with
the storage fabric. See if 'gluster volume heal $VOL info' indicates files
that are being healed, and if those reduce in number, can the VM start?
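
(For example, something along these lines can be used to watch whether the
heal entry counts are dropping; GLUSTER1 is the volume name from the logs
above:)

gluster volume heal GLUSTER1 info
watch -n 10 'gluster volume heal GLUSTER1 info | grep "Number of entries"'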

Dan
Post by David Gossage
Figure I would repost here as well. one client out of 3 complaining of
stale file handles on a few new VM's I migrated over. No errors on storage
nodes just client. Maybe just put that one in maintenance and restart
gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a 3
replicate gluster 3.7.14 starting a VM i just copied in on one node of the
3 gets the following errors. The other 2 the vm starts fine. All ovirt
and gluster are centos 7 based. VM on start of the one node it tries to
default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs .shard or
images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive of
the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
David Gossage
2016-08-12 21:40:13 UTC
Permalink
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by healing
files in the gluster cluster, though I attribute my problems to issues with
the storage fabric. See if 'gluster volume heal $VOL info' indicates files
that are being healed, and if those reduce in number, can the VM start?
I haven't had any files in a state of being healed according to any of
the 3 storage nodes.

A moment ago I shut down one VM that has been around a while, then told it to
start on the one oVirt server that complained previously. It ran fine, and I
was able to migrate it off of and back onto that host with no issues.

I told one of the new VMs to migrate to the one node and within seconds it
paused from unknown storage errors; no shards showing heals, nothing with an
error on a storage node. Same stale file handle issues.

I'll probably put this node in maintenance later and reboot it. Other than
that I may re-clone those 2 recent VMs. Maybe the images just got corrupted,
though I'm not sure why it would only fail on one node of 3 if an image were bad.


Krutika Dhananjay
2016-08-13 05:26:20 UTC
Permalink
1. Could you share the output of `gluster volume heal <VOL> info`?
2. `gluster volume info`
3. fuse mount logs of the affected volume(s)?
4. glustershd logs
5. Brick logs
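
(For items 3-5, assuming default install locations, the logs are typically
found under /var/log/glusterfs/, e.g.:)

# on the oVirt host (fuse client); the mount log is named after the mount path
ls /var/log/glusterfs/*.log
# on each storage node: self-heal daemon and brick logs
ls /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/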

-Krutika
David Gossage
2016-08-13 11:15:51 UTC
Permalink
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
The results were the same moments after the issue occurred as well:
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0

Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0

Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
Options Reconfigured:
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht:
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1
gfid=31d7c904-775e-4b9f-8ef7-888218679845 fd=0x7f00a80bde58 (Stale file
handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)

4. glustershd logs
Nothing recent; the same on all 3 storage nodes:
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file from
server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs that I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node,
none of them seem to coincide with the time window when migration had issues.
The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shards refer to an image for a
different VM than the one I had issues with as well. Maybe gluster is trying
to do some sort of shard-create test before writing out changes that would go
to that image and that shard file?
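
(For reference, a rough way to map a shard's base gfid back to the image it
belongs to, assuming the usual .glusterfs/<xx>/<yy>/<gfid> hard-link layout on
the brick; the shard name before the dot is the gfid of the base file:)

find /gluster1/BRICK1/1/images -samefile \
  /gluster1/BRICK1/1/.glusterfs/f9/a7/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42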

[2016-08-12 18:48:22.463628] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed
[File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed
[File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]" repeated 5 times between [2016-08-12 18:48:22.463628]
and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]" repeated 5 times between [2016-08-12 18:49:16.065502]
and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301 failed
[File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 failed
[File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 failed
[File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]" repeated 16 times between [2016-08-13
01:47:23.338036] and [2016-08-13 01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211 failed
[File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177 failed
[File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178 failed
[File exists]
David Gossage
2016-08-13 11:37:28 UTC
Permalink
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. The brick logs weren't
large, so I'll just include them as text files this time.

The attached file bricks.zip you sent to <***@redhat.com>;<Gluster
-***@gluster.org> on 8/13/2016 7:17:35 AM was quarantined. As a safety
precaution, the University of South Carolina quarantines .zip and .docm
files sent via email. If this is a legitimate attachment <
***@redhat.com>;<Gluster-***@gluster.org> may contact the Service
Desk at 803-777-1800 (***@sc.edu) and the attachment file will be
released from quarantine and delivered.
David Gossage
2016-08-15 19:32:21 UTC
Permalink
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. The brick logs weren't
large, so I'll just include them as text files this time.
I did maintenance over the weekend, updating oVirt from 3.6.6 -> 3.6.7, and
after restarting the complaining oVirt node I was able to migrate the 2 VMs
with issues. So I'm not sure why the mount went stale, but I imagine that one
node couldn't see the new image files after that had occurred?

I'm still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files:

[2016-08-15 13:40:31.510798] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed
[File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed
[File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed
[File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed
[File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed
[File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed
[File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739 failed
[File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed
[File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed
[File exists]
Post by David Gossage
precaution, the University of South Carolina quarantines .zip and .docm
files sent via email. If this is a legitimate attachment <
released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file
from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node
none of them seem to coincide with the time window when the migration had
issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an image
for a different vm than the one I had issues with as well. Maybe gluster is
trying to do some sort of create-the-shard test before writing out changes that
would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5
times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5
times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/8379
4e5d-2225-4560-8df6-7c903c8a648a.1301 failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5a
d95d-722d-4374-88fb-66fca0b14341.580 failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5a
d95d-722d-4374-88fb-66fca0b14341.580 failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated 16
times between [2016-08-13 01:47:23.338036] and [2016-08-13 01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/ffbb
cce0-3c4a-4fdf-b79f-a96ca3215657.211 failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.177 failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.178 failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to any
of the 3 storage nodes.
A moment ago I shut down one VM that has been around awhile, then told
it to start on the one ovirt server that complained previously. It ran
fine, and I was able to migrate it off and back onto that host with no issues.
I told one of the new VMs to migrate to the one node, and within
seconds it paused from unknown storage errors; no shards showing heals,
nothing with an error on the storage nodes. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it. Other
than that I may re-clone those 2 recent VMs; maybe the images just got
corrupted, though why it would only fail on one node of 3 if an image was
bad, I'm not sure.
Dan
Krutika Dhananjay
2016-08-15 23:24:48 UTC
Permalink
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
-Krutika
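(To illustrate why those "failed" mknod calls are harmless, here is a minimal
standalone C sketch -- not GlusterFS code; the path and helper name are
invented for the example -- of the usual pattern: whichever writer loses the
race to create a shard gets EEXIST back from mknod() and simply treats it as
success, since the shard it needed now exists.)

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Create a shard file unless it already exists; losing the race is not
     * an error. */
    static int ensure_shard(const char *path)
    {
            if (mknod(path, S_IFREG | 0660, 0) == 0)
                    return 0;        /* we created it */
            if (errno == EEXIST)
                    return 0;        /* another writer created it first: harmless */
            perror("mknod");         /* anything else is a real failure */
            return -1;
    }

    int main(void)
    {
            /* hypothetical path, same shape as the shard names in the brick log */
            return ensure_shard("/tmp/example-shard.779") == 0 ? 0 : 1;
    }

The brick apparently logs the EEXIST at the moment the syscall fails, before
the layer above decides it is benign, which would explain why it shows up at
E level even though nothing actually went wrong.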
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. Brick logs weren't large,
so I'll just include them as text files this time.
Did maintenance over the weekend updating ovirt from 3.6.6->3.6.7, and after
restarting the complaining ovirt node I was able to migrate the 2 VMs with
issues. So I'm not sure why the mount got stale, but I imagine that one node
couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but they seem much fewer than before, and
I never get any corresponding notices in any other log files.
David Gossage
2016-08-16 00:20:25 UTC
Permalink
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors, making a user worry, hehe.
Is there a known bug filed against that, or should I maybe create one to see
if we can get that sent to an informational level?
Krutika Dhananjay
2016-08-16 05:21:55 UTC
Permalink
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
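(For reference, the general shape of such a change -- a sketch of the idea
only; the actual patch under review may differ, and the names below are
invented rather than GlusterFS APIs -- is to pick the message severity from
errno, so the expected EEXIST case drops to debug while real mknod failures
stay at error.)

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    enum sev { SEV_DEBUG = 'D', SEV_ERROR = 'E' };

    /* Pick the severity from errno: the expected EEXIST race logs at debug,
     * anything else stays at error. */
    static void log_mknod_failure(const char *path, int err)
    {
            enum sev level = (err == EEXIST) ? SEV_DEBUG : SEV_ERROR;

            fprintf(stderr, "[%c] posix_mknod: mknod on %s failed [%s]\n",
                    (char) level, path, strerror(err));
    }

    int main(void)
    {
            log_mknod_failure("/tmp/example-shard.779", EEXIST);  /* prints [D] */
            log_mknod_failure("/tmp/example-shard.780", EACCES);  /* prints [E] */
            return 0;
    }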
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors, making a user worry, hehe.
Is there a known bug filed against that, or should I maybe create one to
see if we can get that sent to an informational level?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. Brick logs weren't large,
so I'll just include them as text files this time.
Did maintenance over the weekend updating ovirt from 3.6.6->3.6.7, and after
restarting the complaining ovirt node I was able to migrate the 2 VMs with
issues. So I'm not sure why the mount got stale, but I imagine that one node
couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files.
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node
none of them seem to coincide with the time window when the migration had
issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an
image for a different vm than the one I had issues with as well. Maybe gluster
is trying to do some sort of create-the-shard test before writing out changes
that would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5
times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5
times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
any of the 3 storage nodes.
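For reference, the check I've been running on each storage node is just
the standard one (GLUSTER1 being the volume), and given the "(Possible
split-brain)" warnings in the client log the split-brain variant seems
worth running as well:

gluster volume heal GLUSTER1 info                # files pending heal, per brick
gluster volume heal GLUSTER1 info split-brain    # files gluster actually considers split-brain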
A moment ago I shut down one VM that has been around a while and
told it to start on the one oVirt server that complained previously. It
ran fine, and I was able to migrate it off and back onto the host with no
issues.
I then told one of the new VMs to migrate to that node, and within
seconds it paused from unknown storage errors: no shards showing heals,
nothing with an error on the storage nodes, and the same stale file handle
issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 recent VMs; maybe the images just
got corrupted, though I'm not sure why a bad image would only fail on one
node of 3.
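If I do take the maintenance route, my rough plan on the affected
hypervisor is just to remount the volume; treating the exact mount point
as illustrative (assuming the usual oVirt glusterSD path), something like:

mount | grep glusterfs    # find the actual storage-domain mount point
umount /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1    # illustrative path; use the one reported above
mount -t glusterfs <server>:/GLUSTER1 /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1

though simply activating the host again from maintenance should let vdsm
reconnect the storage domain on its own anyway.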
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a
3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable
drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Dan Lavu
2018-06-09 04:08:54 UTC
Permalink
Krutika,

Are the following messages also normal?

[2018-06-07 06:36:22.008492] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158
failed
[2018-06-07 06:36:22.319735] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16160
failed
[2018-06-07 06:36:24.711800] E [MSGID: 113002] [posix.c:267:posix_lookup]
0-rhev_vms-posix: buf->ia_gfid is null for
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
[No data available]
[2018-06-07 06:36:24.711839] E [MSGID: 115050]
[server-rpc-fops.c:170:server_lookup_cbk] 0-rhev_vms-server: 32334131:
LOOKUP /.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
(be318638-e8a0-4c6d-977d-7a937aa84806/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177)
==> (No data available) [No data available]

If so, what do they mean?
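For what it's worth, I can check the gfid xattr on one of those shards
directly on the brick host if that helps narrow it down, e.g. (path copied
from the log above, run as root):

getfattr -n trusted.gfid -e hex /gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158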

Dan
Post by Krutika Dhananjay
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
On Tue, Aug 16, 2016 at 5:50 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen
when multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors and make a user worry, hehe.
Is there a known bug filed against that, or should I create one to
see if we can get those logged at an informational level instead?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is my reply again just in case; I got a quarantine message, so I'm not
sure whether the first one went through or will anytime soon. The brick logs
weren't large, so I'll just include them as text files this time.
I did maintenance over the weekend, updating oVirt from 3.6.6 to 3.6.7, and
after restarting the complaining oVirt node I was able to migrate the 2 VMs
with issues. So I'm not sure why the mount went stale, but I imagine that one
node couldn't see the new image files after that had occurred?
I'm still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files:
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
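Next time it happens I'll also check, from the complaining node's own mount,
whether it can even see the new VM's image directory; roughly like this (the
storage-domain UUID is the one from the fuse log further down, while the
mount path and image UUID are placeholders):

ls -l /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/<new-vm-image-uuid>/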
Post by David Gossage
As a safety precaution, the University of South Carolina quarantines .zip and
.docm files sent via email. If this is a legitimate attachment <
file will be released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were the same moments after the issue occurred as well:
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent; the same on all 3 storage nodes:
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed
occurring. I've zipped and attached all 3 nodes' logs, but from this snippet
on one node, none of them seem to coincide with the time window when
the migration had issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard
also refers to an image for a different VM than the one I had trouble with.
Maybe gluster is attempting to create the shard before writing
out changes destined for that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated
5 times between [2016-08-12 18:48:22.463628] and [2016-08-12
18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated
5 times between [2016-08-12 18:49:16.065502] and [2016-08-12
18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
either of the 3 storage nodes.
I shut down one VM that has been around awhile a moment ago then
told it to start on the one ovirt server that complained previously. It
ran fine, and I was able to migrate it off and on the host no issues.
I told one of the new VM's to migrate to the one node and within
seconds it paused from unknown storage errors no shards showing heals
nothing with an error on storage node. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 reccent VM's. maybe images just got
corrupted though why it would only fail on one node of 3 if image was bad
not sure.
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with
a 3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable
drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Krutika Dhananjay
2018-06-12 08:23:00 UTC
Permalink
Post by Dan Lavu
Krutika,
Is it also normal for the following messages as well?
Yes, this should be fine. It only represents a transient state when
multiple threads/clients are trying to create the same shard at the same
time. These can be ignored.
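If you want to double-check that nothing is actually stuck behind those
messages, the usual check on one of the servers would be (assuming the
volume is named rhev_vms, as the 0-rhev_vms-posix prefix in your log
suggests):

gluster volume heal rhev_vms info    # should report 0 entries per brick if all is well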

-Krutika
Post by Dan Lavu
[2018-06-07 06:36:22.008492] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158 failed
[2018-06-07 06:36:22.319735] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16160 failed
[2018-06-07 06:36:24.711800] E [MSGID: 113002] [posix.c:267:posix_lookup]
0-rhev_vms-posix: buf->ia_gfid is null for /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177 [No data available]
[2018-06-07 06:36:24.711839] E [MSGID: 115050]
LOOKUP /.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
(be318638-e8a0-4c6d-977d-7a937aa84806/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177)
==> (No data available) [No data available]
if so what does it mean?
Dan
Post by Krutika Dhananjay
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
On Tue, Aug 16, 2016 at 5:50 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen
when multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than they pop up as E errors making a user worry hehe
Is their a known bug filed against that or should I maybe create one to
see if we can get that sent to an informational level maybe?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is reply again just in case. I got quarantine message so not
sure if first went through or wll anytime soon. Brick logs weren't large
so Ill just include as text files this time
Did maintenance over weekend updating ovirt from 3.6.6->3.6.7 and
after restrating the complaining ovirt node I was able to migrate the 2 vm
with issues. So not sure why the mount got stale, but I imagine that one
node couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but seems much fewer than before
and never get any corresponding notices in any other log files
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
Post by David Gossage
As a safety precaution, the University of South Carolina quarantines .zip
and .docm files sent via email. If this is a legitimate attachment <
file will be released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
Their have been some error in brick logs I hadn't noticed
occurring. I've zip'd and attached all 3 nodes logs, but from this snippet
on one node none of them seem to coincide with the time window when
migration had issues. f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard
refers to an image for a different vm than one I had issues with as well.
Maybe gluster is trying to do some sort of make shard test before writing
out changes that would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated
5 times between [2016-08-12 18:48:22.463628] and [2016-08-12
18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated
5 times between [2016-08-12 18:49:16.065502] and [2016-08-12
18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
either of the 3 storage nodes.
I shut down one VM that has been around awhile a moment ago then
told it to start on the one ovirt server that complained previously. It
ran fine, and I was able to migrate it off and on the host no issues.
I told one of the new VM's to migrate to the one node and within
seconds it paused from unknown storage errors no shards showing heals
nothing with an error on storage node. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 reccent VM's. maybe images just got
corrupted though why it would only fail on one node of 3 if image was bad
not sure.
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with
a 3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the
bootable drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users