Discussion:
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
David Gossage
2016-08-11 11:52:14 UTC
Permalink
Figured I would repost here as well. One client out of 3 is complaining of
stale file handles on a few new VMs I migrated over. No errors on the storage
nodes, just the client. Maybe just put that one in maintenance and restart the
gluster mount?
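
(For reference, a rough sketch of that approach; the mount point and server
below are placeholders, not taken from this setup:)

# after putting the host into maintenance in the oVirt engine UI
umount <mountpoint>                       # placeholder for the volume's mount path
mount -t glusterfs <server>:/GLUSTER1 <mountpoint>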

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

---------- Forwarded message ----------
From: David Gossage <***@carouselchecks.com>
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
To: users <***@ovirt.org>


In a 3-node cluster running oVirt 3.6.6.2-1.el7.centos on top of a replica 3
gluster 3.7.14 volume, starting a VM I just copied in produces the following
errors on one node of the 3. On the other 2 the VM starts fine. All oVirt and
gluster hosts are CentOS 7 based. When the VM starts on the one node it tries
to default to on its own accord, it is immediately put into a paused state for
an unknown reason. Telling it to start on a different node works fine. The node
with the issue already has 5 VMs running fine on it from the same gluster
storage, plus the hosted engine on a different volume.

The gluster storage nodes' logs did not have any errors for the volume;
the node's own gluster (fuse mount) log had this:

dfb8777a-7e8c-40ff-8faa-252beabba5f8: couldn't find this gfid under .glusterfs,
.shard, or images/

7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive of
the VM
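
(For reference, a rough way to check whether a gfid exists on a brick, assuming
the standard layout where regular files get a hard link at
.glusterfs/<first 2 hex chars>/<next 2 chars>/<gfid>; the brick path below is
only an example:)

GFID=dfb8777a-7e8c-40ff-8faa-252beabba5f8
BRICK=/gluster1/BRICK1/1
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# any shards belonging to that base file would be named <gfid>.<index>
ls "$BRICK/.shard/" | grep "$GFID"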

[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1
gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1
gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn]
0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation
3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]

I attached vdsm.log starting from when I spun up the VM on the offending node.
Dan Lavu
2016-08-12 21:25:06 UTC
Permalink
David,

I'm seeing similar behavior in my lab, but it has been caused by healing
files in the gluster cluster, though I attribute my problems to issues with
the storage fabric. See if 'gluster volume heal $VOL info' indicates files
that are being healed, and if those reduce in number, can the VM start?
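
(For example, something along these lines can be used to watch whether the
heal entry counts are dropping; GLUSTER1 is the volume name from the logs
above:)

gluster volume heal GLUSTER1 info
watch -n 10 'gluster volume heal GLUSTER1 info | grep "Number of entries"'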

Dan
Post by David Gossage
Figure I would repost here as well. one client out of 3 complaining of
stale file handles on a few new VM's I migrated over. No errors on storage
nodes just client. Maybe just put that one in maintenance and restart
gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a 3
replicate gluster 3.7.14 starting a VM i just copied in on one node of the
3 gets the following errors. The other 2 the vm starts fine. All ovirt
and gluster are centos 7 based. VM on start of the one node it tries to
default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs .shard or
images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive of
the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
David Gossage
2016-08-12 21:40:13 UTC
Permalink
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by healing
files in the gluster cluster, though I attribute my problems to issues with
the storage fabric. See if 'gluster volume heal $VOL info' indicates files
that are being healed, and if those reduce in number, can the VM start?
I haven't had any files in a state of being healed according to any of
the 3 storage nodes.

A moment ago I shut down one VM that has been around a while, then told it to
start on the one oVirt server that complained previously. It ran fine, and I
was able to migrate it off of and back onto that host with no issues.

I told one of the new VMs to migrate to the one node and within seconds it
paused from unknown storage errors; no shards showing heals, nothing with an
error on a storage node. Same stale file handle issues.

I'll probably put this node in maintenance later and reboot it. Other than
that I may re-clone those 2 recent VMs. Maybe the images just got corrupted,
though I'm not sure why it would only fail on one node of 3 if an image were bad.


Krutika Dhananjay
2016-08-13 05:26:20 UTC
Permalink
1. Could you share the output of `gluster volume heal <VOL> info`?
2. `gluster volume info`
3. fuse mount logs of the affected volume(s)?
4. glustershd logs
5. Brick logs
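
(For items 3-5, assuming default install locations, the logs are typically
found under /var/log/glusterfs/, e.g.:)

# on the oVirt host (fuse client); the mount log is named after the mount path
ls /var/log/glusterfs/*.log
# on each storage node: self-heal daemon and brick logs
ls /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/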

-Krutika
David Gossage
2016-08-13 11:15:51 UTC
Permalink
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
The results were the same moments after the issue occurred as well:
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0

Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0

Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
Options Reconfigured:
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht:
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1
gfid=31d7c904-775e-4b9f-8ef7-888218679845 fd=0x7f00a80bde58 (Stale file
handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)

4. glustershd logs
Nothing recent; the same on all 3 storage nodes:
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file from
server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs that I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node,
none of them seem to coincide with the time window when migration had issues.
The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shards refer to an image for a
different VM than the one I had issues with as well. Maybe gluster is trying
to do some sort of shard-create test before writing out changes that would go
to that image and that shard file?
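
(For reference, a rough way to map a shard's base gfid back to the image it
belongs to, assuming the usual .glusterfs/<xx>/<yy>/<gfid> hard-link layout on
the brick; the shard name before the dot is the gfid of the base file:)

find /gluster1/BRICK1/1/images -samefile \
  /gluster1/BRICK1/1/.glusterfs/f9/a7/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42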

[2016-08-12 18:48:22.463628] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed
[File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed
[File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]" repeated 5 times between [2016-08-12 18:48:22.463628]
and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]" repeated 5 times between [2016-08-12 18:49:16.065502]
and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301 failed
[File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 failed
[File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 failed
[File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211 failed
[File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix:
mknod on /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]" repeated 16 times between [2016-08-13
01:47:23.338036] and [2016-08-13 01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211 failed
[File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177 failed
[File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178 failed
[File exists]
David Gossage
2016-08-13 11:37:28 UTC
Permalink
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. The brick logs weren't
large, so I'll just include them as text files this time.

The attached file bricks.zip you sent to <***@redhat.com>;<Gluster
-***@gluster.org> on 8/13/2016 7:17:35 AM was quarantined. As a safety
precaution, the University of South Carolina quarantines .zip and .docm
files sent via email. If this is a legitimate attachment <
***@redhat.com>;<Gluster-***@gluster.org> may contact the Service
Desk at 803-777-1800 (***@sc.edu) and the attachment file will be
released from quarantine and delivered.
David Gossage
2016-08-15 19:32:21 UTC
Permalink
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. The brick logs weren't
large, so I'll just include them as text files this time.
I did maintenance over the weekend, updating oVirt from 3.6.6 -> 3.6.7, and
after restarting the complaining oVirt node I was able to migrate the 2 VMs
with issues. So I'm not sure why the mount went stale, but I imagine that one
node couldn't see the new image files after that had occurred?

I'm still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files:

[2016-08-15 13:40:31.510798] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed
[File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed
[File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed
[File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed
[File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed
[File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed
[File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739 failed
[File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed
[File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed
[File exists]
Post by David Gossage
precaution, the University of South Carolina quarantines .zip and .docm
files sent via email. If this is a legitimate attachment <
released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote
operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote
operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-
435c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-
4aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file
from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node
none of them seem to coincide with the time window when the migration had
issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an image
for a different vm than the one I had issues with as well. Maybe gluster is
trying to do some sort of create-the-shard test before writing out changes that
would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5
times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.698 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5
times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/8379
4e5d-2225-4560-8df6-7c903c8a648a.1301 failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5a
d95d-722d-4374-88fb-66fca0b14341.580 failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5a
d95d-722d-4374-88fb-66fca0b14341.580 failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated 16
times between [2016-08-13 01:47:23.338036] and [2016-08-13 01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/ffbb
cce0-3c4a-4fdf-b79f-a96ca3215657.211 failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.177 failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.178 failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to any
of the 3 storage nodes.
A moment ago I shut down one VM that has been around awhile, then told
it to start on the one ovirt server that complained previously. It ran
fine, and I was able to migrate it off and back onto that host with no issues.
I told one of the new VMs to migrate to the one node, and within
seconds it paused from unknown storage errors; no shards showing heals,
nothing with an error on the storage nodes. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it. Other
than that I may re-clone those 2 recent VMs; maybe the images just got
corrupted, though why it would only fail on one node of 3 if an image was
bad, I'm not sure.
Dan
Krutika Dhananjay
2016-08-15 23:24:48 UTC
Permalink
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
-Krutika
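(To illustrate why those "failed" mknod calls are harmless, here is a minimal
standalone C sketch -- not GlusterFS code; the path and helper name are
invented for the example -- of the usual pattern: whichever writer loses the
race to create a shard gets EEXIST back from mknod() and simply treats it as
success, since the shard it needed now exists.)

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Create a shard file unless it already exists; losing the race is not
     * an error. */
    static int ensure_shard(const char *path)
    {
            if (mknod(path, S_IFREG | 0660, 0) == 0)
                    return 0;        /* we created it */
            if (errno == EEXIST)
                    return 0;        /* another writer created it first: harmless */
            perror("mknod");         /* anything else is a real failure */
            return -1;
    }

    int main(void)
    {
            /* hypothetical path, same shape as the shard names in the brick log */
            return ensure_shard("/tmp/example-shard.779") == 0 ? 0 : 1;
    }

The brick apparently logs the EEXIST at the moment the syscall fails, before
the layer above decides it is benign, which would explain why it shows up at
E level even though nothing actually went wrong.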
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. Brick logs weren't large,
so I'll just include them as text files this time.
Did maintenance over the weekend updating ovirt from 3.6.6->3.6.7, and after
restarting the complaining ovirt node I was able to migrate the 2 VMs with
issues. So I'm not sure why the mount got stale, but I imagine that one node
couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but they seem much fewer than before, and
I never get any corresponding notices in any other log files.
David Gossage
2016-08-16 00:20:25 UTC
Permalink
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors, making a user worry, hehe.
Is there a known bug filed against that, or should I maybe create one to see
if we can get that sent to an informational level?
Krutika Dhananjay
2016-08-16 05:21:55 UTC
Permalink
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
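(For reference, the general shape of such a change -- a sketch of the idea
only; the actual patch under review may differ, and the names below are
invented rather than GlusterFS APIs -- is to pick the message severity from
errno, so the expected EEXIST case drops to debug while real mknod failures
stay at error.)

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    enum sev { SEV_DEBUG = 'D', SEV_ERROR = 'E' };

    /* Pick the severity from errno: the expected EEXIST race logs at debug,
     * anything else stays at error. */
    static void log_mknod_failure(const char *path, int err)
    {
            enum sev level = (err == EEXIST) ? SEV_DEBUG : SEV_ERROR;

            fprintf(stderr, "[%c] posix_mknod: mknod on %s failed [%s]\n",
                    (char) level, path, strerror(err));
    }

    int main(void)
    {
            log_mknod_failure("/tmp/example-shard.779", EEXIST);  /* prints [D] */
            log_mknod_failure("/tmp/example-shard.780", EACCES);  /* prints [E] */
            return 0;
    }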
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen when
multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors, making a user worry, hehe.
Is there a known bug filed against that, or should I maybe create one to
see if we can get that sent to an informational level?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is the reply again just in case. I got a quarantine message, so I'm not
sure if the first went through or will anytime soon. Brick logs weren't large,
so I'll just include them as text files this time.
Did maintenance over the weekend updating ovirt from 3.6.6->3.6.7, and after
restarting the complaining ovirt node I was able to migrate the 2 VMs with
issues. So I'm not sure why the mount got stale, but I imagine that one node
couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files.
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
(null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed occurring.
I've zipped and attached all 3 nodes' logs, but from this snippet on one node
none of them seem to coincide with the time window when the migration had
issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an
image for a different vm than the one I had issues with as well. Maybe gluster
is trying to do some sort of create-the-shard test before writing out changes
that would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5
times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5
times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
any of the 3 storage nodes.
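For reference, the check I've been running on each storage node is just
the standard one (GLUSTER1 being the volume), and given the "(Possible
split-brain)" warnings in the client log the split-brain variant seems
worth running as well:

gluster volume heal GLUSTER1 info                # files pending heal, per brick
gluster volume heal GLUSTER1 info split-brain    # files gluster actually considers split-brain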
A moment ago I shut down one VM that has been around a while and
told it to start on the one oVirt server that complained previously. It
ran fine, and I was able to migrate it off and back onto the host with no
issues.
I then told one of the new VMs to migrate to that node, and within
seconds it paused from unknown storage errors: no shards showing heals,
nothing with an error on the storage nodes, and the same stale file handle
issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 recent VMs; maybe the images just
got corrupted, though I'm not sure why a bad image would only fail on one
node of 3.
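If I do take the maintenance route, my rough plan on the affected
hypervisor is just to remount the volume; treating the exact mount point
as illustrative (assuming the usual oVirt glusterSD path), something like:

mount | grep glusterfs    # find the actual storage-domain mount point
umount /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1    # illustrative path; use the one reported above
mount -t glusterfs <server>:/GLUSTER1 /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1

though simply activating the host again from maintenance should let vdsm
reconnect the storage domain on its own anyway.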
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a
3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable
drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Dan Lavu
2018-06-09 04:08:54 UTC
Permalink
Krutika,

Are the following messages also normal?

[2018-06-07 06:36:22.008492] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158
failed
[2018-06-07 06:36:22.319735] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16160
failed
[2018-06-07 06:36:24.711800] E [MSGID: 113002] [posix.c:267:posix_lookup]
0-rhev_vms-posix: buf->ia_gfid is null for
/gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
[No data available]
[2018-06-07 06:36:24.711839] E [MSGID: 115050]
[server-rpc-fops.c:170:server_lookup_cbk] 0-rhev_vms-server: 32334131:
LOOKUP /.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
(be318638-e8a0-4c6d-977d-7a937aa84806/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177)
==> (No data available) [No data available]

If so, what do they mean?
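For what it's worth, I can check the gfid xattr on one of those shards
directly on the brick host if that helps narrow it down, e.g. (path copied
from the log above, run as root):

getfattr -n trusted.gfid -e hex /gluster/brick/rhev_vms/.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158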

Dan
Post by Krutika Dhananjay
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
On Tue, Aug 16, 2016 at 5:50 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen
when multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than that they pop up as E errors and make a user worry, hehe.
Is there a known bug filed against that, or should I create one to
see if we can get those logged at an informational level instead?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is my reply again just in case; I got a quarantine message, so I'm not
sure whether the first one went through or will anytime soon. The brick logs
weren't large, so I'll just include them as text files this time.
I did maintenance over the weekend, updating oVirt from 3.6.6 to 3.6.7, and
after restarting the complaining oVirt node I was able to migrate the 2 VMs
with issues. So I'm not sure why the mount went stale, but I imagine that one
node couldn't see the new image files after that had occurred?
I'm still getting a few sporadic errors, but they seem much fewer than before,
and I never get any corresponding notices in any other log files:
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
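Next time it happens I'll also check, from the complaining node's own mount,
whether it can even see the new VM's image directory; roughly like this (the
storage-domain UUID is the one from the fuse log further down, while the
mount path and image UUID are placeholders):

ls -l /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/<new-vm-image-uuid>/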
Post by David Gossage
As a safety precaution, the University of South Carolina quarantines .zip and
.docm files sent via email. If this is a legitimate attachment <
file will be released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were the same moments after the issue occurred as well:
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent; the same on all 3 storage nodes:
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
There have been some errors in the brick logs I hadn't noticed
occurring. I've zipped and attached all 3 nodes' logs, but from this snippet
on one node, none of them seem to coincide with the time window when
the migration had issues. The f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard
also refers to an image for a different VM than the one I had trouble with.
Maybe gluster is attempting to create the shard before writing
out changes destined for that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated
5 times between [2016-08-12 18:48:22.463628] and [2016-08-12
18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated
5 times between [2016-08-12 18:49:16.065502] and [2016-08-12
18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
either of the 3 storage nodes.
I shut down one VM that has been around awhile a moment ago then
told it to start on the one ovirt server that complained previously. It
ran fine, and I was able to migrate it off and on the host no issues.
I told one of the new VM's to migrate to the one node and within
seconds it paused from unknown storage errors no shards showing heals
nothing with an error on storage node. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 reccent VM's. maybe images just got
corrupted though why it would only fail on one node of 3 if image was bad
not sure.
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with
a 3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable
drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Krutika Dhananjay
2018-06-12 08:23:00 UTC
Permalink
Post by Dan Lavu
Krutika,
Is it also normal for the following messages as well?
Yes, this should be fine. It only represents a transient state when
multiple threads/clients are trying to create the same shard at the same
time. These can be ignored.
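If you want to double-check that nothing is actually stuck behind those
messages, the usual check on one of the servers would be (assuming the
volume is named rhev_vms, as the 0-rhev_vms-posix prefix in your log
suggests):

gluster volume heal rhev_vms info    # should report 0 entries per brick if all is well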

-Krutika
Post by Dan Lavu
[2018-06-07 06:36:22.008492] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158 failed
[2018-06-07 06:36:22.319735] E [MSGID: 113020] [posix.c:1395:posix_mknod]
0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16160 failed
[2018-06-07 06:36:24.711800] E [MSGID: 113002] [posix.c:267:posix_lookup]
0-rhev_vms-posix: buf->ia_gfid is null for /gluster/brick/rhev_vms/.
shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177 [No data available]
[2018-06-07 06:36:24.711839] E [MSGID: 115050]
LOOKUP /.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
(be318638-e8a0-4c6d-977d-7a937aa84806/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177)
==> (No data available) [No data available]
if so what does it mean?
Dan
Post by Krutika Dhananjay
Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
log-level to DEBUG. Let's see what the maintainers have to say. :)
-Krutika
On Tue, Aug 16, 2016 at 5:50 AM, David Gossage <
Post by David Gossage
Post by Krutika Dhananjay
No. The EEXIST errors are normal and can be ignored. This can happen
when multiple threads try to create the same
shard in parallel. Nothing wrong with that.
Other than they pop up as E errors making a user worry hehe
Is their a known bug filed against that or should I maybe create one to
see if we can get that sent to an informational level maybe?
Post by Krutika Dhananjay
-Krutika
On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
Post by David Gossage
Here is reply again just in case. I got quarantine message so not
sure if first went through or wll anytime soon. Brick logs weren't large
so Ill just include as text files this time
Did maintenance over weekend updating ovirt from 3.6.6->3.6.7 and
after restrating the complaining ovirt node I was able to migrate the 2 vm
with issues. So not sure why the mount got stale, but I imagine that one
node couldn't see the new image files after that had occurred?
Still getting a few sporadic errors, but seems much fewer than before
and never get any corresponding notices in any other log files
[2016-08-15 13:40:31.510798] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 13:40:31.522067] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
failed [File exists]
[2016-08-15 17:47:06.375708] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.435198] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 17:47:06.405481] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
failed [File exists]
[2016-08-15 17:47:26.464542] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
failed [File exists]
[2016-08-15 18:46:47.187967] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739
failed [File exists]
[2016-08-15 18:47:41.414312] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
[2016-08-15 18:47:41.450470] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779
failed [File exists]
Post by David Gossage
As a safety precaution, the University of South Carolina quarantines .zip
and .docm files sent via email. If this is a legitimate attachment <
file will be released from quarantine and delivered.
On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <
On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <
Post by Krutika Dhananjay
1. Could you share the output of `gluster volume heal <VOL> info`?
Results were same moments after issue occurred as well
Brick ccgl1.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl2.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Brick ccgl4.gl.local:/gluster1/BRICK1/1
Status: Connected
Number of entries: 0
Post by Krutika Dhananjay
2. `gluster volume info`
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
cluster.locking-scheme: granular
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.strict-write-ordering: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: on
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 64MB
diagnostics.brick-log-level: WARNING
Post by Krutika Dhananjay
3. fuse mount logs of the affected volume(s)?
[2016-08-12 21:34:19.518511] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519115] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519203] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.519226] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.520737] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. (Possible split-brain)
[2016-08-12 21:34:19.521393] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.522269] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845
fd=0x7f00a80bde58 (Stale file handle)
[2016-08-12 21:34:19.521296] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 21:34:19.521357] W [MSGID: 114031]
remote operation failed [No such file or directory]
[2016-08-12 22:15:08.337528] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:12.240026] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:11.105593] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4
35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
[2016-08-12 22:15:14.772713] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) =>
/7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4
aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta
(hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0)
4. glustershd logs
Nothing recent same on all 3 storage nodes
[2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-08-11 08:14:03.683287] I [MSGID: 100011]
[glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume
file from server...
[2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
Post by Krutika Dhananjay
5. Brick logs
Their have been some error in brick logs I hadn't noticed
occurring. I've zip'd and attached all 3 nodes logs, but from this snippet
on one node none of them seem to coincide with the time window when
migration had issues. f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard
refers to an image for a different vm than one I had issues with as well.
Maybe gluster is trying to do some sort of make shard test before writing
out changes that would go to that image and that shard file?
[2016-08-12 18:48:22.463628] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697
failed [File exists]
[2016-08-12 18:48:24.553455] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
[2016-08-12 18:49:16.065502] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated
5 times between [2016-08-12 18:48:22.463628] and [2016-08-12
18:48:22.514777]
[2016-08-12 18:48:24.581216] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7
f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated
5 times between [2016-08-12 18:49:16.065502] and [2016-08-12
18:49:16.107746]
[2016-08-12 19:23:40.964678] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301
failed [File exists]
[2016-08-12 20:00:33.498751] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-12 20:00:33.530938] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580
failed [File exists]
[2016-08-13 01:47:23.338036] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211
failed [File exists]
The message "E [MSGID: 113022] [posix.c:1245:posix_mknod]
0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884
3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated
16 times between [2016-08-13 01:47:23.338036] and [2016-08-13
01:47:23.380980]
[2016-08-13 01:48:02.224494] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211
failed [File exists]
[2016-08-13 01:48:42.266148] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177
failed [File exists]
[2016-08-13 01:49:09.717434] E [MSGID: 113022]
[posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
/gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178
failed [File exists]
Post by Krutika Dhananjay
-Krutika
On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <
Post by David Gossage
Post by Dan Lavu
David,
I'm seeing similar behavior in my lab, but it has been caused by
healing files in the gluster cluster, though I attribute my problems to
problems with the storage fabric. See if 'gluster volume heal $VOL info'
indicates files that are being healed, and if those reduce in number, can
the VM start?
I haven't had any files in a state of being healed according to
either of the 3 storage nodes.
I shut down one VM that has been around awhile a moment ago then
told it to start on the one ovirt server that complained previously. It
ran fine, and I was able to migrate it off and on the host no issues.
I told one of the new VM's to migrate to the one node and within
seconds it paused from unknown storage errors no shards showing heals
nothing with an error on storage node. Same stale file handle issues.
I'll probably put this node in maintenance later and reboot it.
Other than that I may re-clone those 2 reccent VM's. maybe images just got
corrupted though why it would only fail on one node of 3 if image was bad
not sure.
Dan
Post by Dan Lavu
On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
Post by David Gossage
Figure I would repost here as well. one client out of 3
complaining of stale file handles on a few new VM's I migrated over. No
errors on storage nodes just client. Maybe just put that one in
maintenance and restart gluster mount?
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
---------- Forwarded message ----------
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with
a 3 replicate gluster 3.7.14 starting a VM i just copied in on one node of
the 3 gets the following errors. The other 2 the vm starts fine. All
ovirt and gluster are centos 7 based. VM on start of the one node it tries
to default to on its own accord immediately puts into paused for unknown
reason. Telling it to start on different node starts ok. node with issue
already has 5 VMs running fine on it same gluster storage plus the hosted
engine on different volume.
gluster nodes logs did not have any errors for volume
nodes own gluster logs had this in log
dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs
.shard or images/
7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the
bootable drive of the vm
[2016-08-11 04:31:39.982952] W [MSGID: 114031]
[client-rpc-fops.c:3050:client3_3_readv_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task]
0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale
file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008]
Unreadable subvolume -1 found with event generation 3 for gfid
dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
0-GLUSTER1-client-2: remote operation failed [No such file or directory]
I attached vdsm.log starting from when I spun up vm on offending node
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users