Discussion: [Gluster-users] VM going down
Alessandro Briosi
2017-05-08 10:13:51 UTC
Hi all,
I have VMs sporadically going down whose files are on GlusterFS.

If I look at the gluster logs the only events I find are:
/var/log/glusterfs/bricks/data-brick2-brick.log

[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
pid=0 lk-owner=5c7099efc97f0000}
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
pid=0 lk-owner=5c7019fac97f0000}
[2017-05-08 09:51:17.661835] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-2.qcow2
[2017-05-08 09:51:17.661838] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-1.qcow2
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 10:01:06.210392] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
[2017-05-08 10:01:06.237463] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
[No such device or address]
[2017-05-08 10:01:07.019974] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
device or address]
[2017-05-08 10:01:07.041992] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
[No such device or address]

The strange part is that I cannot seem to find any other error.
If I restart the VM, everything works as expected (it stopped at ~9:51
UTC and was started at ~10:01 UTC).

This is not the first time that this happened, and I do not see any
problems with networking or the hosts.

Gluster version is 3.8.11.
This is the affected volume (though it happened on a different one too):

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Any hint on how to dig more deeply into the reason would be greatly
appreciated.

Alessandro
Krutika Dhananjay
2017-05-08 10:38:25 UTC
The newly introduced "SEEK" fop seems to be failing at the bricks.

Adding Niels for his inputs/help.

-Krutika
Post by Alessandro Briosi
Hi all,
I have sporadic VM going down which files are on gluster FS.
/var/log/glusterfs/bricks/data-brick2-brick.log
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
pid=0 lk-owner=5c7099efc97f0000}
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
pid=0 lk-owner=5c7019fac97f0000}
[2017-05-08 09:51:17.661835] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-2.qcow2
[2017-05-08 09:51:17.661838] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-1.qcow2
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 10:01:06.210392] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
[2017-05-08 10:01:06.237463] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
[No such device or address]
[2017-05-08 10:01:07.019974] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
device or address]
[2017-05-08 10:01:07.041992] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
[No such device or address]
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Alessandro
Jesper Led Lauridsen TS Infra server
2017-05-08 10:57:16 UTC
I don't know if this has any relation to your issue, but I have seen several times during Gluster healing that my VMs fail or are marked unresponsive in RHEV. My conclusion is that the load Gluster puts on the VM images while checksumming during healing results in too much latency, and the VMs fail.

My plan is to try sharding, so the VM images/files are split into smaller files, to change the number of allowed concurrent heals (‘cluster.background-self-heal-count’), and to disable ‘cluster.self-heal-daemon’.
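(For reference, these settings roughly map onto gluster CLI calls along the following lines; this is only a sketch, the volume name is a placeholder, and note that enabling sharding affects only files created after the option is set:)

  # shard new VM images into smaller pieces (existing files stay unsharded)
  gluster volume set <VOLNAME> features.shard on
  # limit concurrent background heals and stop the self-heal daemon
  gluster volume set <VOLNAME> cluster.background-self-heal-count 4
  gluster volume set <VOLNAME> cluster.self-heal-daemon off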

/Jesper

From: gluster-users-***@gluster.org [mailto:gluster-users-***@gluster.org] On behalf of Krutika Dhananjay
Sent: 8 May 2017 12:38
To: Alessandro Briosi <***@metalit.com>; de Vos, Niels <***@redhat.com>
Cc: gluster-users <gluster-***@gluster.org>
Subject: Re: [Gluster-users] VM going down

The newly introduced "SEEK" fop seems to be failing at the bricks.
Adding Niels for his inputs/help.

-Krutika

On Mon, May 8, 2017 at 3:43 PM, Alessandro Briosi <***@metalit.com<mailto:***@metalit.com>> wrote:
Hi all,
I have sporadic VM going down which files are on gluster FS.

If I look at the gluster logs the only events I find are:
/var/log/glusterfs/bricks/data-brick2-brick.log

[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
pid=0 lk-owner=5c7099efc97f0000}
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
pid=0 lk-owner=5c7019fac97f0000}
[2017-05-08 09:51:17.661835] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-2.qcow2
[2017-05-08 09:51:17.661838] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-1.qcow2
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 10:01:06.210392] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
[2017-05-08 10:01:06.237463] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
[No such device or address]
[2017-05-08 10:01:07.019974] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
device or address]
[2017-05-08 10:01:07.041992] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
[No such device or address]

The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .

This is not the first time that this happened, and I do not see any
problems with networking or the hosts.

Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Any hint on how to dig more deeply into the reason would be greatly
appreciated.

Alessandro
Alessandro Briosi
2017-05-08 13:49:55 UTC
Post by Jesper Led Lauridsen TS Infra server
I dont know if this has any relation to you issue. But I have seen
several times during gluster healing that my wm’s fail or are marked
unresponsive in rhev. My conclusion is that the load gluster puts on
the wm-images during checksum while healing, result in to much latency
and wm’s fail.
My plans is to try using sharding, so the wm-images/files are split
into smaller files, changing the number of allowed concurrent heals
‘cluster.background-self-heal-count’ and disabling
‘cluster.self-heal-daemon’.
The thing is that there are no heal processes running, no log entries
either.
A few days ago I had a failure and the heal process started and finished
without any problems.
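(For reference, whether any heals are pending or in progress can be checked per volume; a rough sketch, using a volume name from this thread:)

  # list entries still needing heal on each brick
  gluster volume heal datastore2 info
  # per-brick counts only, handy for watching progress
  gluster volume heal datastore2 statistics heal-count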

I do not use sharding yet.

Alessandro
Alessandro Briosi
2017-05-09 07:29:52 UTC
Post by Alessandro Briosi
Post by Jesper Led Lauridsen TS Infra server
I dont know if this has any relation to you issue. But I have seen
several times during gluster healing that my wm’s fail or are marked
unresponsive in rhev. My conclusion is that the load gluster puts on
the wm-images during checksum while healing, result in to much
latency and wm’s fail.
My plans is to try using sharding, so the wm-images/files are split
into smaller files, changing the number of allowed concurrent heals
‘cluster.background-self-heal-count’ and disabling
‘cluster.self-heal-daemon’.
The thing is that there are no heal processes running, no log entries
either.
Few days ago I had a failure and the heal process started and finished
without any problems.
I do not use sharding yet.
Well, it happened again on a different volume and a different VM.

This time a self heal process was started.

Why is this happening? There are no network problems on the hosts, and
they all have bonded 2x1Gbit NICs dedicated to Gluster...

Is there any information I can give you to find out what happened?

This is the only mention of healing in the logs:
[2017-05-08 17:34:40.474774] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0:
Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f.
sources=[1] sinks=0 2

The VM went down about an hour and a half earlier:
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on
/images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore1-server: releasing lock on
bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0,
pid=0 lk-owner=5c600023827f0000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on
/images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0


Any hint would be greatly appreciated.

Alessandro
Ravishankar N
2017-05-09 08:16:56 UTC
Post by Alessandro Briosi
Post by Alessandro Briosi
Post by Jesper Led Lauridsen TS Infra server
I dont know if this has any relation to you issue. But I have seen
several times during gluster healing that my wm’s fail or are marked
unresponsive in rhev. My conclusion is that the load gluster puts on
the wm-images during checksum while healing, result in to much
latency and wm’s fail.
My plans is to try using sharding, so the wm-images/files are split
into smaller files, changing the number of allowed concurrent heals
‘cluster.background-self-heal-count’ and disabling
‘cluster.self-heal-daemon’.
The thing is that there are no heal processes running, no log entries
either.
Few days ago I had a failure and the heal process started and
finished without any problems.
I do not use sharding yet.
Well, it happened again on a different volume and a different VM.
This time a self heal process was started.
Why is this happening? there are no network problems on the hosts and
they all do have bonded 2x1Gbit nics dedicated to gluster...
Is there any information I can give you to find out what happened?
[2017-05-08 17:34:40.474774] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-datastore1-replicate-0: Completed data selfheal on
bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1] sinks=0 2
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup
on /images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore1-server: releasing lock on
bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0,
pid=0 lk-owner=5c600023827f0
000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup
on /images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection
srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection
srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
Any hint would be greatly apreciated.
Can you share the log of the fuse mount
(/var/log/glusterfs/<path-to-mount-point>.log) on which the VM was
running? When you say 'VM going down', do you mean it paused/became
unresponsive?
-Ravi
Post by Alessandro Briosi
Alessandro
Alessandro Briosi
2017-05-09 08:49:28 UTC
Post by Ravishankar N
Can you share the log of the fuse mount
(/var/log/glusterfs/<path-to-mout-point.log> on which the VM was
running? When you say 'VM going down' , do you mean it paused/ became
unresponsive?
These are the logs: last host [1], previous host [2].
They are from the hosts that were running the VM, respectively.

As you can see, there are no events since 4 May.

I mean the kvm process is not there anymore and the VM ends up not
running (it seems like it was "killed" or "shut off", though the only logs
I can see are about releasing the tap device for the VM network).

[1] https://pastebin.ca/3809999
[2] https://pastebin.ca/3810000

Thanks,
Alessandro
Alessandro Briosi
2017-05-08 13:52:45 UTC
Post by Krutika Dhananjay
The newly introduced "SEEK" fop seems to be failing at the bricks.
Adding Niels for his inputs/help.
I don't know if this is related, though: the SEEK is done only when the VM
is started, not when it's suddenly shut down.
Though it's an odd message (as the file really is there), the VM starts
correctly.

Alessandro
Niels de Vos
2017-05-09 14:10:18 UTC
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
return with "No such device or address" (ENXIO) in only one case:

ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.

This means that an lseek() was executed where the current offset of the
file descriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all at the moment.

...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds to an error return from
SEEK on one of the bricks, but I doubt it likes it.
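(If one wanted to check that hypothesis, the on-brick file sizes can be compared directly; a sketch, run on each of the three servers, using the brick path from the volume info and an image path from the logs above:)

  # the arbiter copy is expected to be 0 bytes, the data copies the full image size
  stat -c '%n %s' /data/brick2/brick/images/201/vm-201-disk-2.qcow2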

We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.

HTH,
Niels
Alessandro Briosi
2017-05-09 14:59:51 UTC
Post by Niels de Vos
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
Well, I'm not really clear on the internals of Gluster, but the arbiter
is not the host where the VM is running.
If Gluster is aware of the arbiter, it should not look for data on that
brick besides metadata and "quorum".

Also, the seek errors were there before, when there was no arbiter (only
2 replicas).
And finally, the seek error is triggered when the VM is started (at least
the one in the logs).


Alessandro
Lindsay Mathieson
2017-05-09 21:41:26 UTC
Post by Alessandro Briosi
Also the seek errors where there before when there was no arbiter
(only 2 replica).
And finally seek error is triggered when the VM is started (at least
the one in the logs).
Could there be a corruption problem in the qcow2 image? Have you run
"qemu-img check" against it?
--
Lindsay Mathieson
Alessandro Briosi
2017-05-10 23:51:48 UTC
Post by Lindsay Mathieson
Post by Alessandro Briosi
Also the seek errors where there before when there was no arbiter
(only 2 replica).
And finally seek error is triggered when the VM is started (at least
the one in the logs).
Could there be a corruption problem in the qcow2 image? have you run
"qemu-img check" against it?
Not sure about this, but it does not seem so.
qemu-img check reports everything is OK with the disks.

***@srvpve1:/mnt/pve/datastore2/images/201# qemu-img check
vm-201-disk-1.qcow2
No errors were found on the image.
655360/655360 = 100.00% allocated, 1.11% fragmented, 0.00% compressed
clusters
Image end offset: 42969661440


On one it reports leaked clusters, but I don't think this could
cause the problem (or could it?)

***@srvpve1:/mnt/pve/datastore1/images/101# qemu-img check
vm-101-disk-1.qcow2
Leaked cluster 409006 refcount=1 reference=0
Leaked cluster 624338 refcount=1 reference=0
Leaked cluster 791103 refcount=1 reference=0

3 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
8192000/8192000 = 100.00% allocated, 1.24% fragmented, 0.00% compressed
clusters
Image end offset: 539424129024
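(If desired, leaked clusters can be reclaimed offline; a sketch, to be run only while the VM is shut down:)

  # repair mode limited to leaks; harmless to data, but must not run on a live image
  qemu-img check -r leaks vm-101-disk-1.qcow2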
Lindsay Mathieson
2017-05-11 04:38:34 UTC
On 11/05/2017 9:51 AM, Alessandro Briosi wrote:

On one it reports about Leaked clusters but I don't think this might cause
the problem (or not?)


Should be fine
--
Lindsay Mathieson
Pranith Kumar Karampuri
2017-05-10 10:38:22 UTC
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Post by Alessandro Briosi
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
--
Pranith
Niels de Vos
2017-05-10 13:41:34 UTC
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)

Thanks,
Niels
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
--
Pranith
Pranith Kumar Karampuri
2017-05-10 15:38:03 UTC
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)
inode-read fops are the fops which read some information from the inode.
Like stat/getxattr/read. Even seek falls into that category. It is not sent
to the arbiter brick...
Post by Niels de Vos
Thanks,
Niels
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
--
Pranith
--
Pranith
Niels de Vos
2017-05-11 12:19:24 UTC
Post by Pranith Kumar Karampuri
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)
inode-read fops are the fops which read some information from the inode.
Like stat/getxattr/read. Even seek falls in that category. It is not sent
on arbiter brick...
What confuses me is that the arbiter xlator defines the following FOPs
in xlators/features/arbiter/src/arbiter.c:

struct xlator_fops fops = {
        .lookup    = arbiter_lookup,
        .readv     = arbiter_readv,
        .truncate  = arbiter_truncate,
        .writev    = arbiter_writev,
        .ftruncate = arbiter_ftruncate,
        .fallocate = arbiter_fallocate,
        .discard   = arbiter_discard,
        .zerofill  = arbiter_zerofill,
};


To go back to the error message:

[posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such device or address]

We need to know on which brick this occurs to confirm that it was not
sent to the arbiter brick somehow.

Thanks,
Niels
Ravishankar N
2017-05-11 12:35:59 UTC
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)
inode-read fops are the fops which read some information from the inode.
Like stat/getxattr/read. Even seek falls in that category. It is not sent
on arbiter brick...
What confuses me is that the arbiter xlator defines the following FOPs
AFR has a list of readable subvols on which all read-related FOPs are
wound. For arbiter volumes, we mark the arbiter as non-readable during
lookup cbk.
So any read FOP is not wound to arbiter anymore. This change was made at
a later stage after arbiter_readv was coded initially to send an error.
So in the current code, arbiter_readv should never get hit.
Post by Niels de Vos
struct xlator_fops fops = {
.lookup = arbiter_lookup,
.readv = arbiter_readv,
.truncate = arbiter_truncate,
.writev = arbiter_writev,
.ftruncate = arbiter_ftruncate,
.fallocate = arbiter_fallocate,
.discard = arbiter_discard,
.zerofill = arbiter_zerofill,
};
[posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such device or address]
We need to know on which brick this occurs to confirm that is was not
sent on the arbiter brick somehow.
This is what Alessandro said earlier in the thread:

"Also the seek errors where there before when there was no arbiter (only
2 replica)."
Post by Niels de Vos
Thanks,
Niels
Niels de Vos
2017-05-11 15:18:45 UTC
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Niels de Vos
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Pranith Kumar Karampuri
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
inode-read fops don't get sent to arbiter brick. So this won't happen.
Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)
inode-read fops are the fops which read some information from the inode.
Like stat/getxattr/read. Even seek falls in that category. It is not sent
on arbiter brick...
What confuses me is that the arbiter xlator defines the following FOPs
AFR has a list of readable subvols on which all read related FOPS are wound.
For arbiter volumes, we mark the arbiter as non-readable during lookup cbk.
So any read FOP is not wound to arbiter anymore. This change was made at a
later stage after arbiter_readv was coded initially to send an error. So in
the current code, arbiter_readv should never get hit.
Aha! Thanks, that explains it well.
Post by Niels de Vos
struct xlator_fops fops = {
.lookup = arbiter_lookup,
.readv = arbiter_readv,
.truncate = arbiter_truncate,
.writev = arbiter_writev,
.ftruncate = arbiter_ftruncate,
.fallocate = arbiter_fallocate,
.discard = arbiter_discard,
.zerofill = arbiter_zerofill,
};
[posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such device or address]
We need to know on which brick this occurs to confirm that is was not
sent on the arbiter brick somehow.
"Also the seek errors where there before when there was no arbiter (only 2
replica)."
Ok, I missed that detail. We then just need to figure out why QEMU and
FUSE try to do an lseek() with an offset of 42957209600 while the file
is not that large...

Any ideas how that can happen?

Niels
Krutika Dhananjay
2017-05-11 07:05:42 UTC
Niels,

Alessandro's configuration does not have shard enabled, so it has
definitely not got anything to do with shard not supporting the seek fop.

Copy-pasting volume-info output from the first mail:

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet


-Krutika
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
Alessandro Briosi
2017-05-11 07:14:11 UTC
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Hi,
I know sharding is not enabled; I had the impression it's not stable
enough to be used yet.

Besides that, as said, the seek error is triggered when the VM is started
up, not when it's suddenly stopped.

The problem still persists, though: yesterday one VM went down again.

My next step will be to start qemu with debugging to see where it
crashes (or whether it shuts down by itself for some reason).
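(One possible way to catch the exit, sketched under the assumption that this is a Proxmox host keeping its pidfiles under /var/run/qemu-server/<vmid>.pid and that the guest in question is VM 101:)

  # first rule out the kernel OOM killer or a plain segfault
  dmesg -T | grep -Ei 'oom|segfault|kvm'
  # attach gdb to the running kvm process, then 'continue' and wait for the next crash
  gdb -p "$(cat /var/run/qemu-server/101.pid)"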

If anybody has any other hint I would greatly appreciate.
If needed I might provide all the /var/log of the host/s.

Alessandro
Niels de Vos
2017-05-11 12:09:46 UTC
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but if sharding had been enabled, the seek FOP would have been
handled correctly (detected as not supported at all).

I'm still not sure how arbiter prevents doing shards, though. We normally
advise using sharding *and* (optionally) arbiter for VM workloads;
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.

HTH,
Niels
Post by Krutika Dhananjay
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
-Krutika
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
Alessandro Briosi
2017-05-11 13:49:27 UTC
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding *and* (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
Where is it stated that arbiter should be used with sharding?
Or that arbiter functionality without sharding is still in a "testing" phase?
I thought that having 3 full replicas on a 3-node cluster would have been a
waste of space. (I only need to support losing 1 host at a time, and that's
fine.)

Anyway, this also happened before with the same VM when there was no
arbiter, and I thought it was for some strange reason a "quorum" thing
which would make the file unavailable in Gluster, though
there were no clues in the logs.
So I added the arbiter brick, but it happened again last week.

The first VM I reported going down was created on a volume with
arbiter enabled from the start, so I doubt it's something to do with arbiter.

I think it might have something to do with load? Though the
hosts are really not being used that much.

Anyway, this is a brief description of my setup.

3 Dell servers with RAID 10 SAS disks.
Each server has 2 bonded 1 Gbps Ethernet NICs dedicated to Gluster (2
dedicated to the Proxmox cluster and 2 for communication with the hosts
on the LAN), each pair on its own VLAN in the switch.
Jumbo frames are also enabled on the NICs and switches.

Each server is a Proxmox host which has Gluster installed and configured
as both server and client.

The RAID has a thin-provisioned LVM which is divided into 3 bricks (2
big ones for the data and 1 small one for the arbiter).
Each thin LV is XFS-formatted and mounted as a brick.
There are 3 volumes configured as replica 3 with arbiter (so 2
bricks really holding the data).
Volumes are:
datastore1: data on srv1 and srv2, arbiter srv3
datastore2: data on srv2 and srv3, arbiter srv1
datastore3: data on srv1 and srv3, arbiter srv2

On each datastore there is basically one main VM (plus some others which
are not so important); 3 VMs are the important ones.

datastore1 was converted from replica 2 to replica 3 with arbiter; the
other 2 were created as described.

The VM on the first datastore crashed several times (even when there was
no arbiter; I thought for some reason there was a split brain which
gluster could not handle).

Last week the 2nd VM (on datastore2) also crashed, and that's when I
started this thread (before that, as there were no special errors logged,
I thought it could have been caused by something inside the VM).

Until now the 3rd VM has never crashed.

Still any help on this would be really appreciated.

I know it could also be a problem somewhere else, but I have other
setups without gluster which simply work.
That's why I want to start the VM with gdb, to check next time why the
kvm process shuts down.

Alessandro
Pranith Kumar Karampuri
2017-05-11 14:15:09 UTC
Permalink
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding **and** (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
Where is it stated that arbiter should be used with sharding?
This information is inaccurate; arbiter can be used independently of sharding.
Post by Krutika Dhananjay
Or that arbiter functionality without sharding is still in a "testing" phase?
I thought that having 3 full replicas on a 3-node cluster would have been a
waste of space. (I only need to tolerate losing 1 host at a time, and that's
fine.)
Anyway, this had also happened before with the same VM when there was no
arbiter, and I thought it was for some strange reason a "quorum" thing
which made the file unavailable in gluster, though there were no clues in
the logs.
So I added the arbiter brick, but it happened again last week.
The first VM I reported going down was created on a volume with the
arbiter enabled from the start, so I doubt it has something to do with the arbiter.
I think it might have something to do with a load problem? Though the
hosts are really not being used that much.
Anyway this is a brief description of my setup.
3 dell servers with RAID 10 SAS Disks
each server has 2 bonded 1 Gbps Ethernet ports dedicated to gluster (plus 2
dedicated to the proxmox cluster and 2 for communication with the hosts on
the LAN), each bond on its own VLAN in the switch.
Also jumbo frames are enabled on ethernets and switches.
each server is a proxmox host which has gluster installed and configured
as server and client.
The RAID has a LVM thin provisioned which is divided into 3 bricks (2 big
for the data and 1 small for the arbiter).
each Thin LVM is XFS formatted and mounted as brick.
There are 3 volumes configured which replicate 3 with arbiter (so 2 really
holding the data).
datastore1: data on srv1 and srv2, arbiter srv3
datastore2: data on srv2 and srv3, arbiter srv1
datastore3: data on srv1 and srv3, arbiter srv2
On each datastore basically there is a main VM (plus some others which
though are not so important). (3 VM are mainly important)
datastore1 was converted from 2 replica to 3 replica with arbiter, the
other 2 were created as described.
The VM on the first datastore crashed more times (even where there was no
arbiter, which I thought for some reason there was a split brain which
gluster could not handle).
Last week also the 2nd VM (on datastore2) crashed, and that's when I
started the thread (before as there were no special errors logged I thought
it could have been caused by something in the VM)
Till now the 3rd VM never crashed.
Still any help on this would be really appreciated.
I know it could also be a problem somewhere else, but I have other setups
without gluster which simply work.
That's why I want to start the VM with gdb, to check next time why the kvm
process shuts down.
Alessandro
--
Pranith
Alessandro Briosi
2017-05-11 16:01:30 UTC
Permalink
Post by Alessandro Briosi
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding **and** (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
Where is it stated that arbiter should be used with sharding?
This information is inaccurate; arbiter can be used independently of sharding.
Thanks, this reassures me I do have a supported setup.

Alessandro
Niels de Vos
2017-05-12 09:36:28 UTC
Permalink
Post by Alessandro Briosi
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding *and* (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
Where is it stated that arbiter should be used with sharding?
Or that arbiter functionality without sharding is still in a "testing" phase?
I thought that having 3 full replicas on a 3-node cluster would have been a
waste of space. (I only need to tolerate losing 1 host at a time, and that's
fine.)
There is no "arbiter should be used with sharding", our recommendations
are to use sharding for VM workloads, with an optional arbiter. But we
still expect VMs on non-sharded volumes to work just fine, with or
without arbiter.
Post by Alessandro Briosi
Anyway, this had also happened before with the same VM when there was no
arbiter, and I thought it was for some strange reason a "quorum" thing
which made the file unavailable in gluster, though there were no clues
in the logs.
So I added the arbiter brick, but it happened again last week.
If it is always the same VM, I wonder if there could be a small
filesystem corruption in that VM? Were there any actions done on the
storage of that VM, like resizing the block device (VM image) or
something like that? Systems can sometimes try to access data outside of
the block device when it was resized but the filesystem on the block
device was not. This would 'trick' the filesystem into thinking it has
more space to access than the block device has. If the filesystem access
in the VM goes 'past the block device', and this gets through to Gluster,
which then does a seek with that too-large offset, the log you posted
would be the result.
Post by Alessandro Briosi
The first VM I reported going down was created on a volume with the
arbiter enabled from the start, so I doubt it has something to do with the arbiter.
I think it might have something to do with a load problem? Though the
hosts are really not being used that much.
Anyway this is a brief description of my setup.
3 dell servers with RAID 10 SAS Disks
each server has 2 bonded 1 Gbps Ethernet ports dedicated to gluster (plus 2
dedicated to the proxmox cluster and 2 for communication with the hosts
on the LAN), each bond on its own VLAN in the switch.
Also jumbo frames are enabled on ethernets and switches.
each server is a proxmox host which has gluster installed and configured
as server and client.
Do you know how proxmox accesses the VM images? Does it use QEMU+gfapi
or is it all over a FUSE mount? New versions of QEMU+gfapi have seek
support, and only new versions of the Linux kernel support seek over
FUSE. In order to track where the problem may be, we need to look into
the client (QEMU or FUSE) that does the seek with an invalid offset.
Post by Alessandro Briosi
The RAID has a LVM thin provisioned which is divided into 3 bricks (2
big for the data and 1 small for the arbiter).
each Thin LVM is XFS formatted and mounted as brick.
There are 3 volumes configured which replicate 3 with arbiter (so 2
really holding the data).
datastore1: data on srv1 and srv2, arbiter srv3
datastore2: data on srv2 and srv3, arbiter srv1
datastore3: data on srv1 and srv3, arbiter srv2
On each datastore basically there is a main VM (plus some others which
though are not so important). (3 VM are mainly important)
datastore1 was converted from 2 replica to 3 replica with arbiter, the
other 2 were created as described.
The VM on the first datastore crashed more times (even where there was
no arbiter, which I thought for some reason there was a split brain
which gluster could not handle).
Last week also the 2nd VM (on datastore2) crashed, and that's when I
started the thread (before as there were no special errors logged I
thought it could have been caused by something in the VM)
Till now the 3rd VM never crashed.
Still any help on this would be really appreciated.
I know it could also be a problem somewhere else, but I have other
setups without gluster which simply work.
That's why I want to start the VM with gdb, to check next time why the
kvm process shuts down.
If the problem in the log from the brick is any clue, I would say that
QEMU aborts when the seek fails. Somehow the seek got executed with a
too-high offset (past the size of the file), and that returned an
error.

We'll need to find out what makes QEMU (or FUSE) think the file is
larger than it actually is on the brick. If you have a way of reproducing
it, you could enable more verbose logging on the client side
(the diagnostics.client-log-level volume option), but if you run many VMs,
that may accumulate a lot of logs.

You probably should open a bug so that we have all the troubleshooting
and debugging details in one location. Once we find the problem we can
move the bug to the right component.
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

HTH,
Niels
Pranith Kumar Karampuri
2017-05-11 14:10:15 UTC
Permalink
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding *and* (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
That is not true. Both are independent. There are quite a few questions we
answered in the past ~1 year on gluster-users which don't use
sharding+arbiter but plain old 2+1 configuration.
Post by Niels de Vos
HTH,
Niels
Post by Krutika Dhananjay
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
-Krutika
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Krutika Dhananjay
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
fail with ENXIO: whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Krutika Dhananjay
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Krutika Dhananjay
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
--
Pranith
Niels de Vos
2017-05-12 09:13:51 UTC
Permalink
Post by Pranith Kumar Karampuri
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding *and* (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
That is not true. Both are independent. There are quite a few questions we
answered in the past ~1 year on gluster-users which don't use
sharding+arbiter but plain old 2+1 configuration.
Yes, of course. But that does not take away the *advice* to use
sharding (+ arbiter as an option) for VM workloads. I am not aware of
regular testing that uses seek on 2+1 configurations. The oVirt team
that runs their regular tests has sharding enabled, I think?

Seek is only usable in two cases (for VMs):
1. QEMU + libgfapi integration
2. FUSE on a recent kernel (not available in CentOS/RHEL/... yet)

Given this, there are not many deployments that could run into problems
with seek, simply because the functionality is not (by default) available
in enterprise distributions.

It is quite possible that there is a bug in the seek implementation, and
because of our default testing (with shard) on enterprise distributions
(no seek for FUSE), we may not have hit it yet.

From the information in this (sub)thread, I cannot see which client
(QEMU+gfapi or FUSE) is used, or which brick returns the seek error. I
can recommend enabling sharding as a workaround, and would expect not
to see any problems with seek anymore (because sharding blocks those
requests).

Niels
Post by Pranith Kumar Karampuri
Post by Niels de Vos
HTH,
Niels
Post by Krutika Dhananjay
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
-Krutika
Post by Alessandro Briosi
...
Post by Alessandro Briosi
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107]
[posix.c:1079:posix_seek]
Post by Alessandro Briosi
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
such
Post by Krutika Dhananjay
Post by Alessandro Briosi
Post by Alessandro Briosi
device or address]
The SEEK procedure translates to lseek() in the posix xlator. This can
fail with ENXIO: whence is SEEK_DATA or SEEK_HOLE, and the file offset is
beyond the end of the file.
This means that an lseek() was executed where the current offset of the
filedescriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all atm.
...
Post by Alessandro Briosi
The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at
~9.51
Post by Krutika Dhananjay
Post by Alessandro Briosi
Post by Alessandro Briosi
UTC and was started at ~10.01 UTC) .
This is not the first time that this happened, and I do not see any
problems with networking or the hosts.
Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different
one
Post by Krutika Dhananjay
Post by Alessandro Briosi
too)
Post by Alessandro Briosi
Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
Any hint on how to dig more deeply into the reason would be greatly
appreciated.
Probably the problem is with SEEK support in the arbiter functionality.
Just like with a READ or a WRITE on the arbiter brick, SEEK can only
succeed on bricks where the files with content are located. It does not
look like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the brick (empty, 0 size file). I
don't know how the replication xlator responds on an error return from
SEEK on one of the bricks, but I doubt it likes it.
We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK in the
arbiter xlator as well.
HTH,
Niels
--
Pranith
Alessandro Briosi
2017-05-12 10:09:43 UTC
Permalink
Post by Niels de Vos
Post by Alessandro Briosi
Post by Niels de Vos
Post by Krutika Dhananjay
Niels,
Allesandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.
Yes, but in case sharding would have been enabled, the seek FOP would be
handled correctly (detected as not supported at all).
I'm still not sure how arbiter prevents doing shards though. We normally
advise to use sharding *and* (optional) arbiter for VM workloads,
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.
Where is it stated that arbiter should be used with sharding?
Or that arbiter functionality without sharding is still in a "testing" phase?
I thought that having 3 full replicas on a 3-node cluster would have been a
waste of space. (I only need to tolerate losing 1 host at a time, and that's
fine.)
There is no "arbiter should be used with sharding", our recommendations
are to use sharding for VM workloads, with an optional arbiter. But we
still expect VMs on non-sharded volumes to work just fine, with or
without arbiter.
Sure, and I'd like to use it. Though as there were corruption bugs
recently, I preferred not to use it yet.
Post by Niels de Vos
Post by Alessandro Briosi
Anyway, this had also happened before with the same VM when there was no
arbiter, and I thought it was for some strange reason a "quorum" thing
which made the file unavailable in gluster, though there were no clues
in the logs.
So I added the arbiter brick, but it happened again last week.
If it is always the same VM, I wonder if there could be a small
filesystem corruption in that VM? Were there any actions done on the
storage of that VM, like resizing the block device (VM image) or
something like that? Systems can sometimes try to access data outside of
the block device when it was resized but the filesystem on the block
device was not. This would 'trick' the filesystem into thinking it has
more space to access than the block device has. If the filesystem access
in the VM goes 'past the block device', and this gets through to Gluster,
which then does a seek with that too-large offset, the log you posted
would be the result.
The problem was on only 1 VM, but now it has extended to another one;
that's why I started reporting it.
Post by Niels de Vos
Post by Alessandro Briosi
The first VM I reported going down was created on a volume with the
arbiter enabled from the start, so I doubt it has something to do with the arbiter.
I think it might have something to do with a load problem? Though the
hosts are really not being used that much.
Anyway this is a brief description of my setup.
3 dell servers with RAID 10 SAS Disks
each server has 2 bonded 1 Gbps Ethernet ports dedicated to gluster (plus 2
dedicated to the proxmox cluster and 2 for communication with the hosts
on the LAN), each bond on its own VLAN in the switch.
Also jumbo frames are enabled on ethernets and switches.
each server is a proxmox host which has gluster installed and configured
as server and client.
Do you know how proxmox accesses the VM images? Does it use QEMU+gfapi
or is it all over a FUSE mount? New versions of QEMU+gfapi have seek
support, and only new versions of the Linux kernel support seek over
FUSE. In order to track where the problem may be, we need to look into
the client (QEMU or FUSE) that does the seek with an invalid offset.
it uses qemu+gfapi afaik

-drive
file=gluster://srvpve1g/datastore1/images/101/vm-101-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on
-device
virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100
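(Since the disks are accessed through gfapi, the same SEEK can also be sent
from a tiny libgfapi client, which may make it easier to reproduce outside of
QEMU. A rough sketch, assuming the glusterfs-api development headers are
installed and a gfapi version with SEEK_DATA support (3.8 has it); the volume
name, server and image path below are just placeholders taken from the -drive
line above, and the offset is only an example.)

/* gfapi_seek.c - build with something like:
 *   gcc -o gfapi_seek gfapi_seek.c $(pkg-config --cflags --libs glusterfs-api)
 * or plain: gcc -o gfapi_seek gfapi_seek.c -lgfapi
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("datastore1");
    if (!fs) { fprintf(stderr, "glfs_new failed\n"); return 1; }

    glfs_set_volfile_server(fs, "tcp", "srvpve1g", 24007);
    if (glfs_init(fs) != 0) { perror("glfs_init"); return 1; }

    glfs_fd_t *fd = glfs_open(fs, "/images/101/vm-101-disk-1.qcow2", O_RDONLY);
    if (!fd) { perror("glfs_open"); glfs_fini(fs); return 1; }

    /* issue a SEEK_DATA, similar to what QEMU does when probing for holes */
    off_t off = glfs_lseek(fd, 1048576, SEEK_DATA);
    if (off == (off_t)-1)
        printf("glfs_lseek(SEEK_DATA) -> -1 (%s)\n", strerror(errno));
    else
        printf("glfs_lseek(SEEK_DATA) -> %lld\n", (long long)off);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}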
Post by Niels de Vos
Post by Alessandro Briosi
The RAID has a LVM thin provisioned which is divided into 3 bricks (2
big for the data and 1 small for the arbiter).
each Thin LVM is XFS formatted and mounted as brick.
There are 3 volumes configured which replicate 3 with arbiter (so 2
really holding the data).
datastore1: data on srv1 and srv2, arbiter srv3
datastore2: data on srv2 and srv3, arbiter srv1
datastore3: data on srv1 and srv3, arbiter srv2
On each datastore basically there is a main VM (plus some others which
though are not so important). (3 VM are mainly important)
datastore1 was converted from 2 replica to 3 replica with arbiter, the
other 2 were created as described.
The VM on the first datastore crashed more times (even where there was
no arbiter, which I thought for some reason there was a split brain
which gluster could not handle).
Last week also the 2nd VM (on datastore2) crashed, and that's when I
started the thread (before as there were no special errors logged I
thought it could have been caused by something in the VM)
Till now the 3rd VM never crashed.
Still any help on this would be really appreciated.
I know it could also be a problem somewhere else, but I have other
setups without gluster which simply work.
That's why I want to start the VM with gdb, to check next time why the
kvm process shuts down.
If the problem in the log from the brick is any clue, I would say that
QEMU aborts when the seek fails. Somehow the seek got executed with a
too-high offset (past the size of the file), and that returned an
error.
We'll need to find out what makes QEMU (or FUSE) think the file is
larger than it actually is on the brick. If you have a way of reproducing
it, you could enable more verbose logging on the client side
(the diagnostics.client-log-level volume option), but if you run many VMs,
that may accumulate a lot of logs.
You probably should open a bug so that we have all the troubleshooting
and debugging details in one location. Once we find the problem we can
move the bug to the right component.
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
HTH,
Niels
The thing is that when the VM is down and I check the logs there's nothing.
Then when I start the VM the logs get populated with the seek error.

Anyway I'll open a bug for this.

Alessandro
Alessandro Briosi
2017-05-19 15:27:56 UTC
Permalink
Post by Alessandro Briosi
Post by Niels de Vos
You probably should open a bug so that we have all the troubleshooting
and debugging details in one location. Once we find the problem we can
move the bug to the right component.
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
HTH,
Niels
The thing is that when the VM is down and I check the logs there's nothing.
Then when I start the VM the logs get populated with the seek error.
Anyway I'll open a bug for this.
Ok, as it happened again I have opened a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1452766

I have now started the VM with gdb (maybe I can find more information).

In the logs I still have "No such file or directory", which at this point
seems to be the culprit (?)

Alessandro
Alessandro Briosi
2017-05-25 06:57:33 UTC
Permalink
Post by Alessandro Briosi
Post by Alessandro Briosi
Post by Niels de Vos
You probably should open a bug so that we have all the troubleshooting
and debugging details in one location. Once we find the problem we can
move the bug to the right component.
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
HTH,
Niels
The thing is that when the VM is down and I check the logs there's nothing.
Then when I start the VM the logs get populated with the seek error.
Anyway I'll open a bug for this.
https://bugzilla.redhat.com/show_bug.cgi?id=1452766
I now have started the vm with gdb (maybe I can find more information)
In the logs I still have "No such file or directory" which at this
point seems to be the culprit of this (?)
Alessandro
It happened again, and now I have at least a gdb log which tells me where
the error is.

I've attached the log to the bug.

Logs strangely do not report any error, though the 2 VM disk files seem
to be going through a heal process:

Brick srvpve1g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal

/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal

Status: Connected
Number of entries: 2

Brick srvpve2g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal

/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal

Status: Connected
Number of entries: 2

Brick srvpve3g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal

/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal

Status: Connected
Number of entries: 2


I really have no clue why this is happening.
Thanks for your help.

Alessandro
Alessandro Briosi
2017-05-25 14:14:54 UTC
Permalink
You'd want to see the client log. I'm not sure where proxmox
configures those to go.
Post by Alessandro Briosi
Post by Alessandro Briosi
Post by Niels de Vos
You probably should open a bug so that we have all the troubleshooting
and debugging details in one location. Once we find the problem we can
move the bug to the right component.
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
HTH,
Niels
The thing is that when the VM is down and I check the logs there's nothing.
Then when I start the VM the logs get populated with the seek error.
Anyway I'll open a bug for this.
https://bugzilla.redhat.com/show_bug.cgi?id=1452766
I now have started the vm with gdb (maybe I can find more
information)
In the logs I still have "No such file or directory" which at
this point seems to be the culprit of this (?)
Alessandro
It happened again, and now I have at least a gdb log which tells me
where the error is.
I've attached the log to the bug.
Logs strangely do not report any error, though the 2 VM disk files
Brick srvpve1g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal
/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal
Status: Connected
Number of entries: 2
Brick srvpve2g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal
/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal
Status: Connected
Number of entries: 2
Brick srvpve3g:/data/brick1/brick
/images/101/vm-101-disk-2.qcow2 - Possibly undergoing heal
/images/101/vm-101-disk-1.qcow2 - Possibly undergoing heal
Status: Connected
Number of entries: 2
I really have no clue on why this is happening.
Thanks for your help.
Alessandro
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
It's starting to be a bit frustrating, as the VM has now crashed for the
fourth time...

I'm considering moving the disks to local storage until the problem is
solved.


Have a good day at work.
/Alessandro Briosi/

*METAL.it Nord S.r.l.*
Via Maioliche 57/C - 38068 Rovereto (TN)
Tel.+39.0464.430130 - Fax +39.0464.437393
www.metalit.com
Alessandro Briosi
2017-05-25 13:47:02 UTC
Permalink
You'd want to see the client log. I'm not sure where proxmox
configures those to go.
This is all the content of glusterfs/cli.log (previous file cli.log.1 is
from 5 days ago)

[2017-05-25 06:21:30.736837] I [cli.c:728:main] 0-cli: Started running
gluster with version 3.8.11
[2017-05-25 06:21:30.787152] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2017-05-25 06:21:30.787186] I [socket.c:2403:socket_event_handler]
0-transport: disconnecting now
[2017-05-25 06:21:30.825593] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2017-05-25 06:21:40.067379] I [cli.c:728:main] 0-cli: Started running
gluster with version 3.8.11
[2017-05-25 06:21:40.130303] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2017-05-25 06:21:40.130384] I [socket.c:2403:socket_event_handler]
0-transport: disconnecting now
[2017-05-25 06:21:41.268839] I [input.c:31:cli_batch] 0-: Exiting with: 0

Alessandro
