Discussion:
[Gluster-users] Stale locks on shards
Samuli Heinonen
2018-01-20 19:57:21 UTC
Hi all!

One hypervisor in our virtualization environment crashed, and now some of
the VM images cannot be accessed. After investigation we found out that
there were lots of images that still had an active lock held by the crashed
hypervisor. We were able to remove the locks from "regular files", but it
doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.

Here is part of a statedump that shows a shard with an active lock from the
crashed node:
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
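
For reference, a statedump like this can be generated on a storage node roughly
as follows (the dump location is from memory and depends on the
server.statedump-path setting, so treat that part as an assumption):

# gluster volume statedump zone2-ssd1-vmstor1
# ls /var/run/gluster/*.dump.*    # one dump file per brick process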

If we try to run clear-locks, we get the following error message:
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
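
For comparison, the same kind of command did work for us on regular image
files, roughly like this (the image path below is a made-up example, not one
of the affected files):

# gluster volume clear-locks zone2-ssd1-vmstor1 /images/example-vm.img kind all inode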

Gluster vol info if needed:
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO

Any recommendations on how to proceed from here?

Best regards,
Samuli Heinonen
Samuli Heinonen
2018-01-21 19:03:38 UTC
Hi again,

here is more information regarding the issue described earlier.

It looks like self-healing is stuck. According to "heal statistics" the
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
around Sun Jan 21 20:30 as I write this). However, glustershd.log says
that the last heal was completed at "2018-01-20 11:00:13.090697" (which is
13:00 UTC+2). Also, "heal info" has now been running for over 16 hours
without printing any information. In the statedump I can see that the
storage nodes hold locks on files and that some of those are blocked.
I.e. here again it says that ovirt8z2 holds an active lock even though
ovirt8z2 crashed after the lock was granted:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
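
To be explicit, the "heal statistics" and "heal info" output mentioned above
come from the usual commands, i.e. roughly:

# gluster volume heal zone2-ssd1-vmstor1 statistics
# gluster volume heal zone2-ssd1-vmstor1 info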

I'd also like to add that the volume had an arbiter brick before the crash
happened. We decided to remove it because we thought that it was causing
issues; however, now I think that this was unnecessary. After the crash the
arbiter logs had lots of messages like this:
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server:
37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]

Is there any way to force self-heal to stop? Any help would be very much
appreciated :)

Best regards,
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now some
of the VM images cannot be accessed. After investigation we found out
that there was lots of images that still had active lock on crashed
hypervisor. We were able to remove locks from "regular files", but it
doesn't seem possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
Krutika Dhananjay
2018-01-23 07:19:12 UTC
Post by Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics" crawl
began at Sat Jan 20 12:56:19 2018 and it's still going on (It's around Sun
Jan 21 20:30 when writing this). However glustershd.log says that last heal
was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also
"heal info" has been running now for over 16 hours without any information.
In statedump I can see that storage nodes have locks on files and some of
those are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash happened.
We decided to remove it because we thought that it was causing issues.
However now I think that this was unnecessary. After the crash arbiter logs
[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe)
==> (Operation not permitted) [Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very much
appreciated :)
The locks are contending in the AFR self-heal and data-path domains. It's
possible that the deadlock is not caused by the hypervisor, since if that
were the case the locks should have been released when it
crashed/disconnected.

Adding AFR devs to check what's causing the deadlock in the first place.

-Krutika
Post by Samuli Heinonen
Best regards,
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now some of
the VM images cannot be accessed. After investigation we found out that
there was lots of images that still had active lock on crashed hypervisor.
We were able to remove locks from "regular files", but it doesn't seem
possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on crashed
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
Pranith Kumar Karampuri
2018-01-23 07:34:13 UTC
Post by Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics" crawl
began at Sat Jan 20 12:56:19 2018 and it's still going on (It's around Sun
Jan 21 20:30 when writing this). However glustershd.log says that last heal
was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also
"heal info" has been running now for over 16 hours without any information.
In statedump I can see that storage nodes have locks on files and some of
those are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash happened.
We decided to remove it because we thought that it was causing issues.
However now I think that this was unnecessary. After the crash arbiter logs
[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe)
==> (Operation not permitted) [Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very much
appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You should
probably look at mounting the volume with the gfid aux-mount option, where
you can access a file as <path-to-mount>/.gfid/<gfid-string> to clear locks
on it.

Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

A gfid string will have some hyphens like: 11118443-1894-4273-9340-4b212fa1c0e4
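
As a rough sketch against your volume (the brick path is taken from your
volume info, the mount point is hypothetical, and the gfid you need is the
shard's own gfid; the hex value returned by getfattr maps to the hyphenated
gfid string):

# getfattr -n trusted.gfid -e hex /ssd1/zone2-vmstor1/export/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
# mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/vmstor1
# stat /mnt/vmstor1/.gfid/<gfid-string-from-the-getfattr-output>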

That said, the next disconnect on the brick where you successfully ran
clear-locks will crash the brick. There was a bug in the 3.8.x series with
clear-locks which was fixed in 3.9.0 as part of a feature. The self-heal
deadlock that you witnessed is also fixed in the 3.10 release.

3.8.x is EOL, so I recommend you upgrade to a supported version soon.
Post by Samuli Heinonen
Best regards,
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now some of
the VM images cannot be accessed. After investigation we found out that
there was lots of images that still had active lock on crashed hypervisor.
We were able to remove locks from "regular files", but it doesn't seem
possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on crashed
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith
Pranith Kumar Karampuri
2018-01-23 07:36:38 UTC
On Tue, Jan 23, 2018 at 1:04 PM, Pranith Kumar Karampuri <
Post by Pranith Kumar Karampuri
Post by Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics" crawl
began at Sat Jan 20 12:56:19 2018 and it's still going on (It's around Sun
Jan 21 20:30 when writing this). However glustershd.log says that last heal
was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also
"heal info" has been running now for over 16 hours without any information.
In statedump I can see that storage nodes have locks on files and some of
those are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=
ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:9468
25-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash happened.
We decided to remove it because we thought that it was causing issues.
However now I think that this was unnecessary. After the crash arbiter logs
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server:
37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very much
appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You should
probably look at mounting the volume with gfid aux-mount where you can
access a file with <path-to-mount>/.gfid/<gfid-string>to clear locks on it.
Please use this mount only for doing just this work and unmount it after
that. But my recommendation would be to do an upgrade as soon as possible.
Your bricks will crash on the next disconnect from 'sto2z2.xxx' if you are
not lucky.
Post by Pranith Kumar Karampuri
Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
A gfid string will have some hyphens like: 11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you successfully did the
clear-locks will crash the brick. There was a bug in 3.8.x series with
clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed also is fixed in 3.10 version of the release.
3.8.x is EOLed, so I recommend you to upgrade to a supported version soon.
Post by Samuli Heinonen
Best regards,
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now some of
the VM images cannot be accessed. After investigation we found out that
there was lots of images that still had active lock on crashed hypervisor.
We were able to remove locks from "regular files", but it doesn't seem
possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on crashed
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith
--
Pranith
Samuli Heinonen
2018-01-23 08:08:47 UTC
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
Post by Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
(It's around Sun Jan 21 20:30 when writing this). However
glustershd.log says that last heal was completed at "2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal info" has been
running now for over 16 hours without any information. In statedump
I can see that storage nodes have locks on files and some of those
are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid
= 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
pid = 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash
happened. We decided to remove it because we thought that it was
causing issues. However now I think that this was unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very
much appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with gfid aux-mount where
you can access a file with <path-to-mount>/.gfid/<gfid-string>to clear
locks on it.
Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test
/mnt/testvol
11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you successfully did the
clear-locks will crash the brick. There was a bug in 3.8.x series with
clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed also is fixed in 3.10 version of the release.
Thank you for the answer. Could you please tell me more about the crash?
What will actually happen, or is there a bug report about it? I just want
to make sure that we do everything we can to secure the data on the bricks.
We will look into upgrading, but we have to make sure that the new version
works for us, and of course get self-healing working before doing
anything :)

Br,
Samuli
3.8.x is EOLed, so I recommend you to upgrade to a supported version soon.
Post by Samuli Heinonen
Best regards,
Samuli Heinonen
Post by Samuli Heinonen
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now
some of the VM images cannot be accessed. After investigation we
found out that there was lots of images that still had active lock
on crashed hypervisor. We were able to remove locks from "regular
files", but it doesn't seem possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not
permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
--
Pranith
------
[1] http://lists.gluster.org/mailman/listinfo/gluster-users
Pranith Kumar Karampuri
2018-01-23 08:30:40 UTC
Post by Samuli Heinonen
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
Hi again,
Post by Samuli Heinonen
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
(It's around Sun Jan 21 20:30 when writing this). However
glustershd.log says that last heal was completed at "2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal info" has been
running now for over 16 hours without any information. In statedump
I can see that storage nodes have locks on files and some of those
are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid
= 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:9468
25-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
pid = 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash
happened. We decided to remove it because we thought that it was
causing issues. However now I think that this was unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very
much appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with gfid aux-mount where
you can access a file with <path-to-mount>/.gfid/<gfid-string>to clear
locks on it.
Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test
/mnt/testvol
11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you successfully did the
clear-locks will crash the brick. There was a bug in 3.8.x series with
clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed also is fixed in 3.10 version of the release.
Thank you the answer. Could you please tell more about crash? What will
actually happen or is there a bug report about it? Just want to make sure
that we can do everything to secure data on bricks. We will look into
upgrade but we have to make sure that new version works for us and of
course get self healing working before doing anything :)
The locks xlator/module maintains a list of locks that are granted to a
client. clear-locks had an issue where it forgot to remove the lock from
this list, so after a clear-lock the connection's list ends up pointing at
freed data. When a disconnect happens, all the locks granted to that client
need to be unlocked, so the process starts traversing this list, and when
it touches the freed data it crashes. I found it while reviewing a feature
patch sent by the Facebook folks to the locks xlator
(http://review.gluster.org/14816) for 3.9.0, and they fixed this bug as
part of that feature patch.
Post by Samuli Heinonen
Br,
Samuli
3.8.x is EOLed, so I recommend you to upgrade to a supported version soon.
Best regards,
Post by Samuli Heinonen
Samuli Heinonen
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now
some of the VM images cannot be accessed. After investigation we
found out that there was lots of images that still had active lock
on crashed hypervisor. We were able to remove locks from "regular
files", but it doesn't seem possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not
permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
--
Pranith
------
[1] http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith
Samuli Heinonen
2018-01-24 20:57:03 UTC
Hi!

Thank you very much for your help so far. Could you please give an example
command showing how to use the aux-gfid mount to remove locks? "gluster vol
clear-locks" seems to mount the volume by itself.

Best regards,
Samuli Heinonen
23 January 2018 at 10.30
On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
(It's around Sun Jan 21 20:30 when writing this). However
glustershd.log says that last heal was completed at "2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal info" has been
running now for over 16 hours without any information. In statedump
I can see that storage nodes have locks on files and some of those
are blocked. Ie. Here again it says that ovirt8z2 is having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
pid = 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash
happened. We decided to remove it because we thought that it was
causing issues. However now I think that this was
unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very
much appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with gfid
aux-mount where
you can access a file with
<path-to-mount>/.gfid/<gfid-string>to clear
locks on it.
Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you successfully did the
clear-locks will crash the brick. There was a bug in 3.8.x series with
clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed also is fixed in 3.10 version of the release.
Thank you the answer. Could you please tell more about crash? What
will actually happen or is there a bug report about it? Just want
to make sure that we can do everything to secure data on bricks.
We will look into upgrade but we have to make sure that new
version works for us and of course get self healing working before
doing anything :)
Locks xlator/module maintains a list of locks that are granted to a
client. Clear locks had an issue where it forgets to remove the lock
from this list. So the connection list ends up pointing to data that
is freed in that list after a clear lock. When a disconnect happens,
all the locks that are granted to a client need to be unlocked. So the
process starts traversing through this list and when it starts trying
to access this freed data it leads to a crash. I found it while
reviewing a feature patch sent by facebook folks to locks xlator
(http://review.gluster.org/14816) for 3.9.0 and they also fixed this
bug as well as part of that feature patch.
Br,
Samuli
3.8.x is EOLed, so I recommend you to upgrade to a supported
version
soon.
Best regards,
Samuli Heinonen
Samuli Heinonen
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment
crashed and now
some of the VM images cannot be accessed. After
investigation we
found out that there was lots of images that still had
active lock
on crashed hypervisor. We were able to remove locks
from "regular
files", but it doesn't seem possible to remove locks
from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having
active lock on
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=0, len=0,
pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
If we try to run clear-locks we get following error
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind
all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not
permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users [1]
--
Pranith
------
[1] http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith
21 January 2018 at 21.03
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (It's
around Sun Jan 21 20:30 when writing this). However glustershd.log
says that last heal was completed at "2018-01-20 11:00:13.090697"
(which is 13:00 UTC+2). Also "heal info" has been running now for over
16 hours without any information. In statedump I can see that storage
nodes have locks on files and some of those are blocked. Ie. Here
again it says that ovirt8z2 is having active lock even ovirt8z2
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash
happened. We decided to remove it because we thought that it was
causing issues. However now I think that this was unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help would be very
much appreciated :)
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
20 January 2018 at 21.57
Hi all!
One hypervisor on our virtualization environment crashed and now some
of the VM images cannot be accessed. After investigation we found out
that there was lots of images that still had active lock on crashed
hypervisor. We were able to remove locks from "regular files", but it
doesn't seem possible to remove locks from shards.
We are running GlusterFS 3.8.15 on all nodes.
Here is part of statedump that shows shard having active lock on
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id
ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
granted at 2018-01-20 08:57:24
# gluster volume clear-locks zone2-ssd1-vmstor1
/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations how to advance from here?
Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
Pranith Kumar Karampuri
2018-01-25 05:09:16 UTC
Hi!
Thank you very much for your help so far. Could you please tell an example
command how to use aux-gid-mount to remove locks? "gluster vol clear-locks"
seems to mount volume by itself.
You are correct, sorry; this was implemented around 7 years back and I had
forgotten that bit about it :-(. Essentially it becomes a getxattr syscall
on the file.
Could you give me the clear-locks command you were trying to execute, and I
can probably convert it into the equivalent getfattr command?
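
If it was the command from your first mail, the shape of it should be roughly
as below. Please treat this as a sketch: the glusterfs.clrlk.t<type>.k<kind>
virtual xattr name is from memory rather than verified against 3.8, and the
mount point is hypothetical.

# mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/vmstor1
# getfattr -n glusterfs.clrlk.tinode.kall /mnt/vmstor1/.gfid/<gfid-of-the-shard>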
Best regards,
Samuli Heinonen
23 January 2018 at 10.30
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
Hi again,
here is more information regarding issue described earlier
It looks like self healing is stuck. According to "heal
statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
(It's around Sun Jan 21 20:30 when writing this). However
glustershd.log says that last heal was completed at "2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal info"
has been
running now for over 16 hours without any information. In
statedump
I can see that storage nodes have locks on files and some
of those
are blocked. Ie. Here again it says that ovirt8z2 is
having active
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
pid = 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick before crash
happened. We decided to remove it because we thought that it was
causing issues. However now I think that this was
unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help
would be very
much appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with gfid
aux-mount where
you can access a file with
<path-to-mount>/.gfid/<gfid-string>to clear
locks on it.
Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test
/mnt/testvol
11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you successfully did the
clear-locks will crash the brick. There was a bug in 3.8.x series with
clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed also is fixed in 3.10 version of the
release.
Thank you the answer. Could you please tell more about crash? What
will actually happen or is there a bug report about it? Just want
to make sure that we can do everything to secure data on bricks.
We will look into upgrade but we have to make sure that new
version works for us and of course get self healing working before
doing anything :)
Locks xlator/module maintains a list of locks that are granted to a
client. Clear locks had an issue where it forgets to remove the lock from
this list. So the connection list ends up pointing to data that is freed in
that list after a clear lock. When a disconnect happens, all the locks that
are granted to a client need to be unlocked. So the process starts
traversing through this list and when it starts trying to access this freed
data it leads to a crash. I found it while reviewing a feature patch sent
by facebook folks to locks xlator (http://review.gluster.org/14816) for
3.9.0 and they also fixed this bug as well as part of that feature patch.
Br,
Samuli
3.8.x is EOLed, so I recommend you to upgrade to a supported
version
soon.
Best regards,
Samuli Heinonen
Samuli Heinonen
2018-01-25 08:19:30 UTC
Permalink
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen wrote:
Post by Samuli Heinonen
Hi!
Thank you very much for your help so far. Could you please tell an
example command how to use aux-gfid-mount to remove locks? "gluster
vol clear-locks" seems to mount the volume by itself.
You are correct, sorry, this was implemented around 7 years back and I
forgot that bit about it :-(. Essentially it becomes a getxattr
syscall on the file.
Could you give me the clear-locks command you were trying to execute
and I can probably convert it to the getfattr command?
I have been testing this in a test environment with the command:
gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
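For reference, a getfattr conversion of that clear-locks call would
presumably look something like the sketch below; the mount point /mnt/g1,
the volume spec box1:g1 and the exact key name are assumptions based on
later messages in this thread, not a verified recipe:

# mount the test volume with gfid access enabled
mount -t glusterfs -o aux-gfid-mount box1:g1 /mnt/g1
# "kind all inode" should map to the key glusterfs.clrlk.tinode.kall
getfattr -n glusterfs.clrlk.tinode.kall /mnt/g1/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c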
Best regards,
Samuli Heinonen
Pranith Kumar Karampuri
2018-01-25 08:22:08 UTC
Permalink
Post by Samuli Heinonen
gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
Could you do an strace of glusterd when this happens? It will have a getxattr
with "glusterfs.clrlk" in the key. You need to execute that on the
gfid aux-mount.
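A minimal sketch of how that trace could be captured (the exact strace flags
and file names are assumptions, not something prescribed in this thread):

# attach to glusterd, follow its children, log getxattr-family syscalls
strace -f -e trace=getxattr,lgetxattr -o /tmp/glusterd-clrlk.strace -p $(pidof glusterd)
# in another shell, re-run the failing command
gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
# then look for the key in the trace
grep glusterfs.clrlk /tmp/glusterd-clrlk.strace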
--
Pranith
Pranith Kumar Karampuri
2018-01-29 00:24:16 UTC
Permalink
Hi,
Did you find the command from strace?
--
Pranith
Samuli Heinonen
2018-01-29 05:20:11 UTC
Permalink
Hi!

Yes, thank you for asking. I found this line in the production
environment:
lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32",
"glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM
(Operation not permitted)

And this one in the test environment (with posix locks):
lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked",
"box1:/gluster/1/export/: posix blocked locks=1 granted locks=0", 4096) = 77

In the test environment I tried running the following command, which seemed
to release the gluster locks:

getfattr -n glusterfs.clrlk.tposix.kblocked file38

So I think it would go like this in the production environment, with locks
on shards (using the aux-gfid-mount mount option):
getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32
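A minimal sketch of one way to confirm afterwards that the lock is gone; the
statedump output directory below is the usual default and an assumption on
my part, not something verified in this environment:

# trigger a fresh statedump for the volume
gluster volume statedump zone2-ssd1-vmstor1
# check whether the shard still shows inodelk entries in the brick dumps
grep -A 10 "f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32" /var/run/gluster/*.dump.*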

I haven't been able to try this out in the production environment yet.

Is there perhaps something else to notice?

Would you be able to tell more about bricks crashing after releasing
locks? Under what circumstances does that happen? Is it only the process
exporting the brick that crashes, or is there also a possibility of data
corruption?

Best regards,
Samuli Heinonen
Post by Pranith Kumar Karampuri
Hi,
Did you find the command from strace?
On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen
Hi!
Thank you very much for your help so far. Could you
please tell an
example command how to use aux-gid-mount to remove
locks? "gluster
vol clear-locks" seems to mount volume by itself.
You are correct, sorry, this was implemented around 7 years
back and I
forgot that bit about it :-(. Essentially it becomes a getxattr
syscall on the file.
Could you give me the clear-locks command you were trying to
execute
and I can probably convert it to the getfattr command?
gluster vol clear-locks g1
/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
Could you do strace of glusterd when this happens? It will have a
getxattr with "glusterfs.clrlk" in the key. You need to execute that
on the gfid-aux-mount
Best regards,
Samuli Heinonen
23 January 2018 at 10.30
On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
Hi again,
here is more information regarding issue described
earlier
It looks like self healing is stuck. According to
"heal
statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still
going on
(It's around Sun Jan 21 20:30 when writing this).
However
glustershd.log says that last heal was completed at
"2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal
info"
has been
running now for over 16 hours without any information.
In
statedump
I can see that storage nodes have locks on files and
some
of those
are blocked. Ie. Here again it says that ovirt8z2 is
having active
lock even ovirt8z2 crashed after the lock was
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=0,
len=0, pid
= 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=0,
len=0, pid
= 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
connection-id=ovirt8z2.xxx.com
<http://ovirt8z2.xxx.com> [1]
<http://ovirt8z2.xxx.com>-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0,
start=0,
len=0,
pid = 18446744073709551610, owner=d0c6d857a87f0000,
client=0x7f885845efa0,
connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
blocked at 2018-01-20 10:59:52
I'd also like to add that volume had arbiter brick
before
crash
happened. We decided to remove it because we thought
that
it was
causing issues. However now I think that this was
unnecessary. After
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
<gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation
not
permitted)
[Operation not permitted]
Is there anyways to force self heal to stop? Any help
would be very
much appreciated :)
Exposing .shard to a normal mount is opening a can of
worms. You
should probably look at mounting the volume with gfid
aux-mount where
you can access a file with
<path-to-mount>/.gfid/<gfid-string>to clear
locks on it.
Mount command: mount -t glusterfs -o aux-gfid-mount
vm1:test
/mnt/testvol
11118443-1894-4273-9340-4b212fa1c0e4
That said. Next disconnect on the brick where you
successfully
did the
clear-locks will crash the brick. There was a bug in
3.8.x
series with
clear-locks which was fixed in 3.9.0 with a feature. The
self-heal
deadlocks that you witnessed also is fixed in 3.10
version
of the
release.
Thank you the answer. Could you please tell more
about crash?
What
will actually happen or is there a bug report about
it? Just
want
to make sure that we can do everything to secure data on
bricks.
We will look into upgrade but we have to make sure
that new
version works for us and of course get self healing
working
before
doing anything :)
Locks xlator/module maintains a list of locks that
are granted to
a client. Clear locks had an issue where it forgets
to remove the
lock from this list. So the connection list ends up
pointing to
data that is freed in that list after a clear lock.
When a
disconnect happens, all the locks that are granted
to a client
need to be unlocked. So the process starts
traversing through this
list and when it starts trying to access this freed
data it leads
to a crash. I found it while reviewing a feature
patch sent by
facebook folks to locks xlator
(http://review.gluster.org/14816
<http://review.gluster.org/14816>
[2]) for 3.9.0 and they also fixed this bug as well
as part of
that feature patch.
Br,
Samuli
3.8.x is EOLed, so I recommend you to upgrade to a
supported
version
soon.
Best regards,
Samuli Heinonen
21 January 2018 at 21.03
Hi again,

here is more information regarding the issue described earlier.

It looks like self healing is stuck. According to "heal statistics" the
crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
around Sun Jan 21 20:30 when writing this). However glustershd.log says
that the last heal was completed at "2018-01-20 11:00:13.090697" (which
is 13:00 UTC+2). Also "heal info" has been running now for over 16 hours
without any information. In the statedump I can see that the storage
nodes have locks on files and some of those are blocked. I.e. here again
it says that ovirt8z2 is holding an active lock even though ovirt8z2
crashed after the lock was granted:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52

I'd also like to add that the volume had an arbiter brick before the
crash happened. We decided to remove it because we thought that it was
causing issues. However, now I think that this was unnecessary. After
the crash the arbiter logs had lots of messages like:

[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]

Is there any way to force self heal to stop? Any help would be very much
appreciated :)

Best regards,
Samuli Heinonen
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
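For reference, statedumps like the one quoted above are generated per
volume with the gluster CLI; the dump files land on each brick node,
typically under /var/run/gluster unless server.statedump-path says
otherwise (the exact location is an assumption here, check your
configuration):

  # dump the state of the volume's brick processes, including the locks tables
  gluster volume statedump zone2-ssd1-vmstor1

  # then inspect the newest dump files on each storage node
  ls -t /var/run/gluster/*.dump.* | head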
Pranith Kumar Karampuri
2018-01-29 05:32:11 UTC
Permalink
On 29 Jan 2018 10:50 am, "Samuli Heinonen" <***@neutraali.net> wrote:

Hi!

Yes, thank you for asking. I found out this line in the production
environment:
lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32", "glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM (Operation not permitted)


I was expecting .kall instead of .blocked,
did you change the cli to kind blocked?


And this one in the test environment (with posix locks):
lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked", "box1:/gluster/1/export/: posix blocked locks=1 granted locks=0", 4096) = 77

In the test environment I tried running the following command, which
seemed to release the gluster locks:

getfattr -n glusterfs.clrlk.tposix.kblocked file38

So I think it would go like this in the production environment with locks
on shards (using the aux-gfid-mount mount option):
getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32

I haven't been able to try this out in the production environment yet.

Is there perhaps something else to notice?

Would you be able to tell more about bricks crashing after releasing
locks? Under what circumstances does that happen? Is it only the process
exporting the brick that crashes, or is there a possibility of data
corruption?


No data corruption. Brick process where you did clear-locks may crash.


Best regards,
Samuli Heinonen
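Putting the commands from this exchange together, the procedure being
discussed would look roughly like the sketch below. The mount points are
my own placeholders, and running this against the production shard is
exactly what has not been verified yet in the thread, so treat it as an
illustration rather than a tested recipe:

  # test environment: clear the blocked posix locks on a plain file
  mount -t glusterfs -o aux-gfid-mount box1:/g1 /mnt/g1
  getfattr -n glusterfs.clrlk.tposix.kblocked /mnt/g1/file38

  # production idea: clear all inode locks on the stuck shard
  # (this volume uses config.transport rdma, so a transport mount option may be needed)
  mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/vmstor1
  getfattr -n glusterfs.clrlk.tinode.kall /mnt/vmstor1/.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32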
Post by Pranith Kumar Karampuri
Hi,
Did you find the command from strace?
On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen
Hi!
Thank you very much for your help so far. Could you please give an
example command of how to use the aux-gfid-mount to remove locks?
"gluster vol clear-locks" seems to mount the volume by itself.
You are correct, sorry, this was implemented around 7 years back and I
forgot that bit about it :-(. Essentially it becomes a getxattr
syscall on the file.
Could you give me the clear-locks command you were trying to execute
and I can probably convert it to the getfattr command?
gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
Could you do strace of glusterd when this happens? It will have a
getxattr with "glusterfs.clrlk" in the key. You need to execute that
on the gfid-aux-mount.
Best regards,
Samuli Heinonen
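A minimal sketch of that strace step, assuming glusterd's PID can be taken
from pidof; the flags and output file name are my own choices, the thread
only says "strace of glusterd":

  # attach to glusterd and record the getxattr calls it makes
  strace -f -e trace=getxattr,lgetxattr -o /tmp/glusterd-clrlk.strace -p "$(pidof glusterd)" &

  # in another shell, run the failing command again
  gluster volume clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode

  # the trace should show the exact "glusterfs.clrlk.*" key the CLI used
  grep glusterfs.clrlk /tmp/glusterd-clrlk.strace
  kill %1   # stop the trace when done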
--
Pranith
Samuli Heinonen
2018-01-29 07:56:01 UTC
Permalink
Post by Samuli Heinonen
Hi!
Yes, thank you for asking. I found out this line in the production
environment:
lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32", "glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM (Operation not permitted)
I was expecting .kall instead of .blocked,
did you change the cli to kind blocked?
Yes, I was testing this with different commands. Basically it seems that
the name of the attribute is
glusterfs.clrlk.t{posix,inode,entry}.k{all,blocked,granted}, am I
correct? Is it necessary to set any value or just request the attribute
with getfattr?
Post by Samuli Heinonen
lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked", "box1:/gluster/1/export/: posix blocked locks=1 granted locks=0", 4096) = 77
In the test environment I tried running the following command, which
seemed to release the gluster locks:
getfattr -n glusterfs.clrlk.tposix.kblocked file38
So I think it would go like this in the production environment with locks
on shards (using the aux-gfid-mount mount option):
getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32
I haven't been able to try this out in the production environment yet.
Is there perhaps something else to notice?
Would you be able to tell more about bricks crashing after releasing
locks? Under what circumstances does that happen? Is it only the process
exporting the brick that crashes, or is there a possibility of data
corruption?
No data corruption. Brick process where you did clear-locks may crash.
Post by Samuli Heinonen
Best regards,
Samuli Heinonen
Pranith Kumar Karampuri
2018-01-29 08:20:11 UTC
Permalink
Post by Samuli Heinonen
Hi!
Yes, thank you for asking. I found out this line in the production
environment:
lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32", "glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM (Operation not permitted)
I was expecting .kall instead of .blocked,
did you change the cli to kind blocked?
Yes, I was testing this with different commands. Basically it seems that
the name of the attribute is glusterfs.clrlk.t{posix,inode,entry}.k{all,blocked,granted},
am I correct?

That is correct.

Is it necessary to set any value or just request the attribute with
getfattr?

Nope. No I/O is going on the file, right? Just request the attribute with
getfattr in that case.
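To spell out the confirmed pattern with getfattr (the file argument is a
placeholder; on the production volume it would be the stuck shard, reached
through a mount):

  # type t{posix,inode,entry} and kind k{all,blocked,granted} combine into the
  # xattr name; reading it is enough, no value is set
  getfattr -n glusterfs.clrlk.tinode.kblocked <file>   # drop only blocked inode locks
  getfattr -n glusterfs.clrlk.tinode.kgranted <file>   # drop only granted inode locks
  getfattr -n glusterfs.clrlk.tinode.kall <file>       # drop both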
--
Pranith