Discussion:
[Gluster-users] Issues in AFR and self healing
Pablo Schandin
2018-08-10 17:55:54 UTC
Hello everyone!

I'm having some trouble with something, but I'm not quite sure what yet.
I'm running GlusterFS 3.12.6 on Ubuntu 16.04. I have two servers (nodes)
in the cluster in replica mode. Each server has 2 bricks. As the servers
are KVM hosts running several VMs, one brick has some VMs locally defined
on it and the second brick holds the replica from the other server. It
has data, but no actual writing is done to it except for the replication.

                Server 1                                     Server 2
Volume 1 (gv1): Brick 1 defined VMs (read/write)  ---->  Brick 1 replicated qcow2 files
Volume 2 (gv2): Brick 2 replicated qcow2 files    <----  Brick 2 defined VMs (read/write)

So, the main issue arose when I got a Nagios alarm warning about a file
listed to be healed, and then it disappeared. I came to find out that
every 5 minutes the self-heal daemon triggers the healing and this fixes
it. But looking at the logs I have a lot of entries like this in the
glustershd.log file:

[2018-08-09 14:23:37.689403] I [MSGID: 108026]
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0:
Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c.
sources=[0]  sinks=1
[2018-08-09 14:44:37.933143] I [MSGID: 108026]
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0:
Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f.
sources=[0]  sinks=1

The qcow2 files are being healed several times a day (up to 30 times on
some occasions). As I understand it, this means that a data heal occurred
on the files with gfids 407b... and 7371..., from source to sink. Local
server to replica server? Is it OK for the shd to heal files in the
replicated brick that supposedly has no writing on it besides the
mirroring? How does that work?
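
For reference, this is roughly how I have been listing the pending heals,
using the standard gluster CLI with the volume names above (just a sketch,
not the exact Nagios check):

# files/gfids currently pending heal, per brick
gluster volume heal gv1 info

# only the per-brick counts
gluster volume heal gv1 statistics heal-count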

How does AFR replication work? The file with gfid 7371... is the qcow2
root disk of an ownCloud server with 17 GB of data. It does not seem big
enough to be a bottleneck of some sort, I think.

Also, I was investigating the directory tree in brick/.glusterfs/indices
and I noticed that both in xattrop and dirty I always have a file created
named xattrop-xxxxxx and dirty-xxxxxx. I read that the xattrop file is
like a parent file or handle used to reference other files created there
as hardlinks with gfid names, for the shd to heal. Is it the same case
for the ones in the dirty dir?

Any help will be greatly appreciated. Thanks!

Pablo.
Ravishankar N
2018-08-11 03:19:16 UTC
Post by Pablo Schandin
Hello everyone!
I'm having some trouble with something, but I'm not quite sure what yet.
I'm running GlusterFS 3.12.6 on Ubuntu 16.04. I have two servers (nodes)
in the cluster in replica mode. Each server has 2 bricks. As the servers
are KVM hosts running several VMs, one brick has some VMs locally defined
on it and the second brick holds the replica from the other server. It
has data, but no actual writing is done to it except for the replication.
                Server 1                                     Server 2
Volume 1 (gv1): Brick 1 defined VMs (read/write)  ---->  Brick 1 replicated qcow2 files
Volume 2 (gv2): Brick 2 replicated qcow2 files    <----  Brick 2 defined VMs (read/write)
So, the main issue arose when I got a Nagios alarm warning about a file
listed to be healed, and then it disappeared. I came to find out that
every 5 minutes the self-heal daemon triggers the healing and this fixes
it. But looking at the logs I have a lot of entries like this in the
glustershd.log file:
[2018-08-09 14:23:37.689403] I [MSGID: 108026]
Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c.
sources=[0]  sinks=1
[2018-08-09 14:44:37.933143] I [MSGID: 108026]
Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f.
sources=[0]  sinks=1
The qcow2 files are being healed several times a day (up to 30 times on
some occasions). As I understand it, this means that a data heal occurred
on the files with gfids 407b... and 7371..., from source to sink. Local
server to replica server? Is it OK for the shd to heal files in the
replicated brick that supposedly has no writing on it besides the
mirroring? How does that work?
In AFR, for writes, there is no notion of a local/remote brick. No matter
which client you write to the volume from, the write gets sent to both
bricks, i.e. the replication is synchronous and in real time.
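As a rough illustration of what this leaves behind on disk (the image
path below is only an example; the brick path is taken from your later
logs), you can inspect the AFR xattrs of a file directly on a brick:

# on either server, on the brick itself (not on the fuse mount)
getfattr -d -m . -e hex /mnt/brick1/gv1/images/some-vm.qcow2
# keys such as trusted.afr.dirty (and trusted.afr.gv1-client-0/1 when a
# heal is pending) carry the pending-operation counters the shd works from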
Post by Pablo Schandin
How does afr replication work? The file with gfid 7371... is the qcow2
root disk of an owncloud server with 17GB of data. It does not seem to
be that big to be a bottleneck of some sort, I think.
Also, I was investigating the directory tree in
brick/.glusterfs/indices and I notices that both in xattrop and dirty
I always have a file created named xattrop-xxxxxx and dirty-xxxxxx. I
read that the xattrop file is like a parent file or handle to
reference other files created there as hardlinks with gfid name for
the shd to heal. Is the same case as the ones in the dirty dir?
Yes, before the write, the gfid gets captured inside dirty on all
bricks. If the write is successful, it gets removed. In addition, if the
write fails on one brick, the other brick will capture the gfid inside
xattrop.
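A quick way to see this on a brick (paths are only an example, matching
the brick path in your logs):

# gfid hardlinks queued for the self-heal daemon, next to the xattrop-xxxxxx base file
ls -l /mnt/brick1/gv1/.glusterfs/indices/xattrop/
# gfids of writes currently in flight; entries show up here briefly and are removed on success
ls -l /mnt/brick1/gv1/.glusterfs/indices/dirty/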
Post by Pablo Schandin
Any help will be greatly appreciated it. Thanks!
If frequent heals are triggered, it could mean there are frequent
network disconnects from the clients to the bricks as writes happen. You
can check the mount logs and see if that is the case and investigate
possible network issues.
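For example, something along these lines on the hypervisors (the exact
mount-log file name depends on your mount point):

# look for client-to-brick disconnect messages in the fuse mount logs
grep -i "disconnected from" /var/log/glusterfs/mnt-*.log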

HTH,
Ravi
Post by Pablo Schandin
Pablo.
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
Pablo Schandin
2018-08-14 12:15:53 UTC
Thanks for the info!

I cannot see anything in the mount log besides one line every time it
rotates:

[2018-08-13 06:25:02.246187] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk]
0-glusterfs: No change in volfile,continuing

But in the glfsheal-gv1.log of the volumes I did find some kind of
server-client connection that gets disconnected and then reconnects using
a different port. The block of log lines per run is kind of long, so I'm
copying it into a pastebin.

https://pastebin.com/bp06rrsT

Maybe this has something to do with it?

Thanks!

Pablo.
Pablo Schandin
2018-08-15 17:37:43 UTC
I found another log that I wasn't aware of in /var/log/glusterfs/brick;
what I was looking at before was the mount log, I had confused the log
files. In this file I see a lot of entries like this one:

[2018-08-15 16:41:19.568477] I [addr.c:55:compare_addr_and_update]
0-/mnt/brick1/gv1: allowed = "172.20.36.10", received addr = "172.20.36.11"
[2018-08-15 16:41:19.568527] I [addr.c:55:compare_addr_and_update]
0-/mnt/brick1/gv1: allowed = "172.20.36.11", received addr = "172.20.36.11"
[2018-08-15 16:41:19.568547] I [login.c:76:gf_auth] 0-auth/login:
allowed user names: 7107ccfa-0ba1-4172-aa5a-031568927bf1
[2018-08-15 16:41:19.568564] I [MSGID: 115029]
[server-handshake.c:793:server_setvolume] 0-gv1-server: accepted client
from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
(version: 3.12.6)
[2018-08-15 16:41:19.582710] I [MSGID: 115036]
[server.c:527:server_rpc_notify] 0-gv1-server: disconnecting connection
from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
[2018-08-15 16:41:19.582830] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down connection
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0

So I see a lot of disconnections, right? Might this be why the
self-healing is triggered all the time?

Thanks!

Pablo.

Ravishankar N
2018-08-16 04:06:58 UTC
Post by Pablo Schandin
I found another log that I wasn't aware of in /var/log/glusterfs/brick;
what I was looking at before was the mount log, I had confused the log
files. In this file I see a lot of entries like this one:
[2018-08-15 16:41:19.568477] I [addr.c:55:compare_addr_and_update]
0-/mnt/brick1/gv1: allowed = "172.20.36.10", received addr =
"172.20.36.11"
[2018-08-15 16:41:19.568527] I [addr.c:55:compare_addr_and_update]
0-/mnt/brick1/gv1: allowed = "172.20.36.11", received addr =
"172.20.36.11"
allowed user names: 7107ccfa-0ba1-4172-aa5a-031568927bf1
[2018-08-15 16:41:19.568564] I [MSGID: 115029]
[server-handshake.c:793:server_setvolume] 0-gv1-server: accepted
client from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
(version: 3.12.6)
[2018-08-15 16:41:19.582710] I [MSGID: 115036]
[server.c:527:server_rpc_notify] 0-gv1-server: disconnecting
connection from
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
[2018-08-15 16:41:19.582830] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down
connection
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
So I see a lot of disconnections, right? Might this be why the
self-healing is triggered all the time?
Not necessarily. These disconnects could also be due to the glfsheal
binary, which is invoked when you run `gluster vol heal volname info`
etc., and those do not cause heals. It would be better to check your
client mount logs for disconnect messages like these:

[2018-08-16 03:59:32.289763] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-testvol-client-4: disconnected from
testvol-client-0. Client process will keep trying to connect to glusterd
until brick's port is available

If there are no disconnects and you are still seeing files undergoing
heal, then you might want to check the brick logs to see if there are
any write failures.
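
For instance (the log file names depend on your mount point and brick
path; these greps are only a sketch):

# client-side disconnects, in the fuse mount logs on the hypervisors
grep "MSGID: 114018" /var/log/glusterfs/mnt-*.log

# error-level entries in the brick logs; look for failed WRITE fops
grep " E " /var/log/glusterfs/bricks/*.log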
Thanks,
Ravi
Pablo Schandin
2018-08-21 14:10:55 UTC
I couldn't find any disconnections yet. We analyzed the port traffic to
see if there was too much data going through, but that was OK. I cannot
see any other disconnections either, so for now we will keep checking the
network in case I missed something.

Thanks for all the help! If I have any other news I will let you know.

Pablo.