[Gluster-users] gluster connection interrupted during transfer

Discussion:

Richard Neuboeck

2018-08-29 12:41:08 UTC

Hi Gluster Community,

I have a problem with glusterfs 'Transport endpoint not connected'
connection aborts during file transfers. I can reproduce them (every
time now) but cannot pinpoint why they happen.

The volume is set up in replica 3 mode and accessed with the fuse
gluster client. Both client and server are running CentOS and the
supplied 3.12.11 version of gluster.

The connection abort happens at different times during rsync but
occurs every time I try to sync all our files (1.1TB) to the empty
volume.

I don't find errors in the gluster log files on either the client or
the server side. rsync logs the obvious transfer problem. The only log
that shows anything related is the server brick log, which states that
the connection is shutting down:

[2018-08-18 22:40:35.502510] I [MSGID: 115036]
[server.c:527:server_rpc_notify] 0-home-server: disconnecting
connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
[2018-08-18 22:40:35.502620] W
[inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing lock
on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
[2018-08-18 22:40:35.502692] W
[entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock
on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.502719] W
[entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock
on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.505950] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-home-server: Shutting down
connection brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
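
As a rough way to pull such disconnect events out of a large brick log, a sketch like the following might help. It assumes the entry layout visible in the excerpt above (a bracketed timestamp, a one-letter level, then the message) and that a wrapped entry continues until the next line that starts with a timestamp. The names `join_entries` and `find_disconnects` are made up for this example and are not part of gluster.

```python
import re

# Start of an entry, e.g. "[2018-08-18 22:40:35.502510] I ..." (layout assumed from the excerpt).
ENTRY_START = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z])\b")


def join_entries(lines):
    """Re-join wrapped log lines into one string per log entry."""
    entries, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line):
            if current is not None:
                entries.append(current)
            current = line
        elif current is not None and line.strip():
            current += " " + line.strip()
    if current is not None:
        entries.append(current)
    return entries


def find_disconnects(lines):
    """Return (timestamp, level, entry) for entries that mention a disconnect or shutdown."""
    hits = []
    for entry in join_entries(lines):
        m = ENTRY_START.match(entry)
        if re.search(r"disconnecting connection|Shutting down connection", entry):
            hits.append((m.group(1), m.group(2), entry))
    return hits
```

Fed an open brick log file, `find_disconnects` returns one tuple per matching entry, which makes it easier to line the timestamps up against the rsync and system logs.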

Since I have been running another replica 3 setup for oVirt for a long
time now, which is completely stable, I first thought I had made a
mistake by setting different options. However, even after resetting
those options I'm able to reproduce the connection problem.

The unoptimized volume setup looks like this:

Volume Name: home
Type: Replicate
Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-four:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-six:/srv/gluster_home/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 50%

The following additional options were used before:

performance.cache-size: 5GB
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 50000
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 600
performance.parallel-readdir: on
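
To double-check that a reset really removed the tuning options, the two option lists can be compared mechanically. This is only a sketch: it assumes the options are available as plain `name: value` lines, as printed above, and the helper names are invented for illustration.

```python
def parse_options(text):
    """Parse 'name: value' lines into a dict, ignoring lines without a colon."""
    options = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            options[name.strip()] = value.strip()
    return options


def removed_options(before, after):
    """Names of options present in `before` but absent from `after`, sorted."""
    return sorted(set(before) - set(after))
```

Calling `removed_options(parse_options(old_text), parse_options(new_text))` with the earlier and the current option lists should then return exactly the options that were meant to be reset.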

In this case the gluster servers and also the client are using a
bonded network device running in adaptive load balancing mode.

I've tried using the debug option for the client mount, but apart from
a ~0.5TB log file I didn't get any information that seems helpful to me.

Transferring just a couple of GB works without problems.

It may very well be that I'm already blind to the obvious but after
many long running tests I can't find the crux in the setup.

Does anyone have an idea how to approach this problem in a way
that yields some useful information?

Any help is highly appreciated!
Cheers
Richard

--
/dev/null

Nithya Balachandran

2018-08-30 07:45:07 UTC

Hi Richard,

Is the oVirt setup running with the same gluster version and on the
same nodes, or is it a different cluster?

Richard Neuboeck

2018-08-30 11:48:39 UTC

Hi Nithya,

Post by Nithya Balachandran
Is this setup running with the same gluster version and on the same
nodes or is it a different cluster?

It's a different cluster (sphere-one, sphere-two and sphere-three)
but the same gluster version and basically the same hardware.

Cheers
Richard

--
/dev/null

Raghavendra Gowdappa

2018-08-30 12:40:34 UTC

Normally the client logs give a clue as to why the disconnections are
happening (ping-timeout, wrong port, etc.). Can you look into the client
logs to figure out what's happening? If you can't find anything, can you
send the client logs across?
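
For a client log that is too large to read by hand, a streaming filter along these lines could narrow things down. It is a minimal sketch: the keyword list is an assumption chosen for this example (not an official list of gluster messages) and will likely need adjusting, and `scan_log` is a made-up name.

```python
from collections import deque

# Assumed keywords for this sketch; adjust them to whatever the log actually contains.
KEYWORDS = ("disconnect", "ping-timeout", "timed out", "not connected")


def scan_log(lines, keywords=KEYWORDS, before=3):
    """Yield (line_number, preceding_lines, matching_line) for each matching line.

    `lines` can be any iterable of strings, such as an open file, so a
    multi-gigabyte log is streamed instead of being loaded into memory.
    """
    context = deque(maxlen=before)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            yield number, list(context), text
        context.append(text)
```

It could be driven with `for number, ctx, hit in scan_log(open(path, errors="replace")): ...`, where `path` stands for the client mount log.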

Richard Neuboeck

2018-08-30 13:34:31 UTC

Hi,

I'm attaching a shortened version of the client mount log, since the
whole log is about 5.8GB. It includes the initial mount messages and
the last two minutes of log entries.
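
A shortened log of this kind could be produced with something like the sketch below. It assumes entries start with a bracketed timestamp as in the brick log excerpt earlier in the thread, and that timestamps only increase through the file. The default of 200 head lines is an arbitrary choice, and `shorten` is a name invented for this example.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\]")


def entry_time(line):
    """Return the entry timestamp (second resolution), or None for continuation lines."""
    m = TIMESTAMP.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None


def shorten(path, head=200, window=timedelta(minutes=2)):
    """Yield the first `head` lines, then every line from the final `window` of the log."""
    last = None
    with open(path, errors="replace") as log:  # pass 1: find the last timestamp
        for line in log:
            t = entry_time(line)
            if t is not None:
                last = t
    with open(path, errors="replace") as log:  # pass 2: emit the selected lines
        in_tail = False
        for number, line in enumerate(log):
            if not in_tail and last is not None:
                t = entry_time(line)
                in_tail = t is not None and t >= last - window
            if number < head or in_tail:
                yield line
```

Both passes stream the file line by line, so the full log never has to fit in memory.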

It ends very anticlimactically, without an obvious error. Is there
anything specific I should be looking for?

Cheers
Richard

--
/dev/null

Raghavendra Gowdappa

2018-08-31 01:50:25 UTC

+Mohit. +Milind

@Mohit/Milind,

Can you check logs and see whether you can find anything relevant?

Post by Richard Neuboeck
Hi,
I'm attaching a shortened version since the whole is about 5.8GB of
the client mount log. It includes the initial mount messages and the
last two minutes of log entries.
It ends very anticlimactic without an obvious error. Is there
anything specific I should be looking for?

Normally I look at the logs around the disconnect messages to find the
reason. But as you said, sometimes one sees just the disconnect messages
without any reason. That normally points to a cause in the network
rather than a Glusterfs-initiated disconnect.
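
One network-side check that might be worth a look, given the bonded interfaces mentioned earlier in the thread, is the per-slave link failure counter that the Linux bonding driver exposes under /proc/net/bonding/. The sketch below parses that text. The bond name `bond0` in the usage note is an assumption, and `link_failures` is a made-up helper name.

```python
def link_failures(bonding_text):
    """Map each slave interface to its 'Link Failure Count' from /proc/net/bonding/<bond> text."""
    failures, slave = {}, None
    for line in bonding_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            slave = value
        elif key == "Link Failure Count" and slave is not None:
            failures[slave] = int(value)
    return failures
```

On each node it could be run as `link_failures(open("/proc/net/bonding/bond0").read())`. A counter that grows between two rsync attempts would hint at link flaps, although a count of zero does not rule out other network problems.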

Richard Neuboeck

2018-08-31 05:41:37 UTC

Post by Raghavendra Gowdappa
+Mohit. +Milind
@Mohit/Milind,
Can you check logs and see whether you can find anything relevant?

From a glance at the system logs, nothing out of the ordinary
occurred. However, I'll start another rsync and take a closer look.
It will take a few days.

Post by Raghavendra Gowdappa
Normally I look logs around disconnect msgs to find out the reason.
But as you said, sometimes one can see just disconnect msgs without
any reason. That normally points to reason for disconnect in the
network rather than a Glusterfs initiated disconnect.

The rsync source currently serves our home directories, so there are
NFS connections 24/7. There don't seem to be any network-related
interruptions; a co-worker would be here faster than I could check
the logs if the connection to home were broken ;-)
Because of this problem the three gluster machines are reduced to
testing only, so nothing else is running on them.

--
/dev/null

Raghavendra Gowdappa

2018-08-31 06:13:59 UTC