Discussion:
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Marcus Pedersén
2018-07-11 20:19:53 UTC
Permalink
Hi all,

I have upgraded from 3.12.9 to 4.1.1, following the instructions for an offline upgrade.

I upgraded the geo-replication (slave) side first, 1 x (2+1), and after that the master side, 2 x (2+1).

Both clusters work the way they should on their own.

After the upgrade, the status for all geo-replication nodes on the master side is Stopped.

I tried to start geo-replication from the master node and the response was that it started successfully.

Status again .... Stopped

I tried to start it again and got the response "started successfully"; after that glusterd crashed on all master nodes.

After restarting glusterd on all nodes, the master cluster was up again.

Geo-replication status is still Stopped, and every attempt to start it after this returns "successful", but the status remains Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
    except:
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
    sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
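[Editorial note] The key lines above are the slave-side usage errors: the master invokes the remote gsyncd with old-style arguments (--session-owner <uuid> ...), while the slave's 4.1 gsyncd only accepts subcommands. A toy argparse sketch (a hypothetical parser built from the subcommand names in the log, not the real gsyncd code) reproduces the same "invalid choice" failure when the session UUID lands where a subcommand is expected:

```python
import argparse

# Toy parser mimicking the subcommand-style CLI of the slave's 4.1
# gsyncd; the subcommand names are copied from the log above. This is
# an illustration only, not the real gsyncd argument parser.
parser = argparse.ArgumentParser(prog="gsyncd.py")
sub = parser.add_subparsers(dest="subcmd")
for name in ("monitor-status", "monitor", "worker", "agent", "slave",
             "status", "config-check", "config-get", "config-set",
             "config-reset", "voluuidget", "delete"):
    sub.add_parser(name)

# The old-style invocation puts the session UUID where the new CLI
# expects a subcommand, so argparse rejects it with "invalid choice"
# and exits with status 2 -- consistent with the error=2 the master
# records for the ssh command.
code = 0
try:
    parser.parse_args(["5e94eb7d-219f-4741-a179-d4ae6b50c7ee"])
except SystemExit as e:
    code = e.code
print("gsyncd.py would exit with status", code)  # prints: ... status 2
```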

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
Kotresh Hiremath Ravishankar
2018-07-12 05:41:36 UTC
Permalink
Hi Marcus,

I think the fix [1] is needed in 4.1
Could you please try this out and let us know if it works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR




--
Thanks and Regards,
Kotresh H R
Marcus Pedersén
2018-07-12 06:51:38 UTC
Permalink
Thanks Kotresh,
I installed through the official CentOS channel, centos-release-gluster41.
Isn't this fix included in the CentOS packages?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

Marcus Pedersén
2018-07-12 19:56:03 UTC
Permalink
Hi Kotresh,

I have replaced both files (gsyncdconfig.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py> and repce.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>) on all nodes, both master and slave.

I rebooted all servers, but geo-replication status is still Stopped.

I tried to start geo-replication and got the response Successful, but the status still shows Stopped on all nodes.

Nothing has been written to the geo-replication logs since I sent the tail of the log.

So I do not know what other info to provide.
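[Editorial note] When the geo-replication logs go silent like this, one way to collect something useful is to pull the last error-level entries out of the session logs. A small sketch, assuming the usual master-side log location under /var/log/glusterfs/geo-replication/ (the glob is an assumption; adjust it for your install):

```python
import glob
import re

# Assumed default location of master-side geo-replication logs;
# adjust the pattern for your distribution/layout.
LOG_GLOB = "/var/log/glusterfs/geo-replication/*/*.log"

def last_errors(lines, n=5):
    """Return the last n error-level entries ("] E [") from log lines."""
    errors = [line.rstrip() for line in lines if re.search(r"\] E \[", line)]
    return errors[-n:]

for path in sorted(glob.glob(LOG_GLOB)):
    with open(path) as f:
        for entry in last_errors(f):
            print(path, entry)
```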


Please, help me to find a way to solve this.


Thanks!


Regards

Marcus


Kotresh Hiremath Ravishankar
2018-07-13 09:28:26 UTC
Permalink
Hi Marcus,

Is the gluster geo-rep version the same on both master and slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se>
wrote:

> Hi Kotresh,
>
> I have replaced both files (gsyncdconfig.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
> and repce.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
> in all nodes both master and slave.
>
> I rebooted all servers but geo-replication status is still Stopped.
>
> I tried to start geo-replication with response Successful, but the status
> still shows Stopped on all nodes.
>
> Nothing has been written to geo-replication logs since I sent the tail of
> the log.
>
> So I do not know what info to provide.
>
>
> Please, help me to find a way to solve this.
>
>
> Thanks!
>
>
> Regards
>
> Marcus
>
>
> ------------------------------
> *From:* gluster-users-***@gluster.org <gluster-users-bounces@
> gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> *Sent:* 12 July 2018 08:51
> *To:* Kotresh Hiremath Ravishankar
> *Cc:* gluster-***@gluster.org
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Thanks Kotresh,
> I installed through the official centos channel, centos-release-gluster41.
> Isn't this fix included in centos install?
> I will have a look, test it tonight and come back to you!
>
> Thanks a lot!
>
> Regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <
> ***@redhat.com> wrote:
>
> Hi Marcus,
>
> I think the fix [1] is needed in 4.1.
> Could you please try this out and let us know if that works for you?
>
> [1] https://review.gluster.org/#/c/20207/
>
> Thanks,
> Kotresh HR
>



--
Thanks and Regards,
Kotresh H R
Marcus Pedersén
2018-07-13 19:30:34 UTC
Permalink
Hi again,

I made a mistake when replacing the Python files: I missed the SELinux context. I have fixed this, but it makes no difference.

All geo-replication nodes are still in status Stopped; a start reports success, but the status remains Stopped.


I enclose glusterd.log and gsyncd.log and hope that they can shed some light.


Many thanks for your help!


Best regards

Marcus Pedersén



________________________________
From: Marcus Pedersén
Sent: 13 July 2018 14:50
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Kotresh,
Yes, all nodes have the same version 4.1.1 both master and slave.
All glusterd are crashing on the master side.
Will send logs tonight.

Thanks,
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

Marcus Pedersén
2018-07-16 19:59:31 UTC
Permalink
Hi Kotresh,

I have been testing for a bit, and as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:

/var/log/glusterfs/cli.log

I have turned SELinux off, and just for testing I changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it.

Starting geo-replication after that reports success, but all nodes get status Faulty.


If I run: gluster-mountbroker status

I get:

+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
| urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
| localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+


and that is all the nodes in the slave cluster, so the mountbroker seems OK.


gsyncd.log logs an error that /usr/local/sbin/gluster is missing.

That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.

Another error is that SSH between master and slave is broken,

but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume

as geouser, and that works, which means the SSH connection works.
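Since gsyncd.log complains that /usr/local/sbin/gluster is missing, the same session key can be reused to check where the slave actually has the binary. A hypothetical check (host and key path reused from the command above; not from the original thread):

```shell
# Hypothetical check (not from the thread): reuse the geo-replication
# session key to confirm where the slave node actually has the
# gluster binary, since gsyncd is looking in /usr/local/sbin.
ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
    -i /var/lib/glusterd/geo-replication/secret.pem -p 22 \
    geouser@urd-gds-geo-001 'command -v gluster; ls -l /usr/local/sbin/gluster'
```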


Is the permissions on /var/log/glusterfs/cli.log changed when geo-replication is setup?

Is gluster supposed to be in /usr/local/sbin/gluster?
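The /usr/local/sbin/ path in the log is the geo-replication default for the gluster command directory, not necessarily where the distro package installs the binary. As a sketch of a possible workaround (assuming the 4.1 CLI accepts these config option names, which is worth verifying against the geo-replication docs), the command dirs could be pointed at the packaged location instead of recreating the session:

```shell
# Hypothetical workaround sketch: point geo-replication at the
# packaged gluster binary (/usr/sbin on CentOS) instead of the
# /usr/local/sbin default. Option names assumed from gluster 4.x
# geo-replication config; verify before running.
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume \
    config gluster-command-dir /usr/sbin/

gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir /usr/sbin/
```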


Do I have any options, or should I remove the current geo-replication and create a new one?

How much do I need to clean up before creating a new geo-replication?

In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?


Many thanks in advance!

Marcus Pedersén


Part from the gsyncd.log:

[2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
[2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken


[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
except:
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R


Kotresh Hiremath Ravishankar
2018-07-18 03:58:48 UTC
Permalink
Hi Marcus,

I am testing out 4.1 myself and I will have some update today.
For this particular traceback, gsyncd is not able to find the library.
Is it an rpm install? If so, the gluster libraries would be in /usr/lib.
Please run the commands below.

#ldconfig /usr/lib
#ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)

Geo-rep should be fixed automatically.
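A hedged way to verify that second step (with one caveat not stated above: on 64-bit CentOS the glusterfs RPMs typically install libraries under /usr/lib64 rather than /usr/lib, so checking both directories is an assumption worth making):

```shell
# in_ldcache NAME: grep the runtime linker cache for a library name.
# Falls back to /sbin/ldconfig in case ldconfig is not on PATH.
in_ldcache() {
    (ldconfig -p 2>/dev/null || /sbin/ldconfig -p) | grep -i "$1"
}
# On a real node:
#   ldconfig /usr/lib /usr/lib64          # rebuild the cache for both dirs
#   in_ldcache libgfchangelog || echo "libgfchangelog.so not in linker cache"
```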

Thanks,
Kotresh HR

On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se>
wrote:

> Hi again,
>
> I continue to do some testing, but now I have come to a stage where I need
> help.
>
>
> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I
> made a link.
>
> After that /usr/local/sbin/glusterfs was missing, so I made a link there as
> well.
>
> Both links were done on all slave nodes.
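A sketch of the symlink workaround described above (binary locations assume the CentOS RPM layout, where the real binaries live in /usr/sbin; treat this as a stopgap):

```shell
# link_binaries DESTDIR SRCDIR: point the paths gsyncd probes at the real
# gluster binaries. The directories used here are assumptions from this
# thread, not verified defaults.
link_binaries() {
    destdir=$1
    srcdir=$2
    for bin in gluster glusterfs; do
        ln -sf "$srcdir/$bin" "$destdir/$bin"
    done
}
# On each slave node, something like:
#   link_binaries /usr/local/sbin /usr/sbin
```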
>
>
> Now I have a new error that I cannot resolve myself:
>
> it cannot open libgfchangelog.so.
>
>
> Many thanks!
>
> Regards
>
> Marcus Pedersén
>
>
> Part of gsyncd.log:
>
> OSError: libgfchangelog.so: cannot open shared object file: No such file
> or directory
> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-17 19:32:17.542363] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote]
> SSH: SSH connection between master and slave established.
> duration=1.6151
> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect]
> GLUSTER: Mounting gluster volume locally...
> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect]
> GLUSTER: Mounted gluster volume duration=1.0901
> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker]
> <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker]
> <top>: call failed:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 37, in init
> return Changes.cl_init()
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 21, in __getattr__
> from libgfchangelog import Changes as LChanges
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 17, in <module>
> class Changes(object):
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 19, in Changes
> use_errno=True)
> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared object file: No such file
> or directory
> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__]
> RepceClient: call failed call=6078:139982918485824:1531855940.27
> method=init error=OSError
> [2018-07-17 19:32:20.275192] E [syncdutils(worker
> /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in
> main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in
> subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236,
> in service_loop
> changelog_agent.init()
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in
> __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in
> __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file
> or directory
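The traceback above bottoms out in ctypes' `_dlopen`. A small standalone illustration (not gluster-specific) of that failure mode, using libc as the resolvable stand-in:

```python
# ctypes.CDLL("name.so") resolves a bare soname through the dynamic
# linker's search (ld.so cache, LD_LIBRARY_PATH, ...). A library outside
# those paths fails with exactly the OSError shown in the gsyncd log.
import ctypes

def try_load(soname):
    """Return a CDLL handle, or the OSError text if the soname is not found."""
    try:
        return ctypes.CDLL(soname, use_errno=True)
    except OSError as exc:
        return str(exc)

print(isinstance(try_load("libc.so.6"), ctypes.CDLL))  # True on glibc systems
print(try_load("libgfchangelog.so"))  # OSError text unless the lib is on the path
```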
> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor:
> worker died in startup phase brick=/urd-gds/gluster
>
>
>
> ------------------------------
> *From:* gluster-users-***@gluster.org <***@gluster.org> on behalf of
> Marcus Pedersén <***@slu.se>
> *Sent:* 16 July 2018 21:59
> *To:* ***@redhat.com
>
> *Cc:* gluster-***@gluster.org
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
>
> Hi Kotresh,
>
> I have been testing for a bit and as you can see from the logs I sent
> before permission is denied for geouser on slave node on file:
>
> /var/log/glusterfs/cli.log
>
> I have turned SELinux off, and just for testing I changed the permissions
> on /var/log/glusterfs/cli.log so geouser can access it.
>
> Starting geo-replication after that gives the response successful, but all
> nodes get the status Faulty.
>
>
> If I run: gluster-mountbroker status
>
> I get:
>
> +-----------------------------+-------------+---------------------------+--------------+-------------------------+
> | NODE                        | NODE STATUS | MOUNT ROOT                | GROUP        | USERS                   |
> +-----------------------------+-------------+---------------------------+--------------+-------------------------+
> | urd-gds-geo-001.hgen.slu.se | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> | urd-gds-geo-002             | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> | localhost                   | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> +-----------------------------+-------------+---------------------------+--------------+-------------------------+
>
>
> and those are all the nodes in the slave cluster, so the mountbroker seems OK.
>
>
> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
>
> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
>
> Another error is that SSH between master and slave is broken,
>
> but now that I have changed the permissions on /var/log/glusterfs/cli.log I
> can run:
>
> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001
> gluster --xml --remote-host=localhost volume info urd-gds-volume
>
> as geouser and that works, which means that the ssh connection works.
>
>
> Are the permissions on /var/log/glusterfs/cli.log changed when
> geo-replication is set up?
>
> Is gluster supposed to be in /usr/local/sbin/gluster?
>
>
> Do I have any options, or should I remove the current geo-replication and
> create a new one?
>
> How much do I need to clean up before creating a new geo-replication?
>
> In that case, can I pause geo-replication, mount the slave cluster on the
> master cluster and run rsync, just to speed up the transfer of files?
>
>
> Many thanks in advance!
>
> Marcus Pedersén
>
>
> Part from the gsyncd.log:
>
> [2018-07-16 19:34:56.26287] E [syncdutils(worker
> /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh
> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replicatio\
> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/
> bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001
> /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-
> gds-volu\
> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd
> --master-brick /urd-gds/gluster --local-node urd-gds-geo-000
> --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/local/sbin/ error=1
> [2018-07-16 19:34:56.26583] E [syncdutils(worker
> /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of
> "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor:
> worker died before establishing connection brick=/urd-gds/gluster
> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:06.100481] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-16 19:35:06.762320] E [syncdutils(worker
> /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
> broken
> [2018-07-16 19:35:06.763103] E [syncdutils(worker
> /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh
> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replicatio\
> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/
> bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001
> /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-
> gds-volu\
> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd
> --master-brick /urd-gds/gluster --local-node urd-gds-geo-000
> --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/local/sbin/ error=1
> [2018-07-16 19:35:06.763398] E [syncdutils(worker
> /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of
> "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor:
> worker died before establishing connection brick=/urd-gds/gluster
> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main]
> <top>: Using session config file path=/var/lib/glusterd/geo-
> replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-16 19:35:16.828912] I [changelogagent(agent
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-16 19:35:17.260257] E [syncdutils(worker
> /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
> broken
>
> ------------------------------
> *From:* gluster-users-***@gluster.org <***@gluster.org> on behalf of
> Marcus Pedersén <***@slu.se>
> *Sent:* 13 July 2018 14:50
> *To:* Kotresh Hiremath Ravishankar
> *Cc:* gluster-***@gluster.org
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Hi Kotresh,
> Yes, all nodes have the same version, 4.1.1, on both master and slave.
> All glusterd are crashing on the master side.
> Will send logs tonight.
>
> Thanks,
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <
> ***@redhat.com> wrote:
>
> Hi Marcus,
>
> Is the gluster geo-rep version the same on both master and slave?
>
> Thanks,
> Kotresh HR
>
> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se>
> wrote:
>
> Hi Kotresh,
>
> I have replaced both files (gsyncdconfig.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
> and repce.py
> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
> in all nodes both master and slave.
>
> I rebooted all servers but geo-replication status is still Stopped.
>
> I tried to start geo-replication, with the response Successful, but the
> status still shows Stopped on all nodes.
>
> Nothing has been written to geo-replication logs since I sent the tail of
> the log.
>
> So I do not know what info to provide.
>
>
> Please, help me to find a way to solve this.
>
>
> Thanks!
>
>
> Regards
>
> Marcus
>
>
> ------------------------------
> *From:* gluster-users-***@gluster.org <gluster-users-***@gluster.org> on
> behalf of Marcus Pedersén <***@slu.se>
> *Sent:* 12 July 2018 08:51
> *To:* Kotresh Hiremath Ravishankar
> *Cc:* gluster-***@gluster.org
> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Thanks Kotresh,
> I installed through the official centos channel, centos-release-gluster41.
> Isn't this fix included in centos install?
> I will have a look, test it tonight and come back to you!
>
> Thanks a lot!
>
> Regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <
> ***@redhat.com> wrote:
>
> Hi Marcus,
>
> I think the fix [1] is needed in 4.1
> Could you please try this out and let us know if that works for you?
>
> [1] https://review.gluster.org/#/c/20207/
>
> Thanks,
> Kotresh HR
>
> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se>
> wrote:
>
> Hi all,
>
> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade
> instructions for offline upgrade.
>
> I upgraded geo-replication side first 1 x (2+1) and the master side after
> that 2 x (2+1).
>
> Both clusters works the way they should on their own.
>
> After upgrade on master side status for all geo-replication nodes
> is Stopped.
>
> I tried to start the geo-replication from master node and response back
> was started successfully.
>
> Status again .... Stopped
>
> Tried to start again and get response started successfully, after that all
> glusterd crashed on all master nodes.
>
> After a restart of all glusterd the master cluster was up again.
>
> Status for geo-replication is still Stopped and every try to start it
> after this gives the response successful but still status Stopped.
>
>
> Please help me get the geo-replication up and running again.
>
>
> Best regards
>
> Marcus Pedersén
>
>
> Part of geo-replication log from master node:
>
> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__]
> ChangelogAgent: Agent listining...
> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception]
> <top>: connection to peer is broken
> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog]
> Popen: command returned error cmd=ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5
> 534547f3675a710a107722317484f.sock ***@urd-gds-geo-000
> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
> --local-id .%\
> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
> gluster://localhost:urd-gds-volume error=2
> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> usage: gsyncd.py [-h]
> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>
> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> {monitor-status,monitor,worker
> ,agent,slave,status,config-check,config-get,config-set,confi
> g-reset,voluuidget,d\
> elete}
> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> ...
> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
> 'monit\
> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get',
> 'config-set', 'config-reset', 'voluuidget', 'delete')
> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize]
> <top>: exiting.
> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize]
> <top>: exiting.
> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor:
> worker died before establishing connection brick=/urd-gds/gluster
> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster
> slave_node=ssh://***@urd-gds-geo-000:gluster://localhost
> :urd-gds-volume
> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote]
> SSH: Initializing SSH connection between master and slave...
> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__]
> ChangelogAgent: Agent listining...
> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception]
> <top>: connection to peer is broken
> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog]
> Popen: command returned error cmd=ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5
> 534547f3675a710a107722317484f.sock ***@urd-gds-geo-000
> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
> --local-id .%\
> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
> gluster://localhost:urd-gds-volume error=2
> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> usage: gsyncd.py [-h]
> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh>
> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> {monitor-status,monitor,worker
> ,agent,slave,status,config-check,config-get,config-set,confi
> g-reset,voluuidget,d\
> elete}
> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> ...
> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr]
> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
> 'monit\
> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get',
> 'config-set', 'config-reset', 'voluuidget', 'delete')
> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize]
> <top>: exiting.
> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize]
> <top>: exiting.
> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor:
> worker died before establishing connection brick=/urd-gds/gluster
> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor:
> starting gsyncd worker brick=/urd-gds/gluster
> slave_node=ssh://***@urd-gds-geo-000:gluster://localhost
> :urd-gds-volume
> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor:
> Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor:
> worker died before establishing connection brick=/urd-gds/gluster
> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status]
> GeorepStatus: Worker Status Change status=inconsistent
> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception]
> <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 361, in twrap
> except:
> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428,
> in wmon
> sys.exit()
> TypeError: 'int' object is not iterable
> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>:
> exiting.
>
>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
>
>
>



--
Thanks and Regards,
Kotresh H R
Kotresh Hiremath Ravishankar
2018-07-18 04:05:22 UTC
Permalink
Hi Marcus,

There is nothing wrong with setting up a symlink for the gluster binary
location, but there is a geo-rep config command to set it so that gsyncd
will search there.

To set on master
#gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir
<gluster-binary-location>

To set on slave
#gluster vol geo-rep <mastervol> <slave-vol> config
slave-gluster-command-dir <gluster-binary-location>
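Filled in with the session names that appear in this thread's logs (hedged: verify the exact master/slave names before running on a live cluster), the command templates above might be exercised like this:

```shell
# georep_config MASTERVOL SLAVE OPTION VALUE: print the geo-rep config
# command for a session. It echoes rather than executes, because the
# volume/host names below are read off the logs in this thread and may
# need adjusting.
georep_config() {
    printf 'gluster volume geo-replication %s %s config %s %s\n' \
        "$1" "$2" "$3" "$4"
}
# For the session in this thread, with binaries assumed under /usr/sbin:
georep_config urd-gds-volume '***@urd-gds-geo-001::urd-gds-volume' \
    slave-gluster-command-dir /usr/sbin/
```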

Thanks,
Kotresh HR


On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <
***@redhat.com> wrote:

> Hi Marcus,
>
> I am testing out 4.1 myself and I will have some update today.
> For this particular traceback, gsyncd is not able to find the library.
> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> Please run the cmd below.
>
> #ldconfig /usr/lib
> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
>
> Geo-rep should be fixed automatically.
>
> Thanks,
> Kotresh HR
>
> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se>
> wrote:
>
>> Hi again,
>>
>> I continue to do some testing, but now I have come to a stage where I
>> need help.
>>
>>
>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing,
>> so I made a link.
>>
>> After that /usr/local/sbin/glusterfs was missing so I made a link there
>> as well.
>>
>> Both links were done on all slave nodes.
>>
>>
>> Now I have a new error that I cannot resolve myself.
>>
>> It cannot open libgfchangelog.so.
>>
>>
>> Many thanks!
>>
>> Regards
>>
>> Marcus Pedersén
>>
>>
>> Part of gsyncd.log:
>>
>> OSError: libgfchangelog.so: cannot open shared object file: No such file
>> or directory
>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor:
>> worker died in startup phase brick=/urd-gds/gluster
>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.542363] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-17 19:32:17.550894] I [resource(worker
>> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-07-17 19:32:19.166246] I [resource(worker
>> /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between
>> master and slave established. duration=1.6151
>> [2018-07-17 19:32:19.166806] I [resource(worker
>> /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume
>> locally...
>> [2018-07-17 19:32:20.257344] I [resource(worker
>> /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume
>> duration=1.0901
>> [2018-07-17 19:32:20.257921] I [subcmds(worker
>> /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
>> Acknowledging back to monitor
>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker]
>> <top>: call failed:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in
>> worker
>> res = getattr(self.obj, rmeth)(*in_data[2:])
>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 37, in init
>> return Changes.cl_init()
>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 21, in __getattr__
>> from libgfchangelog import Changes as LChanges
>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 17, in <module>
>> class Changes(object):
>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 19, in Changes
>> use_errno=True)
>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
>> self._handle = _dlopen(self._name, mode)
>> OSError: libgfchangelog.so: cannot open shared object file: No such file
>> or directory
>> [2018-07-17 19:32:20.275093] E [repce(worker
>> /urd-gds/gluster):206:__call__] RepceClient: call failed
>> call=6078:139982918485824:1531855940.27 method=init error=OSError
>> [2018-07-17 19:32:20.275192] E [syncdutils(worker
>> /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
>> in main
>> func(args)
>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
>> in subcmd_worker
>> local.service_loop(remote)
>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>> 1236, in service_loop
>> changelog_agent.init()
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in
>> __call__
>> return self.ins(self.meth, *a)
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in
>> __call__
>> raise res
>> OSError: libgfchangelog.so: cannot open shared object file: No such file
>> or directory
>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor:
>> worker died in startup phase brick=/urd-gds/gluster
>>
>>
>>
>> ------------------------------
>> *From:* gluster-users-***@gluster.org <gluster-users-***@gluster
>> .org> on behalf of Marcus Pedersén <***@slu.se>
>> *Sent:* 16 July 2018 21:59
>> *To:* ***@redhat.com
>>
>> *Cc:* gluster-***@gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not
>> work
>>
>>
>> Hi Kotresh,
>>
>> I have been testing for a bit, and as you can see from the logs I sent
>> before, permission is denied for geouser on the slave node for the file:
>>
>> /var/log/glusterfs/cli.log
>>
>> I have turned selinux off and just for testing I changed permissions on
>> /var/log/glusterfs/cli.log so geouser can access it.
>>
>> Starting geo-replication after that gives response successful but all
>> nodes get status Faulty.
>>
>>
>> If I run: gluster-mountbroker status
>>
>> I get:
>>
>> +-----------------------------+-------------+---------------
>> ------------+--------------+--------------------------+
>> | NODE | NODE STATUS | MOUNT ROOT
>> | GROUP | USERS |
>> +-----------------------------+-------------+---------------
>> ------------+--------------+--------------------------+
>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK)
>> | geogroup(OK) | geouser(urd-gds-volume) |
>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) |
>> geogroup(OK) | geouser(urd-gds-volume) |
>> | localhost | UP | /var/mountbroker-root(OK) |
>> geogroup(OK) | geouser(urd-gds-volume) |
>> +-----------------------------+-------------+---------------
>> ------------+--------------+--------------------------+
>>
>>
>> and that is all nodes on slave cluster, so mountbroker seems ok.
>>
>>
>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
>>
>> That is correct because gluster is in /sbin/gluster and /usr/sbin/gluster
>>
>> Another error is that SSH between master and slave is broken,
>>
>> but now that I have changed permissions on /var/log/glusterfs/cli.log I
>> can run:
>>
>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/glusterd/geo-replication/secret.pem -p 22
>> ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume
>> info urd-gds-volume
>>
>> as geouser and it works, which means that the SSH connection works.
>>
>>
>> Are the permissions on /var/log/glusterfs/cli.log changed when
>> geo-replication is set up?
>>
>> Is gluster supposed to be in /usr/local/sbin/gluster?
>>
>>
>> Do I have any options, or should I remove the current geo-replication and
>> create a new one?
>>
>> How much do I need to clean up before creating a new geo-replication?
>>
>> In that case, can I pause geo-replication, mount the slave cluster on the
>> master cluster and run rsync, just to speed up the transfer of files?
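A pause-and-seed sequence along the lines asked about here might look like the sketch below. The mount points and hostnames are hypothetical, and note that geo-replication tracks changes via changelogs and gfids, so a plain rsync over mounts is only a rough pre-copy, not a substitute for the sync itself:

```shell
# Hypothetical sketch: pause the session, pre-copy data, then resume.
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume pause

# Mount master and slave volumes locally (hostnames are examples).
mount -t glusterfs urd-gds-001:/urd-gds-volume /mnt/master
mount -t glusterfs urd-gds-geo-001:/urd-gds-volume /mnt/slave

# -aHAX preserves hardlinks, ACLs and extended attributes.
rsync -aHAX --progress /mnt/master/ /mnt/slave/

gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume resume
```

These commands require a live cluster; treat them as an outline to adapt, not a tested procedure.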
>>
>>
>> Many thanks in advance!
>>
>> Marcus Pedersén
>>
>>
>> Part from the gsyncd.log:
>>
>> [2018-07-16 19:34:56.26287] E [syncdutils(worker
>> /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh
>> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/glusterd/geo-replicatio\
>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf6
>> 0c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001
>> /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-g
>> ds-volu\
>> me --master-node urd-gds-001 --master-node-id
>> 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster
>> --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794
>> --slave-timeo\
>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
>> --slave-gluster-command-dir /usr/local/sbin/ error=1
>> [2018-07-16 19:34:56.26583] E [syncdutils(worker
>> /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of
>> "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor:
>> worker died before establishing connection brick=/urd-gds/gluster
>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:06.100481] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-16 19:35:06.108834] I [resource(worker
>> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-07-16 19:35:06.762320] E [syncdutils(worker
>> /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
>> broken
>> [2018-07-16 19:35:06.763103] E [syncdutils(worker
>> /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh
>> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/glusterd/geo-replicatio\
>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf6
>> 0c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001
>> /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-g
>> ds-volu\
>> me --master-node urd-gds-001 --master-node-id
>> 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster
>> --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794
>> --slave-timeo\
>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
>> --slave-gluster-command-dir /usr/local/sbin/ error=1
>> [2018-07-16 19:35:06.763398] E [syncdutils(worker
>> /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of
>> "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor:
>> worker died before establishing connection brick=/urd-gds/gluster
>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main]
>> <top>: Using session config file path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:16.828912] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-16 19:35:16.837100] I [resource(worker
>> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-07-16 19:35:17.260257] E [syncdutils(worker
>> /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
>> broken
>>
>> ------------------------------
>> *From:* gluster-users-***@gluster.org <gluster-users-***@gluster
>> .org> on behalf of Marcus Pedersén <***@slu.se>
>> *Sent:* 13 July 2018 14:50
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-***@gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not
>> work
>>
>> Hi Kotresh,
>> Yes, all nodes have the same version 4.1.1 both master and slave.
>> All glusterd are crashing on the master side.
>> Will send logs tonight.
>>
>> Thanks,
>> Marcus
>>
>> ################
>> Marcus Pedersén
>> Systemadministrator
>> Interbull Centre
>> ################
>> Sent from my phone
>> ################
>>
>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <
>> ***@redhat.com> wrote:
>>
>> Hi Marcus,
>>
>> Is the gluster geo-rep version the same on both master and slave?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se>
>> wrote:
>>
>> Hi Kotresh,
>>
>> I have replaced both files (gsyncdconfig.py
>> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
>> and repce.py
>> <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
>> in all nodes both master and slave.
>>
>> I rebooted all servers but geo-replication status is still Stopped.
>>
>> I tried to start geo-replication with response Successful but status
>> still show Stopped on all nodes.
>>
>> Nothing has been written to geo-replication logs since I sent the tail of
>> the log.
>>
>> So I do not know what further info to provide.
>>
>>
>> Please, help me to find a way to solve this.
>>
>>
>> Thanks!
>>
>>
>> Regards
>>
>> Marcus
>>
>>
>> ------------------------------
>> *From:* gluster-users-***@gluster.org <gluster-users-***@gluster
>> .org> on behalf of Marcus Pedersén <***@slu.se>
>> *Sent:* 12 July 2018 08:51
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-***@gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not
>> work
>>
>> Thanks Kotresh,
>> I installed through the official CentOS channel, centos-release-gluster41.
>> Isn't this fix included in the CentOS packages?
>> I will have a look, test it tonight and come back to you!
>>
>> Thanks a lot!
>>
>> Regards
>> Marcus
>>
>> ################
>> Marcus Pedersén
>> Systemadministrator
>> Interbull Centre
>> ################
>> Sent from my phone
>> ################
>>
>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <
>> ***@redhat.com> wrote:
>>
>> Hi Marcus,
>>
>> I think the fix [1] is needed in 4.1.
>> Could you please try this out and let us know if it works for you?
>>
>> [1] https://review.gluster.org/#/c/20207/
>>
>> Thanks,
>> Kotresh HR
>>
>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se>
>> wrote:
>>
>> Hi all,
>>
>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade
>> instructions for offline upgrade.
>>
>> I upgraded geo-replication side first 1 x (2+1) and the master side after
>> that 2 x (2+1).
>>
>> Both clusters work the way they should on their own.
>>
>> After upgrade on master side status for all geo-replication nodes
>> is Stopped.
>>
>> I tried to start the geo-replication from master node and response back
>> was started successfully.
>>
>> Status again .... Stopped
>>
>> Tried to start again and get response started successfully, after that
>> all glusterd crashed on all master nodes.
>>
>> After a restart of all glusterd the master cluster was up again.
>>
>> Status for geo-replication is still Stopped and every try to start it
>> after this gives the response successful but still status Stopped.
>>
>>
>> Please help me get the geo-replication up and running again.
>>
>>
>> Best regards
>>
>> Marcus Pedersén
>>
>>
>> Part of geo-replication log from master node:
>>
>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__]
>> ChangelogAgent: Agent listining...
>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote]
>> SSH: Initializing SSH connection between master and slave...
>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception]
>> <top>: connection to peer is broken
>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog]
>> Popen: command returned error cmd=ssh -oPasswordAuthentication=no
>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5
>> 534547f3675a710a107722317484f.sock ***@urd-gds-geo-000
>> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
>> --local-id .%\
>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
>> gluster://localhost:urd-gds-volume error=2
>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> usage: gsyncd.py [-h]
>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh>
>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> {monitor-status,monitor,worker
>> ,agent,slave,status,config-check,config-get,config-set,confi
>> g-reset,voluuidget,d\
>> elete}
>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> ...
>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
>> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
>> 'monit\
>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get',
>> 'config-set', 'config-reset', 'voluuidget', 'delete')
>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize]
>> <top>: exiting.
>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize]
>> <top>: exiting.
>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor:
>> worker died before establishing connection brick=/urd-gds/gluster
>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster
>> slave_node=ssh://***@urd-gds-geo-000:gluster://localhost
>> :urd-gds-volume
>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote]
>> SSH: Initializing SSH connection between master and slave...
>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__]
>> ChangelogAgent: Agent listining...
>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception]
>> <top>: connection to peer is broken
>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog]
>> Popen: command returned error cmd=ssh -oPasswordAuthentication=no
>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5
>> 534547f3675a710a107722317484f.sock ***@urd-gds-geo-000
>> /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee
>> --local-id .%\
>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
>> gluster://localhost:urd-gds-volume error=2
>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> usage: gsyncd.py [-h]
>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh>
>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> {monitor-status,monitor,worker
>> ,agent,slave,status,config-check,config-get,config-set,confi
>> g-reset,voluuidget,d\
>> elete}
>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> ...
>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr]
>> Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice:
>> '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status',
>> 'monit\
>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get',
>> 'config-set', 'config-reset', 'voluuidget', 'delete')
>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize]
>> <top>: exiting.
>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize]
>> <top>: exiting.
>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor:
>> worker died before establishing connection brick=/urd-gds/gluster
>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor:
>> starting gsyncd worker brick=/urd-gds/gluster
>> slave_node=ssh://***@urd-gds-geo-000:gluster://localhost
>> :urd-gds-volume
>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor:
>> Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor:
>> worker died before establishing connection brick=/urd-gds/gluster
>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status]
>> GeorepStatus: Worker Status Change status=inconsistent
>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception]
>> <top>: FAIL:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>> 361, in twrap
>> except:
>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428,
>> in wmon
>> sys.exit()
>> TypeError: 'int' object is not iterable
>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>:
>> exiting.
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-***@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>



--
Thanks and Regards,
Kotresh H R
Marcus Pedersén
2018-07-18 10:37:25 UTC
Permalink
Hi Kotresh,

I ran:

#ldconfig /usr/lib

on all nodes in both clusters but I still get the same error.

What to do?


Output for:

# ldconfig -p /usr/lib | grep libgf

libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
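This output only lists the versioned libgfchangelog.so.0, while the Python code in the traceback loads the unversioned name libgfchangelog.so, which is normally shipped by a -devel package. One hedged workaround, assuming the /lib64 paths shown above, is to create the unversioned symlink and refresh the linker cache:

```shell
# Workaround sketch: give ctypes the unversioned name it asks for.
# Assumes the versioned library is at /lib64/libgfchangelog.so.0.
ln -s /lib64/libgfchangelog.so.0 /lib64/libgfchangelog.so
ldconfig
```

This modifies system library paths and needs root; installing the matching -devel package, where available, achieves the same thing more cleanly.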


I read somewhere that you can change some settings for geo-replication to speed up sync.

I cannot remember where I saw that or what the config parameters were.

When geo-replication works I have 30TB on the master cluster that has to be synced to the slave nodes,

and that will take a while before the slave nodes have caught up.
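If the goal is to parallelize the transfer once the session is healthy, one commonly mentioned knob is the number of sync jobs per worker. The option name and value below are assumptions from memory; listing the session config first shows the exact option names available on 4.1:

```shell
# List the current session configuration (shows available option names).
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume config

# Hypothetical: raise the number of parallel sync jobs per worker.
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume config sync-jobs 6
```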


Thanks and regards

Marcus Pedersén


Part of gsyncd.log:

File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
[2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
[2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
return Changes.cl_init()
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
from libgfchangelog import Changes as LChanges
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
class Changes(object):
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
use_errno=True)
File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
[2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
changelog_agent.init()
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
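The traceback comes from gsyncd loading the changelog library through ctypes. A minimal standalone reproduction of that load step (the library name is the one from the traceback; the diagnostic prints are my addition) can help confirm whether the dynamic linker can resolve the unversioned name:

```python
import ctypes
import ctypes.util

# gsyncd's libgfchangelog.py does roughly this: load the *unversioned*
# shared object by name, which the dynamic linker must be able to resolve.
try:
    lib = ctypes.CDLL("libgfchangelog.so", use_errno=True)
    print("libgfchangelog.so loaded")
except OSError as exc:
    # This is the failure seen in the log: only libgfchangelog.so.0
    # exists, so the unversioned name cannot be resolved.
    print("load failed:", exc)
    # find_library reports what the linker cache knows about, if anything.
    print("linker cache entry:", ctypes.util.find_library("gfchangelog"))
```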


________________________________
From: Kotresh Hiremath Ravishankar <***@redhat.com>
Sent: 18 July 2018 06:05
To: Marcus Pedersén
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Marcus,

Well there is nothing wrong in setting up a symlink for gluster binary location, but
there is a geo-rep command to set it so that gsyncd will search there.

To set on master
#gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>

To set on slave
#gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>

Thanks,
Kotresh HR


On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Hi Marcus,

I am testing out 4.1 myself and I will have some update today.
For this particular traceback, gsyncd is not able to find the library.
Is it the rpm install? If so, gluster libraries would be in /usr/lib.
Please run the cmd below.

#ldconfig /usr/lib
#ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)

Geo-rep should be fixed automatically.

Thanks,
Kotresh HR

On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:

Hi again,

I continue to do some testing, but now I have come to a stage where I need help.


gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.

After that /usr/local/sbin/glusterfs was missing so I made a link there as well.

Both links were done on all slave nodes.


Now I have a new error that I cannot resolve myself.

It cannot open libgfchangelog.so.


Many thanks!

Regards

Marcus Pedersén


Part of gsyncd.log:

OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
[2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
[2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
return Changes.cl_init()
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
from libgfchangelog import Changes as LChanges
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
class Changes(object):
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
use_errno=True)
File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
[2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
changelog_agent.init()
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
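The OSError in the traceback above comes from the ctypes.CDLL call in libgfchangelog.py. That call can be reproduced standalone to test a node without restarting geo-replication (the helper name below is made up for this sketch):

```python
import ctypes

def can_dlopen(soname):
    # Mirror the CDLL(soname, use_errno=True) call from
    # libgfchangelog.py; return False on the OSError seen in the
    # worker log instead of raising.
    try:
        ctypes.CDLL(soname, use_errno=True)
        return True
    except OSError:
        return False

# On the failing node this stays False until the library is visible
# to the dynamic linker (e.g. after a correct ldconfig run):
print(can_dlopen("libgfchangelog.so"))
```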



________________________________
From: gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org> <gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org>> on behalf of Marcus Pedersén <***@slu.se<mailto:***@slu.se>>
Sent: 16 July 2018 21:59
To: ***@redhat.com<mailto:***@redhat.com>

Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work


Hi Kotresh,

I have been testing for a bit, and as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:

/var/log/glusterfs/cli.log

I have turned SELinux off, and just for testing I changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it.

Starting geo-replication after that gives the response successful, but all nodes get status Faulty.


If I run: gluster-mountbroker status

I get:

+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| urd-gds-geo-001.hgen.slu.se<http://urd-gds-geo-001.hgen.slu.se> | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
| urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
| localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+


and those are all the nodes on the slave cluster, so the mountbroker seems OK.


gsyncd.log logs an error that /usr/local/sbin/gluster is missing.

That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.

Another error is that SSH between master and slave is broken,

but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume

as geouser and that works, which means that the ssh connection works.
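To run that manual probe against every slave node in the same way, it can be wrapped in a small script (hostnames, user, key path and volume name below are taken from this thread; the helper functions themselves are hypothetical):

```python
import subprocess

def build_probe(host, volume, user="geouser",
                key="/var/lib/glusterd/geo-replication/secret.pem"):
    # Build the same passwordless-ssh probe as above: ask the slave's
    # gluster CLI for volume info over the geo-rep key.
    return ["ssh", "-oPasswordAuthentication=no",
            "-oStrictHostKeyChecking=no", "-i", key, "-p", "22",
            "%s@%s" % (user, host),
            "gluster", "--xml", "--remote-host=localhost",
            "volume", "info", volume]

def slave_ok(host, volume):
    # Exit status 0 means both the key auth and the remote gluster
    # binary work for this slave node.
    return subprocess.call(build_probe(host, volume)) == 0

# e.g.:
# for host in ("urd-gds-geo-001", "urd-gds-geo-002"):
#     print(host, slave_ok(host, "urd-gds-volume"))
```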


Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?

Is gluster supposed to be in /usr/local/sbin/gluster?


Do I have any options, or should I remove the current geo-replication and create a new one?

How much do I need to clean up before creating a new geo-replication?

In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?


Many thanks in advance!

Marcus Pedersén


Part from the gsyncd.log:

[2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
[2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken


________________________________
From: gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org> <gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org>> on behalf of Marcus Pedersén <***@slu.se<mailto:***@slu.se>>
Sent: 13 July 2018 14:50
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Kotresh,
Yes, all nodes have the same version, 4.1.1, on both master and slave.
All glusterd are crashing on the master side.
Will send logs tonight.

Thanks,
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>> wrote:
Hi Marcus,

Is the gluster geo-rep version the same on both master and slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi Kotresh,

I have replaced both files (gsyncdconfig.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py> and repce.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>) on all nodes, both master and slave.

I rebooted all servers but geo-replication status is still Stopped.

I tried to start geo-replication with the response Successful, but the status still shows Stopped on all nodes.

Nothing has been written to geo-replication logs since I sent the tail of the log.

So I do not know what further info to provide.


Please, help me to find a way to solve this.


Thanks!


Regards

Marcus


________________________________
Från: gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org> <gluster-users-***@gluster.org<mailto:gluster-users-***@gluster.org>> för Marcus Pedersén <***@slu.se<mailto:***@slu.se>>
Skickat: den 12 juli 2018 08:51
Till: Kotresh Hiremath Ravishankar
Kopia: gluster-***@gluster.org<mailto:gluster-***@gluster.org>
Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Thanks Kotresh,
I installed through the official CentOS channel, centos-release-gluster41.
Isn't this fix included in the CentOS install?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>> wrote:
Hi Marcus,

I think the fix [1] is needed in 4.1
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.

I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).

Both clusters work the way they should on their own.

After upgrade on master side status for all geo-replication nodes is Stopped.

I tried to start the geo-replication from master node and response back was started successfully.

Status again .... Stopped

Tried to start again and got the response started successfully; after that, glusterd crashed on all master nodes.

After a restart of all glusterd the master cluster was up again.

Status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful, but the status remains Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
except:
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.

---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org<mailto:Gluster-***@gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R





--
Thanks and Regards,
Kotresh H R





--
Thanks and Regards,
Kotresh H R



--
Thanks and Regards,
Kotresh H R

Sunny Kumar
2018-07-23 09:16:59 UTC
Hi Marcus,

On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi Kotresh,
>
> I ran:
>
> #ldconfig /usr/lib
Can you do:
ldconfig /usr/local/lib
>
> on all nodes in both clusters but I still get the same error.
>
> What to do?
>
>
> Output for:
>
> # ldconfig -p /usr/lib | grep libgf
>
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
>
> I read somewhere that you could change some settings for geo-replication to speed up sync.
>
> I can not remember where I saw that and what config parameters.
>
> When geo-replication works I have 30TB on master cluster that has to be synced to slave nodes,
>
> and that will take a while before the slave nodes have caught up.
>
>
> Thanks and regards
>
> Marcus Pedersén
>
>
> Part of gsyncd.log:
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> return Changes.cl_init()
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> from libgfchangelog import Changes as LChanges
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> class Changes(object):
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> use_errno=True)
> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> changelog_agent.init()
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
I think after that you will not see this error.
> [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
>
>
> ________________________________
> From: Kotresh Hiremath Ravishankar <***@redhat.com>
> Sent: 18 July 2018 06:05
> To: Marcus Pedersén
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Hi Marcus,
>
> Well there is nothing wrong in setting up a symlink for gluster binary location, but
> there is a geo-rep command to set it so that gsyncd will search there.
>
> To set on master
> #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
>
> To set on slave
> #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
>
> Thanks,
> Kotresh HR
>
>
> On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>>
>> Hi Marcus,
>>
>> I am testing out 4.1 myself and I will have some update today.
>> For this particular traceback, gsyncd is not able to find the library.
>> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
>> Please run the cmd below.
>>
>> #ldconfig /usr/lib
>> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
>>
>> Geo-rep should be fixed automatically.
>>
>> Thanks,
>> Kotresh HR
>>
>> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
>>>
>>> Hi again,
>>>
>>> I continue to do some testing, but now I have come to a stage where I need help.
>>>
>>>
>>> gsyncd.log was complaining about that /usr/local/sbin/gluster was missing so I made a link.
>>>
>>> After that /usr/local/sbin/glusterfs was missing so I made a link there as well.
>>>
>>> Both links were done on all slave nodes.
>>>
>>>
>>> Now I have a new error that I can not resolve myself.
>>>
>>> It can not open libgfchangelog.so
>>>
>>>
>>> Many thanks!
>>>
>>> Regards
>>>
>>> Marcus Pedersén
>>>
>>>
>>> Part of gsyncd.log:
>>>
>>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
>>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
>>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
>>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
>>> Traceback (most recent call last):
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
>>> res = getattr(self.obj, rmeth)(*in_data[2:])
>>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
>>> return Changes.cl_init()
>>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
>>> from libgfchangelog import Changes as LChanges
>>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
>>> class Changes(object):
>>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
>>> use_errno=True)
>>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
>>> self._handle = _dlopen(self._name, mode)
>>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
>>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
>>> Traceback (most recent call last):
>>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
>>> func(args)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
>>> local.service_loop(remote)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
>>> changelog_agent.init()
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
>>> return self.ins(self.meth, *a)
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
>>> raise res
>>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
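The OSError in this traceback means the dynamic linker cannot resolve libgfchangelog.so at the moment ctypes tries to dlopen it. A hypothetical diagnostic sequence for a node in this state (paths are assumed from the usual RPM layout, not taken from the thread):

```shell
# 1. Is the library on disk at all?
find /usr/lib64 /usr/lib /usr/local/lib -name 'libgfchangelog.so*' 2>/dev/null

# 2. Does the runtime linker's cache know about it?
ldconfig -p | grep libgfchangelog || echo "not in ldconfig cache"

# 3. gsyncd dlopens the unversioned name; if only libgfchangelog.so.0 is
#    present, an unversioned symlink plus a cache refresh may be needed
#    (run as root, and only if step 1 found the .so.0 file there):
# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so
# ldconfig
```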
>>>
>>>
>>>
>>> ________________________________
>>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
>>> Sent: 16 July 2018 21:59
>>> To: ***@redhat.com
>>>
>>> Cc: gluster-***@gluster.org
>>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>>
>>>
>>> Hi Kotresh,
>>>
>>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
>>>
>>> /var/log/glusterfs/cli.log
>>>
>>> I have turned SELinux off and, just for testing, changed the permissions on /var/log/glusterfs/cli.log so geouser can access it.
>>>
>>> Starting geo-replication after that gives the response "successful", but all nodes get status Faulty.
>>>
>>>
>>> If I run: gluster-mountbroker status
>>>
>>> I get:
>>>
>>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
>>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
>>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
>>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
>>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>>>
>>>
>>> and that is all nodes on slave cluster, so mountbroker seems ok.
>>>
>>>
>>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
>>>
>>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster
>>>
>>> Another error is that SSH between master and slave is broken,
>>>
>>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run:
>>>
>>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
>>>
>>> as geouser and that works, which means that the ssh connection works.
>>>
>>>
>>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
>>>
>>> Is gluster supposed to be in /usr/local/sbin/gluster?
>>>
>>>
>>> Do I have any options, or should I remove the current geo-replication and create a new one?
>>>
>>> How much do I need to clean up before creating a new geo-replication?
>>>
>>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster, and run rsync, just to speed up the transfer of files?
>>>
>>>
>>> Many thanks in advance!
>>>
>>> Marcus Pedersén
>>>
>>>
>>> Part from the gsyncd.log:
>>>
>>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
>>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
>>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
>>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
>>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
>>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
>>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
>>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
>>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
>>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
>>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
>>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
>>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
>>>
>>> ________________________________
>>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
>>> Sent: 13 July 2018 14:50
>>> To: Kotresh Hiremath Ravishankar
>>> Cc: gluster-***@gluster.org
>>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>>
>>> Hi Kotresh,
>>> Yes, all nodes have the same version, 4.1.1, on both master and slave.
>>> All glusterd are crashing on the master side.
>>> Will send logs tonight.
>>>
>>> Thanks,
>>> Marcus
>>>
>>> ################
>>> Marcus Pedersén
>>> Systemadministrator
>>> Interbull Centre
>>> ################
>>> Sent from my phone
>>> ################
>>>
>>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>>>
>>> Hi Marcus,
>>>
>>> Is the gluster geo-rep version the same on both master and slave?
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
>>>
>>> Hi Kotresh,
>>>
>>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
>>>
>>> I rebooted all servers but geo-replication status is still Stopped.
>>>
>>> I tried to start geo-replication with the response "Successful", but the status still shows Stopped on all nodes.
>>>
>>> Nothing has been written to geo-replication logs since I sent the tail of the log.
>>>
>>> So I do not know what info to provide?
>>>
>>>
>>> Please, help me to find a way to solve this.
>>>
>>>
>>> Thanks!
>>>
>>>
>>> Regards
>>>
>>> Marcus
>>>
>>>
>>> ________________________________
>>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
>>> Sent: 12 July 2018 08:51
>>> To: Kotresh Hiremath Ravishankar
>>> Cc: gluster-***@gluster.org
>>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>>
>>> Thanks Kotresh,
>>> I installed through the official centos channel, centos-release-gluster41.
>>> Isn't this fix included in the CentOS install?
>>> I will have a look, test it tonight and come back to you!
>>>
>>> Thanks a lot!
>>>
>>> Regards
>>> Marcus
>>>
>>> ################
>>> Marcus Pedersén
>>> Systemadministrator
>>> Interbull Centre
>>> ################
>>> Sent from my phone
>>> ################
>>>
>>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>>>
>>> Hi Marcus,
>>>
>>> I think the fix [1] is needed in 4.1
>>> Could you please try this out and let us know if it works for you?
>>>
>>> [1] https://review.gluster.org/#/c/20207/
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
>>>
>>> Hi all,
>>>
>>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
>>>
>>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
>>>
>>> Both clusters work the way they should on their own.
>>>
>>> After upgrade on master side status for all geo-replication nodes is Stopped.
>>>
>>> I tried to start the geo-replication from the master node and the response back was "started successfully".
>>>
>>> Status again .... Stopped
>>>
>>> I tried to start again and got the response "started successfully"; after that, glusterd crashed on all master nodes.
>>>
>>> After a restart of all glusterd the master cluster was up again.
>>>
>>> The geo-replication status is still Stopped, and every attempt to start it after this gives the response "successful", but the status remains Stopped.
>>>
>>>
>>> Please help me get the geo-replication up and running again.
>>>
>>>
>>> Best regards
>>>
>>> Marcus Pedersén
>>>
>>>
>>> Part of geo-replication log from master node:
>>>
>>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
>>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
>>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
>>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
>>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
>>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
>>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
>>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
>>> elete}
>>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
>>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
>>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
>>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
>>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
>>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
>>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
>>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
>>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
>>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
>>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
>>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
>>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
>>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
>>> elete}
>>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
>>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
>>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
>>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
>>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
>>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
>>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
>>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
>>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
>>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
>>> Traceback (most recent call last):
>>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
>>> except:
>>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
>>> sys.exit()
>>> TypeError: 'int' object is not iterable
>>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
>>>
>>> ---
>>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-***@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-***@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
Sunny Kumar
2018-07-23 10:53:14 UTC
Permalink
Hi Marcus,

On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi Sunny,
> ldconfig -p /usr/local/lib | grep libgf
> Output:
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
> So that seems to be alright, right?
>
Yes, this seems right. Can you share the gsyncd.log again?
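When `ldconfig -p` lists the library but the worker still dies with the OSError, it can help to reproduce the failing step directly: gsyncd's libgfchangelog.py loads the library via ctypes, so a small probe run as the same user shows whether Python can resolve it. A sketch, with the library name taken from the traceback; the result naturally depends on the node:

```python
import ctypes
import ctypes.util

def probe(name):
    """Return the soname the dynamic linker resolves for `name`, or None.

    gsyncd's libgfchangelog.py effectively does
    ctypes.CDLL("libgfchangelog.so", use_errno=True), so a None here
    reproduces the OSError seen in the gsyncd.log traceback.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    ctypes.CDLL(path, use_errno=True)  # raises OSError if it cannot be dlopen'd
    return path

# On an affected node this may print None for "gfchangelog" even though
# `ldconfig -p` run as root shows the .so.0 entry.
print(probe("gfchangelog"))
```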
> Best regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 23 July 2018 at 11:17, Sunny Kumar <***@redhat.com> wrote:
>
> Hi Marcus,
>
> On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Kotresh,
> >
> > I ran:
> >
> > #ldconfig /usr/lib
> can you do -
> ldconfig /usr/local/lib
>
>
>
> >
> > on all nodes in both clusters but I still get the same error.
> >
> > What to do?
> >
> >
> > Output for:
> >
> > # ldconfig -p /usr/lib | grep libgf
> >
> > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> >
> >
> > I read somewhere that you could change some settings for geo-replication to speed up sync.
> >
> > I can not remember where I saw it or what the config parameters were.
> >
> > When geo-replication works, I have 30 TB on the master cluster that has to be synced to the slave nodes,
> >
> > and that will take a while before the slave nodes have caught up.
> >
> >
> > Thanks and regards
> >
> > Marcus Pedersén
> >
> >
> > Part of gsyncd.log:
> >
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > res = getattr(self.obj, rmeth)(*in_data[2:])
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > return Changes.cl_init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > from libgfchangelog import Changes as LChanges
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > class Changes(object):
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > use_errno=True)
> > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > self._handle = _dlopen(self._name, mode)
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > func(args)
> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > local.service_loop(remote)
> > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > changelog_agent.init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > return self.ins(self.meth, *a)
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> I think then you will not see this.
> > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >
> >
> > ________________________________
> > From: Kotresh Hiremath Ravishankar <***@redhat.com>
> > Sent: 18 July 2018 06:05
> > To: Marcus Pedersén
> > Cc: gluster-***@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >
> > Hi Marcus,
> >
> > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but
> > there is a geo-rep command to set it so that gsyncd will search there.
> >
> > To set on master
> > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> >
> > To set on slave
> > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
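Filled in with the session names that appear in this thread's logs (master and slave volume both urd-gds-volume, mountbroker user geouser, and gluster installed under /usr/sbin on CentOS), the slave-side setting might look like the following. This is a sketch assuming those names, not a command from the original mails:

```shell
# Hypothetical: substitute your own session's user/host/volume names;
# check them first with `gluster volume geo-replication status`.
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir /usr/sbin/
```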
> >
> > Thanks,
> > Kotresh HR
> >
> >
> > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>
> >> Hi Marcus,
> >>
> >> I am testing out 4.1 myself and I will have some update today.
> >> For this particular traceback, gsyncd is not able to find the library.
> >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> >> Please run the cmd below.
> >>
> >> #ldconfig /usr/lib
> >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> >>
> >> Geo-rep should be fixed automatically.
> >>
> >> Thanks,
> >> Kotresh HR
> >>
> >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi again,
> >>>
> >>> I continue to do some testing, but now I have come to a stage where I need help.
> >>>
> >>>
> >>> gsyncd.log was complaining about that /usr/local/sbin/gluster was missing so I made a link.
> >>>
> >>> After that /usr/local/sbin/glusterfs was missing so I made a link there as well.
> >>>
> >>> Both links were done on all slave nodes.
> >>>
> >>>
> >>> Now I have a new error that I can not resolve myself.
> >>>
> >>> It can not open libgfchangelog.so
> >>>
> >>>
> >>> Many thanks!
> >>>
> >>> Regards
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part of gsyncd.log:
> >>>
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> >>> return Changes.cl_init()
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> >>> from libgfchangelog import Changes as LChanges
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> >>> class Changes(object):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> >>> use_errno=True)
> >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> >>> self._handle = _dlopen(self._name, mode)
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> >>> func(args)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> >>> local.service_loop(remote)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> >>> changelog_agent.init()
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> >>> return self.ins(self.meth, *a)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> >>> raise res
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >>>
> >>>
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 16 July 2018 21:59
> >>> To: ***@redhat.com
> >>>
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>>
> >>> Hi Kotresh,
> >>>
> >>> I have been testing for a bit, and as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> >>>
> >>> /var/log/glusterfs/cli.log
> >>>
> >>> I have turned SELinux off, and just for testing I changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it.
> >>>
> >>> Starting geo-replication after that returns successful, but all nodes get status Faulty.
> >>>
> >>>
> >>> If I run: gluster-mountbroker status
> >>>
> >>> I get:
> >>>
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>>
> >>>
> >>> and those are all the nodes in the slave cluster, so the mountbroker seems OK.
> >>>
> >>>
> >>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
> >>>
> >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster
> >>>
> >>> Another error is that SSH between master and slave is broken,
> >>>
> >>> but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:
> >>>
> >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> >>>
> >>> as geouser, and that works, which means the SSH connection is fine.
> >>>
> >>>
> >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> >>>
> >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> >>>
> >>>
> >>> Do I have any options, or should I remove the current geo-replication and create a new one?
> >>>
> >>> How much do I need to clean up before creating a new geo-replication?
> >>>
> >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
> >>>
> >>>
> >>> Many thanks in advance!
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part from the gsyncd.log:
> >>>
> >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 13 July 2018 14:50
> >>> To: Kotresh Hiremath Ravishankar
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>> Hi Kotresh,
> >>> Yes, all nodes have the same version, 4.1.1, on both master and slave.
> >>> All glusterd are crashing on the master side.
> >>> Will send logs tonight.
> >>>
> >>> Thanks,
> >>> Marcus
> >>>
> >>> ################
> >>> Marcus Pedersén
> >>> Systemadministrator
> >>> Interbull Centre
> >>> ################
> >>> Sent from my phone
> >>> ################
> >>>
> >>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>>
> >>> Hi Marcus,
> >>>
> >>> Is the gluster geo-rep version the same on both master and slave?
> >>>
> >>> Thanks,
> >>> Kotresh HR
> >>>
> >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi Kotresh,
> >>>
> >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> >>>
> >>> I rebooted all servers but geo-replication status is still Stopped.
> >>>
> >>> I tried to start geo-replication with the response Successful, but the status still shows Stopped on all nodes.
> >>>
> >>> Nothing has been written to geo-replication logs since I sent the tail of the log.
> >>>
> >>> So I do not know what further info to provide.
> >>>
> >>>
> >>> Please, help me to find a way to solve this.
> >>>
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Regards
> >>>
> >>> Marcus
> >>>
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 12 July 2018 08:51
> >>> To: Kotresh Hiremath Ravishankar
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>> Thanks Kotresh,
> >>> I installed through the official CentOS channel, centos-release-gluster41.
> >>> Isn't this fix included in the CentOS packages?
> >>> I will have a look, test it tonight and come back to you!
> >>>
> >>> Thanks a lot!
> >>>
> >>> Regards
> >>> Marcus
> >>>
> >>> ################
> >>> Marcus Pedersén
> >>> Systemadministrator
> >>> Interbull Centre
> >>> ################
> >>> Sent from my phone
> >>> ################
> >>>
> >>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>>
> >>> Hi Marcus,
> >>>
> >>> I think the fix [1] is needed in 4.1
> >>> Could you please try this out and let us know if it works for you?
> >>>
> >>> [1] https://review.gluster.org/#/c/20207/
> >>>
> >>> Thanks,
> >>> Kotresh HR
> >>>
> >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
> >>>
> >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
> >>>
> >>> Both clusters work the way they should on their own.
> >>>
> >>> After upgrade on master side status for all geo-replication nodes is Stopped.
> >>>
> >>> I tried to start the geo-replication from master node and response back was started successfully.
> >>>
> >>> Status again .... Stopped
> >>>
> >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes.
> >>>
> >>> After a restart of all glusterd the master cluster was up again.
> >>>
> >>> Status for geo-replication is still Stopped and every try to start it after this gives the response successful but still status Stopped.
> >>>
> >>>
> >>> Please help me get the geo-replication up and running again.
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part of geo-replication log from master node:
> >>>
> >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> >>> elete}
> >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> >>> elete}
> >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> >>> except:
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> >>> sys.exit()
> >>> TypeError: 'int' object is not iterable
> >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> >>>
> >>> ---
> >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> >>>
> >>>
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-***@gluster.org
> >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks and Regards,
> >>> Kotresh H R
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks and Regards,
> >>> Kotresh H R
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Thanks and Regards,
> >> Kotresh H R
> >
> >
> >
> >
> > --
> > Thanks and Regards,
> > Kotresh H R
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-***@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

- Sunny
Marcus Pedersén
2018-07-23 11:42:20 UTC
Permalink
Hi Sunny,
Here is part of gsyncd.log (the same info is repeated over and over again):

File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
[2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
[2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
return Changes.cl_init()
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
from libgfchangelog import Changes as LChanges
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
class Changes(object):
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
use_errno=True)
File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
[2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
changelog_agent.init()
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
[2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
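The failing step in the traceback above is a plain dynamic-library load: libgfchangelog.py calls ctypes.CDLL("libgfchangelog.so", use_errno=True). The same check can be run outside gsyncd; a minimal sketch (the gluster library name comes from the logs above, and libc is used here only as a stand-in that exists on any Linux host):

```python
import ctypes

def loadable(name):
    """True if the dynamic loader can open `name` -- the same call the
    geo-rep changelog agent makes: ctypes.CDLL(name, use_errno=True)."""
    try:
        ctypes.CDLL(name, use_errno=True)
        return True
    except OSError:
        return False

# On an affected node you would check the name from the traceback:
#   loadable("libgfchangelog.so")
# Stand-ins that behave predictably everywhere:
print(loadable("libc.so.6"))                  # an existing library
print(loadable("libno-such-library-xyz.so"))  # a missing one
```

If loadable("libgfchangelog.so") is False while the versioned libgfchangelog.so.0 loads, only the unversioned symlink is missing, not the library itself.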

Thanks, Sunny!!

Regards
Marcus Pedersén

________________________________________
From: Sunny Kumar <***@redhat.com>
Sent: 23 July 2018 12:53
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Marcus,

On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi Sunny,
> ldconfig -p /usr/local/lib | grep libgf
> Output:
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
> So that seems to be alright, right?
>
Yes, this seems right. Can you share the gsyncd.log again?
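One detail worth noting from the output above: the cache lists only the versioned SONAME libgfchangelog.so.0, while the traceback shows the agent loading the unversioned name libgfchangelog.so, which is a separate development symlink and can be absent even when `ldconfig -p` looks fine. A small sketch (assuming a Linux host, with ldconfig on the PATH or in /sbin) that filters the linker cache the way `ldconfig -p | grep libgf` does:

```python
import shutil
import subprocess

def cached_libs(prefix):
    """Return ld.so cache entries whose library name starts with `prefix`."""
    ldconfig = shutil.which("ldconfig") or "/sbin/ldconfig"
    try:
        out = subprocess.run([ldconfig, "-p"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    # Cache lines look like:
    #   libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
    # (the first line of the output is a header, so it is skipped)
    return [line.strip() for line in out.splitlines()[1:]
            if line.strip().startswith(prefix)]
```

On a gluster node, cached_libs("libgf") should reproduce the listing above; an empty result for "libgfchangelog" would point at a missing cache entry rather than a missing symlink.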
> Best regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 23 July 2018 at 11:17, Sunny Kumar <***@redhat.com> wrote:
>
> Hi Marcus,
>
> On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Kotresh,
> >
> > I ran:
> >
> > #ldconfig /usr/lib
> can you do -
> ldconfig /usr/local/lib
>
>
> Output:
>
> >
> > on all nodes in both clusters but I still get the same error.
> >
> > What to do?
> >
> >
> > Output for:
> >
> > # ldconfig -p /usr/lib | grep libgf
> >
> > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> >
> >
> > I read somewhere that you could change some settings for geo-replication to speed up sync.
> >
> > I can not remember where I saw that or what the config parameters were.
> >
> > When geo-replication works, I have 30TB on the master cluster that has to be synced to the slave nodes,
> >
> > and it will take a while before the slave nodes have caught up.
> >
> >
> > Thanks and regards
> >
> > Marcus Pedersén
> >
> >
> > Part of gsyncd.log:
> >
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > res = getattr(self.obj, rmeth)(*in_data[2:])
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > return Changes.cl_init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > from libgfchangelog import Changes as LChanges
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > class Changes(object):
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > use_errno=True)
> > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > self._handle = _dlopen(self._name, mode)
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > func(args)
> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > local.service_loop(remote)
> > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > changelog_agent.init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > return self.ins(self.meth, *a)
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> I think then you will not see this.
> > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >
> >
> > ________________________________
> > From: Kotresh Hiremath Ravishankar <***@redhat.com>
> > Sent: 18 July 2018 06:05
> > To: Marcus Pedersén
> > Cc: gluster-***@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >
> > Hi Marcus,
> >
> > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but
> > there is a geo-rep command to set it so that gsyncd will search there.
> >
> > To set on master
> > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> >
> > To set on slave
> > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
> >
> > Thanks,
> > Kotresh HR
> >
> >
> > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>
> >> Hi Marcus,
> >>
> >> I am testing out 4.1 myself and I will have some update today.
> >> For this particular traceback, gsyncd is not able to find the library.
> >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> >> Please run the cmd below.
> >>
> >> #ldconfig /usr/lib
> >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> >>
> >> Geo-rep should be fixed automatically.
> >>
> >> Thanks,
> >> Kotresh HR
> >>
> >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi again,
> >>>
> >>> I continue to do some testing, but now I have come to a stage where I need help.
> >>>
> >>>
> >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
> >>>
> >>> After that, /usr/local/sbin/glusterfs was missing, so I made a link there as well.
> >>>
> >>> Both links were done on all slave nodes.
> >>>
> >>>
> >>> Now I have a new error that I cannot resolve myself.
> >>>
> >>> It cannot open libgfchangelog.so.
> >>>
> >>>
> >>> Many thanks!
> >>>
> >>> Regards
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part of gsyncd.log:
> >>>
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> >>> return Changes.cl_init()
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> >>> from libgfchangelog import Changes as LChanges
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> >>> class Changes(object):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> >>> use_errno=True)
> >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> >>> self._handle = _dlopen(self._name, mode)
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> >>> func(args)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> >>> local.service_loop(remote)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> >>> changelog_agent.init()
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> >>> return self.ins(self.meth, *a)
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> >>> raise res
> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
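The failing import above boils down to a single dlopen of the unversioned library name. A minimal diagnostic sketch (an assumption: run on the affected node; python3 is used here for illustration, while gsyncd itself runs under python2 in these logs) reproduces the check outside gsyncd:

```shell
# Reproduce the dlopen that gsyncd's libgfchangelog.py performs; the library
# name "libgfchangelog.so" is taken from the traceback above.
if python3 -c 'import ctypes; ctypes.CDLL("libgfchangelog.so")' 2>/dev/null; then
    echo "loader resolves libgfchangelog.so"
else
    echo "loader cannot resolve libgfchangelog.so"
fi
```

If the second branch fires while `ldconfig -p` lists only the versioned `libgfchangelog.so.0`, the unversioned dev name is simply not known to the runtime linker.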
> >>>
> >>>
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 16 July 2018 21:59
> >>> To: ***@redhat.com
> >>>
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>>
> >>> Hi Kotresh,
> >>>
> >>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> >>>
> >>> /var/log/glusterfs/cli.log
> >>>
> >>> I have turned SELinux off and, just for testing, I changed the permissions on /var/log/glusterfs/cli.log so geouser can access it.
> >>>
> >>> Starting geo-replication after that gives the response successful, but all nodes get status Faulty.
> >>>
> >>>
> >>> If I run: gluster-mountbroker status
> >>>
> >>> I get:
> >>>
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> >>>
> >>>
> >>> and that is all nodes on the slave cluster, so mountbroker seems OK.
> >>>
> >>>
> >>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
> >>>
> >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
> >>>
> >>> Another error is that SSH between master and slave is broken,
> >>>
> >>> but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:
> >>>
> >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> >>>
> >>> as geouser, and that works, which means the SSH connection works.
> >>>
> >>>
> >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> >>>
> >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> >>>
> >>>
> >>> Do I have any options, or should I remove the current geo-replication and create a new one?
> >>>
> >>> How much do I need to clean up before creating a new geo-replication?
> >>>
> >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
> >>>
> >>>
> >>> Many thanks in advance!
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part from the gsyncd.log:
> >>>
> >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 13 July 2018 14:50
> >>> To: Kotresh Hiremath Ravishankar
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>> Hi Kotresh,
> >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> >>> All glusterd are crashing on the master side.
> >>> Will send logs tonight.
> >>>
> >>> Thanks,
> >>> Marcus
> >>>
> >>> ################
> >>> Marcus Pedersén
> >>> Systemadministrator
> >>> Interbull Centre
> >>> ################
> >>> Sent from my phone
> >>> ################
> >>>
> >>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>>
> >>> Hi Marcus,
> >>>
> >>> Is the gluster geo-rep version the same on both master and slave?
> >>>
> >>> Thanks,
> >>> Kotresh HR
> >>>
> >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi Kotresh,
> >>>
> >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> >>>
> >>> I rebooted all servers, but the geo-replication status is still Stopped.
> >>>
> >>> I tried to start geo-replication with the response Successful, but the status still shows Stopped on all nodes.
> >>>
> >>> Nothing has been written to the geo-replication logs since I sent the tail of the log.
> >>>
> >>> So I do not know what info to provide.
> >>>
> >>>
> >>> Please, help me to find a way to solve this.
> >>>
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Regards
> >>>
> >>> Marcus
> >>>
> >>>
> >>> ________________________________
> >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> >>> Sent: 12 July 2018 08:51
> >>> To: Kotresh Hiremath Ravishankar
> >>> Cc: gluster-***@gluster.org
> >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >>>
> >>> Thanks Kotresh,
> >>> I installed through the official CentOS channel, centos-release-gluster41.
> >>> Isn't this fix included in the CentOS install?
> >>> I will have a look, test it tonight and come back to you!
> >>>
> >>> Thanks a lot!
> >>>
> >>> Regards
> >>> Marcus
> >>>
> >>> ################
> >>> Marcus Pedersén
> >>> Systemadministrator
> >>> Interbull Centre
> >>> ################
> >>> Sent from my phone
> >>> ################
> >>>
> >>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> >>>
> >>> Hi Marcus,
> >>>
> >>> I think the fix [1] is needed in 4.1.
> >>> Could you please try this out and let us know if it works for you?
> >>>
> >>> [1] https://review.gluster.org/#/c/20207/
> >>>
> >>> Thanks,
> >>> Kotresh HR
> >>>
> >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I have upgraded from 3.12.9 to 4.1.1, following the instructions for an offline upgrade.
> >>>
> >>> I upgraded the geo-replication side first, 1 x (2+1), and the master side after that, 2 x (2+1).
> >>>
> >>> Both clusters work the way they should on their own.
> >>>
> >>> After the upgrade, on the master side the status for all geo-replication nodes is Stopped.
> >>>
> >>> I tried to start the geo-replication from the master node and the response was started successfully.
> >>>
> >>> Status again .... Stopped
> >>>
> >>> Tried to start again and got the response started successfully; after that, glusterd crashed on all master nodes.
> >>>
> >>> After a restart of all glusterd, the master cluster was up again.
> >>>
> >>> The status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful but the status stays Stopped.
> >>>
> >>>
> >>> Please help me get the geo-replication up and running again.
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Marcus Pedersén
> >>>
> >>>
> >>> Part of geo-replication log from master node:
> >>>
> >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> >>> elete}
> >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> >>> elete}
> >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> >>> Traceback (most recent call last):
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> >>> except:
> >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> >>> sys.exit()
> >>> TypeError: 'int' object is not iterable
> >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> >>>
> >>> ---
> >>> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här
> >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> >>>
> >>>
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-***@gluster.org
> >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks and Regards,
> >>> Kotresh H R
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks and Regards,
> >>> Kotresh H R
> >>>
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Thanks and Regards,
> >> Kotresh H R
> >
> >
> >
> >
> > --
> > Thanks and Regards,
> > Kotresh H R
> >
> >
>
>

- Sunny
Sunny Kumar
2018-07-23 12:17:04 UTC
Permalink
Hi,

Can you confirm the location of libgfchangelog.so
by sharing the output of the following command:
# find /usr/ -name libglusterfs.so

- Sunny

On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi Sunny,
> Here comes a part of gsyncd.log (The same info is repeated over and over again):
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
> [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
> [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> return Changes.cl_init()
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> from libgfchangelog import Changes as LChanges
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> class Changes(object):
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> use_errno=True)
> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
> [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> changelog_agent.init()
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
>
> Thanks, Sunny!!
>
> Regards
> Marcus Pedersén
>
> ________________________________________
> From: Sunny Kumar <***@redhat.com>
> Sent: 23 July 2018 12:53
> To: Marcus Pedersén
> Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Hi Marcus,
>
> On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Sunny,
> > ldconfig -p /usr/local/lib | grep libgf
> > Output:
> > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> >
> > So that seems to be alright, right?
> >
> Yes, this seems right. Can you share the gsyncd.log again?
> > Best regards
> > Marcus
> >
> > ################
> > Marcus Pedersén
> > Systemadministrator
> > Interbull Centre
> > ################
> > Sent from my phone
> > ################
> >
> > On 23 July 2018 at 11:17, Sunny Kumar <***@redhat.com> wrote:
> >
> > Hi Marcus,
> >
> > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> > >
> > > Hi Kotresh,
> > >
> > > I ran:
> > >
> > > #ldconfig /usr/lib
> > Can you run:
> > ldconfig /usr/local/lib
> >
> >
> >
> > >
> > > on all nodes in both clusters but I still get the same error.
> > >
> > > What to do?
> > >
> > >
> > > Output for:
> > >
> > > # ldconfig -p /usr/lib | grep libgf
> > >
> > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
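Note that the listing above contains only the versioned `libgfchangelog.so.0`, while the traceback shows gsyncd dlopen-ing the unversioned `libgfchangelog.so`. One possible workaround (an assumption, not a confirmed fix; verify the paths on the node first) is to create the dev symlink and refresh the linker cache. The block below shows the real commands as comments and demonstrates the symlink pattern safely in a scratch directory:

```shell
# On the affected node the workaround would be (run as root; paths assumed
# from the ldconfig output above):
#   ln -s /lib64/libgfchangelog.so.0 /lib64/libgfchangelog.so
#   ldconfig
# Safe demonstration of the same symlink pattern in a temporary directory:
tmp=$(mktemp -d)
touch "$tmp/libgfchangelog.so.0"            # stand-in for the versioned library
ln -s "$tmp/libgfchangelog.so.0" "$tmp/libgfchangelog.so"
readlink "$tmp/libgfchangelog.so"           # shows the target the loader would follow
rm -rf "$tmp"
```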
> > >
> > >
> > > I read somewhere that you can change some settings for geo-replication to speed up the sync.
> > >
> > > I cannot remember where I saw that, or what the config parameters were.
> > >
> > > Once geo-replication works, I have 30TB on the master cluster that has to be synced to the slave nodes,
> > >
> > > and it will take a while before the slave nodes have caught up.
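On the question of speeding up the initial sync, this is a hedged sketch of options that are sometimes tuned for geo-replication throughput; the option names below are assumptions carried over from other gluster releases and should be verified against the session's `config` listing on 4.1 before use:

```shell
# Assumed option names -- confirm with "config" (no arguments) first:
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config sync-jobs 6        # more parallel sync workers per brick
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config use-tarssh true    # tar+ssh transport, often faster for many small files
```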
> > >
> > >
> > > Thanks and regards
> > >
> > > Marcus Pedersén
> > >
> > >
> > > Part of gsyncd.log:
> > >
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > raise res
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > Traceback (most recent call last):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > res = getattr(self.obj, rmeth)(*in_data[2:])
> > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > return Changes.cl_init()
> > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > from libgfchangelog import Changes as LChanges
> > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > class Changes(object):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > use_errno=True)
> > > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > self._handle = _dlopen(self._name, mode)
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > Traceback (most recent call last):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > func(args)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > local.service_loop(remote)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > changelog_agent.init()
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > return self.ins(self.meth, *a)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > raise res
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > I think then you will not see this.
> > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > >
> > >
> > > ________________________________
> > > Från: Kotresh Hiremath Ravishankar <***@redhat.com>
> > > Skickat: den 18 juli 2018 06:05
> > > Till: Marcus Pedersén
> > > Kopia: gluster-***@gluster.org
> > > Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >
> > > Hi Marcus,
> > >
> > > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but
> > > there is a geo-rep command to set it so that gsyncd will search there.
> > >
> > > To set on master
> > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> > >
> > > To set on slave
> > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
> > >
> > > Thanks,
> > > Kotresh HR
> > >
> > >
> > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > >>
> > >> Hi Marcus,
> > >>
> > >> I am testing out 4.1 myself and I will have some update today.
> > >> For this particular traceback, gsyncd is not able to find the library.
> > >> Is it an rpm install? If so, the gluster libraries would be in /usr/lib.
> > >> Please run the cmd below.
> > >>
> > >> #ldconfig /usr/lib
> > >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> > >>
> > >> Geo-rep should be fixed automatically.
> > >>
> > >> Thanks,
> > >> Kotresh HR
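Under the hood, gsyncd loads libgfchangelog through ctypes, so the ldconfig check above can be mirrored from Python. A minimal sketch, assuming a glibc Linux host where `ldconfig` is available (the "gfchangelog" name is just the library from the thread; any name works):

```python
import ctypes
from ctypes import util

def can_load(name):
    """True if the dynamic linker can resolve and dlopen lib<name>."""
    path = util.find_library(name)  # consults the ld cache, like `ldconfig -p`
    if path is None:
        return False
    try:
        ctypes.CDLL(path, use_errno=True)  # same kind of call gsyncd's wrapper makes
        return True
    except OSError:
        return False

print(can_load("c"))            # libc should resolve on any glibc system
print(can_load("gfchangelog"))  # stays False until ldconfig knows the library's directory
```

Running `ldconfig <dir>` on the directory that holds the library and re-checking with this helper is a quick way to confirm the fix took effect.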
> > >>
> > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi again,
> > >>>
> > >>> I continue to do some testing, but now I have come to a stage where I need help.
> > >>>
> > >>>
> > >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
> > >>>
> > >>> After that, /usr/local/sbin/glusterfs was missing, so I made a link there as well.
> > >>>
> > >>> Both links were done on all slave nodes.
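The two symlinks described above can be sketched as follows. The prefix here is a throwaway temporary directory purely so the sketch is side-effect free; on the real slave nodes the prefix would be /usr/local/sbin, and this is the manual workaround rather than the config-based fix:

```shell
# Stand-in for /usr/local/sbin so the sketch has no side effects.
prefix=$(mktemp -d)

# Point the paths gsyncd expects at the actual binary locations.
ln -s /usr/sbin/gluster   "$prefix/gluster"
ln -s /usr/sbin/glusterfs "$prefix/glusterfs"

ls -l "$prefix"
```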
> > >>>
> > >>>
> > >>> Now I have a new error that I can not resolve myself.
> > >>>
> > >>> It can not open libgfchangelog.so
> > >>>
> > >>>
> > >>> Many thanks!
> > >>>
> > >>> Regards
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part of gsyncd.log:
> > >>>
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > >>> return Changes.cl_init()
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > >>> from libgfchangelog import Changes as LChanges
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > >>> class Changes(object):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > >>> use_errno=True)
> > >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > >>> self._handle = _dlopen(self._name, mode)
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > >>> func(args)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > >>> local.service_loop(remote)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > >>> changelog_agent.init()
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > >>> return self.ins(self.meth, *a)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > >>> raise res
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
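The traceback bottoms out in ctypes: `CDLL` raises `OSError` when `dlopen()` cannot find the file, which is exactly what the log shows for libgfchangelog.so. A small reproduction of that failure mode (the missing library name below is deliberately bogus):

```python
import ctypes

def try_dlopen(name):
    """Attempt to dlopen a shared object; return the error message, or None on success."""
    try:
        ctypes.CDLL(name, use_errno=True)
        return None
    except OSError as e:
        return str(e)

# A name the loader cannot resolve fails the same way libgfchangelog.so does above.
err = try_dlopen("libdefinitely_missing_demo.so")
print(err)
```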
> > >>>
> > >>>
> > >>>
> > >>> ________________________________
> > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > >>> Skickat: den 16 juli 2018 21:59
> > >>> Till: ***@redhat.com
> > >>>
> > >>> Kopia: gluster-***@gluster.org
> > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>>
> > >>> Hi Kotresh,
> > >>>
> > >>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> > >>>
> > >>> /var/log/glusterfs/cli.log
> > >>>
> > >>> I have turned SELinux off and, just for testing, changed the permissions on /var/log/glusterfs/cli.log so geouser can access it.
> > >>>
> > >>> Starting geo-replication after that gives response successful but all nodes get status Faulty.
> > >>>
> > >>>
> > >>> If I run: gluster-mountbroker status
> > >>>
> > >>> I get:
> > >>>
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>>
> > >>>
> > >>> and those are all the nodes in the slave cluster, so the mountbroker seems OK.
> > >>>
> > >>>
> > >>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
> > >>>
> > >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster
> > >>>
> > >>> Another error is that SSH between master and slave is broken,
> > >>>
> > >>> but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:
> > >>>
> > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> > >>>
> > >>> as geouser and that works, which means that the ssh connection works.
> > >>>
> > >>>
> > >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> > >>>
> > >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> > >>>
> > >>>
> > >>> Do I have any options, or should I remove the current geo-replication and create a new one?
> > >>>
> > >>> How much do I need to clean up before creating a new geo-replication?
> > >>>
> > >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
> > >>>
> > >>>
> > >>> Many thanks in advance!
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part from the gsyncd.log:
> > >>>
> > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > >>>
> > >>> ________________________________
> > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > >>> Skickat: den 13 juli 2018 14:50
> > >>> Till: Kotresh Hiremath Ravishankar
> > >>> Kopia: gluster-***@gluster.org
> > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>> Hi Kotresh,
> > >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> > >>> All glusterd are crashing on the master side.
> > >>> Will send logs tonight.
> > >>>
> > >>> Thanks,
> > >>> Marcus
> > >>>
> > >>> ################
> > >>> Marcus Pedersén
> > >>> Systemadministrator
> > >>> Interbull Centre
> > >>> ################
> > >>> Sent from my phone
> > >>> ################
> > >>>
> > >>> Den 13 juli 2018 11:28 skrev Kotresh Hiremath Ravishankar <***@redhat.com>:
> > >>>
> > >>> Hi Marcus,
> > >>>
> > >>> Is the gluster geo-rep version the same on both master and slave?
> > >>>
> > >>> Thanks,
> > >>> Kotresh HR
> > >>>
> > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi Kotresh,
> > >>>
> > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> > >>>
> > >>> I rebooted all servers but geo-replication status is still Stopped.
> > >>>
> > >>> I tried to start geo-replication with the response Successful, but the status still shows Stopped on all nodes.
> > >>>
> > >>> Nothing has been written to geo-replication logs since I sent the tail of the log.
> > >>>
> > >>> So I do not know what info to provide?
> > >>>
> > >>>
> > >>> Please, help me to find a way to solve this.
> > >>>
> > >>>
> > >>> Thanks!
> > >>>
> > >>>
> > >>> Regards
> > >>>
> > >>> Marcus
> > >>>
> > >>>
> > >>> ________________________________
> > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > >>> Skickat: den 12 juli 2018 08:51
> > >>> Till: Kotresh Hiremath Ravishankar
> > >>> Kopia: gluster-***@gluster.org
> > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>> Thanks Kotresh,
> > >>> I installed through the official CentOS channel, centos-release-gluster41.
> > >>> Isn't this fix included in the CentOS install?
> > >>> I will have a look, test it tonight and come back to you!
> > >>>
> > >>> Thanks a lot!
> > >>>
> > >>> Regards
> > >>> Marcus
> > >>>
> > >>> ################
> > >>> Marcus Pedersén
> > >>> Systemadministrator
> > >>> Interbull Centre
> > >>> ################
> > >>> Sent from my phone
> > >>> ################
> > >>>
> > >>> Den 12 juli 2018 07:41 skrev Kotresh Hiremath Ravishankar <***@redhat.com>:
> > >>>
> > >>> Hi Marcus,
> > >>>
> > >>> I think the fix [1] is needed in 4.1
> > >>> Could you please try this out and let us know if it works for you?
> > >>>
> > >>> [1] https://review.gluster.org/#/c/20207/
> > >>>
> > >>> Thanks,
> > >>> Kotresh HR
> > >>>
> > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
> > >>>
> > >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
> > >>>
> > >>> Both clusters work the way they should on their own.
> > >>>
> > >>> After upgrade on master side status for all geo-replication nodes is Stopped.
> > >>>
> > >>> I tried to start the geo-replication from master node and response back was started successfully.
> > >>>
> > >>> Status again .... Stopped
> > >>>
> > >>> I tried to start it again and got the response started successfully; after that, glusterd crashed on all master nodes.
> > >>>
> > >>> After a restart of all glusterd the master cluster was up again.
> > >>>
> > >>> The status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful but the status stays Stopped.
> > >>>
> > >>>
> > >>> Please help me get the geo-replication up and running again.
> > >>>
> > >>>
> > >>> Best regards
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part of geo-replication log from master node:
> > >>>
> > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > >>> elete}
> > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > >>> elete}
> > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> > >>> except:
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> > >>> sys.exit()
> > >>> TypeError: 'int' object is not iterable
> > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> > >>>
> > >>> ---
> > >>> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här
> > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Gluster-users mailing list
> > >>> Gluster-***@gluster.org
> > >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks and Regards,
> > >>> Kotresh H R
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks and Regards,
> > >>> Kotresh H R
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks and Regards,
> > >> Kotresh H R
> > >
> > >
> > >
> > >
> > > --
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-***@gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
> - Sunny
Marcus Pedersén
2018-07-23 12:55:32 UTC
Permalink
Hi,
#find /usr/ -name libglusterfs.so
Gives nothing.

#find /usr/ -name libglusterfs.so*
Gives:
/usr/lib64/libglusterfs.so.0
/usr/lib64/libglusterfs.so.0.0.1

Thanks!
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################
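The find output above only shows the versioned names (libglusterfs.so.0 and libglusterfs.so.0.0.1), while ctypes dlopen()s the bare, unversioned name. That unversioned name is normally a symlink chain like the sketch below (demo paths only; on RPM-based systems the bare .so link typically comes from the -devel package, or the loader is pointed at the right directory with ldconfig):

```shell
# Recreate the usual shared-library naming layout in a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/libdemo.so.0.0.1"               # the real versioned object
ln -s libdemo.so.0.0.1 "$tmp/libdemo.so.0"  # SONAME link, used at run time
ln -s libdemo.so.0     "$tmp/libdemo.so"    # unversioned link, what a bare dlopen needs

find "$tmp" -name 'libdemo.so*' | sort
```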

Den 23 juli 2018 14:17 skrev Sunny Kumar <***@redhat.com>:
Hi,

Can you confirm the location for libgfchangelog.so
by sharing output of following command -
# find /usr/ -name libglusterfs.so

- Sunny

On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi Sunny,
> Here comes a part of gsyncd.log (The same info is repeated over and over again):
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
> [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
> [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> return Changes.cl_init()
> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> from libgfchangelog import Changes as LChanges
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> class Changes(object):
> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> use_errno=True)
> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
> [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> changelog_agent.init()
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
>
> Thanks, Sunny!!
>
> Regards
> Marcus Pedersén
>
> ________________________________________
> From: Sunny Kumar <***@redhat.com>
> Sent: 23 July 2018 12:53
> To: Marcus Pedersén
> Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>
> Hi Marcus,
>
> On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Sunny,
> > ldconfig -p /usr/local/lib | grep libgf
> > Output:
> > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0 libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0 libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0 libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> >
> > So that seems to be alright, right?
> >
> Yes, this seems right. Can you share the gsyncd.log again?
> > Best regards
> > Marcus
> >
> > ################
> > Marcus Pedersén
> > Systemadministrator
> > Interbull Centre
> > ################
> > Sent from my phone
> > ################
> >
> > On 23 July 2018 11:17, Sunny Kumar <***@redhat.com> wrote:
> >
> > Hi Marcus,
> >
> > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> > >
> > > Hi Kotresh,
> > >
> > > I ran:
> > >
> > > #ldconfig /usr/lib
> > Can you run:
> > ldconfig /usr/local/lib
> >
> >
> >
> > >
> > > on all nodes in both clusters but I still get the same error.
> > >
> > > What to do?
> > >
> > >
> > > Output for:
> > >
> > > # ldconfig -p /usr/lib | grep libgf
> > >
> > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > >
> > >
> > > I read somewhere that you could change some settings for geo-replication to speed up sync.
> > >
> > > I can not remember where I saw that or what the config parameters were.
> > >
> > > When geo-replication works I have 30TB on the master cluster that has to be synced to the slave nodes,
> > >
> > > and it will take a while before the slave nodes have caught up.
> > >
> > >
> > > Thanks and regards
> > >
> > > Marcus Pedersén
> > >
> > >
> > > Part of gsyncd.log:
> > >
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > raise res
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > Traceback (most recent call last):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > res = getattr(self.obj, rmeth)(*in_data[2:])
> > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > return Changes.cl_init()
> > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > from libgfchangelog import Changes as LChanges
> > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > class Changes(object):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > use_errno=True)
> > > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > self._handle = _dlopen(self._name, mode)
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > Traceback (most recent call last):
> > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > func(args)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > local.service_loop(remote)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > changelog_agent.init()
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > return self.ins(self.meth, *a)
> > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > raise res
> > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > Once the library path is fixed, you should no longer see this error.
> > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > >
> > >
> > > ________________________________
> > > From: Kotresh Hiremath Ravishankar <***@redhat.com>
> > > Sent: 18 July 2018 06:05
> > > To: Marcus Pedersén
> > > Cc: gluster-***@gluster.org
> > > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >
> > > Hi Marcus,
> > >
> > > Well there is nothing wrong in setting up a symlink for gluster binary location, but
> > > there is a geo-rep command to set it so that gsyncd will search there.
> > >
> > > To set on master
> > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> > >
> > > To set on slave
> > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
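For this thread's session the two commands would look roughly like the sketch below (volume and slave names are taken from the logs, and /usr/sbin/ is assumed as the RPM binary directory; the commands are printed rather than executed here):

```shell
# Sketch only: build and print the config commands instead of running them.
# Assumptions: master volume urd-gds-volume, slave geouser@urd-gds-geo-001::urd-gds-volume,
# RPM-installed binaries in /usr/sbin/.
MASTERVOL=urd-gds-volume
SLAVE=geouser@urd-gds-geo-001::urd-gds-volume
BINDIR=/usr/sbin/
echo "gluster volume geo-replication $MASTERVOL $SLAVE config gluster-command-dir $BINDIR"
echo "gluster volume geo-replication $MASTERVOL $SLAVE config slave-gluster-command-dir $BINDIR"
```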
> > >
> > > Thanks,
> > > Kotresh HR
> > >
> > >
> > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > >>
> > >> Hi Marcus,
> > >>
> > >> I am testing out 4.1 myself and I will have some update today.
> > >> For this particular traceback, gsyncd is not able to find the library.
> > >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> > >> Please run the cmd below.
> > >>
> > >> #ldconfig /usr/lib
> > >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> > >>
> > >> Geo-rep should be fixed automatically.
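Running ldconfig with a path argument only rebuilds the cache once; to make a non-default library directory searchable across reboots, the usual route is a drop-in under /etc/ld.so.conf.d/. A sketch against a scratch file (the real file name is an assumption):

```shell
# Sketch: persist an extra linker search path. Written to a temp file here;
# on a real node this would be e.g. /etc/ld.so.conf.d/glusterfs.conf,
# followed by a plain `ldconfig` to rebuild the cache.
CONF=$(mktemp)
echo "/usr/local/lib" > "$CONF"
cat "$CONF"
```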
> > >>
> > >> Thanks,
> > >> Kotresh HR
> > >>
> > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi again,
> > >>>
> > >>> I continue to do some testing, but now I have come to a stage where I need help.
> > >>>
> > >>>
> > >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
> > >>>
> > >>> After that /usr/local/sbin/glusterfs was missing so I made a link there as well.
> > >>>
> > >>> Both links were done on all slave nodes.
> > >>>
> > >>>
> > >>> Now I have a new error that I can not resolve myself.
> > >>>
> > >>> It can not open libgfchangelog.so
> > >>>
> > >>>
> > >>> Many thanks!
> > >>>
> > >>> Regards
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part of gsyncd.log:
> > >>>
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > >>> return Changes.cl_init()
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > >>> from libgfchangelog import Changes as LChanges
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > >>> class Changes(object):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > >>> use_errno=True)
> > >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > >>> self._handle = _dlopen(self._name, mode)
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > >>> func(args)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > >>> local.service_loop(remote)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > >>> changelog_agent.init()
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > >>> return self.ins(self.meth, *a)
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > >>> raise res
> > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > >>>
> > >>>
> > >>>
> > >>> ________________________________
> > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > >>> Sent: 16 July 2018 21:59
> > >>> To: ***@redhat.com
> > >>>
> > >>> Cc: gluster-***@gluster.org
> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>>
> > >>> Hi Kotresh,
> > >>>
> > >>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> > >>>
> > >>> /var/log/glusterfs/cli.log
> > >>>
> > >>> I have turned selinux off and just for testing I changed permissions on /var/log/glusterfs/cli.log so geouser can access it.
> > >>>
> > >>> Starting geo-replication after that gives response successful but all nodes get status Faulty.
> > >>>
> > >>>
> > >>> If I run: gluster-mountbroker status
> > >>>
> > >>> I get:
> > >>>
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > >>>
> > >>>
> > >>> and that is all nodes on slave cluster, so mountbroker seems ok.
> > >>>
> > >>>
> > >>> gsyncd.log logs an error about /usr/local/sbin/gluster is missing.
> > >>>
> > >>> That is correct because gluster is in /sbin/gluster and /usr/sbin/gluster
> > >>>
> > >>> Another error is that SSH between master and slave is broken,
> > >>>
> > >>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run:
> > >>>
> > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> > >>>
> > >>> as geouser and that works, which means that the ssh connection works.
> > >>>
> > >>>
> > >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> > >>>
> > >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> > >>>
> > >>>
> > >>> Do I have any options or should I remove current geo-replication and create a new?
> > >>>
> > >>> How much do I need to clean up before creating a new geo-replication?
> > >>>
> > >>> In that case can I pause geo-replication, mount slave cluster on master cluster and run rsync , just to speed up transfer of files?
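The rsync idea above is the usual way to pre-seed a slave; a sketch of the command (all mount points are assumptions: both volumes FUSE-mounted, syncing mount-to-mount rather than brick-to-brick so gluster internals are not copied; printed rather than executed here):

```shell
# Sketch only: print the pre-seeding command instead of running it.
# Assumptions: master volume mounted at /mnt/master-vol, slave volume at /mnt/slave-vol.
# -aHAX keeps hardlinks, ACLs and extended attributes, which gluster relies on.
SRC=/mnt/master-vol/
DST=/mnt/slave-vol/
echo "rsync -aHAX --progress $SRC $DST"
```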
> > >>>
> > >>>
> > >>> Many thanks in advance!
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part from the gsyncd.log:
> > >>>
> > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > >>>
> > >>> ________________________________
> > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > >>> Sent: 13 July 2018 14:50
> > >>> To: Kotresh Hiremath Ravishankar
> > >>> Cc: gluster-***@gluster.org
> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>> Hi Kotresh,
> > >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> > >>> All glusterd are crashing on the master side.
> > >>> Will send logs tonight.
> > >>>
> > >>> Thanks,
> > >>> Marcus
> > >>>
> > >>> ################
> > >>> Marcus Pedersén
> > >>> Systemadministrator
> > >>> Interbull Centre
> > >>> ################
> > >>> Sent from my phone
> > >>> ################
> > >>>
> > >>> On 13 July 2018 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > >>>
> > >>> Hi Marcus,
> > >>>
> > >>> Is the gluster geo-rep version the same on both master and slave?
> > >>>
> > >>> Thanks,
> > >>> Kotresh HR
> > >>>
> > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi Kotresh,
> > >>>
> > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> > >>>
> > >>> I rebooted all servers but geo-replication status is still Stopped.
> > >>>
> > >>> I tried to start geo-replication and got the response Successful, but the status still shows Stopped on all nodes.
> > >>>
> > >>> Nothing has been written to geo-replication logs since I sent the tail of the log.
> > >>>
> > >>> So I do not know what info to provide?
> > >>>
> > >>>
> > >>> Please, help me to find a way to solve this.
> > >>>
> > >>>
> > >>> Thanks!
> > >>>
> > >>>
> > >>> Regards
> > >>>
> > >>> Marcus
> > >>>
> > >>>
> > >>> ________________________________
> > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > >>> Sent: 12 July 2018 08:51
> > >>> To: Kotresh Hiremath Ravishankar
> > >>> Cc: gluster-***@gluster.org
> > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > >>>
> > >>> Thanks Kotresh,
> > >>> I installed through the official centos channel, centos-release-gluster41.
> > >>> Isn't this fix included in the CentOS packages?
> > >>> I will have a look, test it tonight and come back to you!
> > >>>
> > >>> Thanks a lot!
> > >>>
> > >>> Regards
> > >>> Marcus
> > >>>
> > >>> ################
> > >>> Marcus Pedersén
> > >>> Systemadministrator
> > >>> Interbull Centre
> > >>> ################
> > >>> Sent from my phone
> > >>> ################
> > >>>
> > >>> On 12 July 2018 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > >>>
> > >>> Hi Marcus,
> > >>>
> > >>> I think the fix [1] is needed in 4.1
> > >>> Could you please try this out and let us know if it works for you?
> > >>>
> > >>> [1] https://review.gluster.org/#/c/20207/
> > >>>
> > >>> Thanks,
> > >>> Kotresh HR
> > >>>
> > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
> > >>>
> > >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
> > >>>
> > >>> Both clusters works the way they should on their own.
> > >>>
> > >>> After upgrade on master side status for all geo-replication nodes is Stopped.
> > >>>
> > >>> I tried to start the geo-replication from master node and response back was started successfully.
> > >>>
> > >>> Status again .... Stopped
> > >>>
> > >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes.
> > >>>
> > >>> After a restart of all glusterd the master cluster was up again.
> > >>>
> > >>> Status for geo-replication is still Stopped and every try to start it after this gives the response successful but still status Stopped.
> > >>>
> > >>>
> > >>> Please help me get the geo-replication up and running again.
> > >>>
> > >>>
> > >>> Best regards
> > >>>
> > >>> Marcus Pedersén
> > >>>
> > >>>
> > >>> Part of geo-replication log from master node:
> > >>>
> > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > >>> elete}
> > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > >>> elete}
> > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> > >>> Traceback (most recent call last):
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> > >>> except:
> > >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> > >>> sys.exit()
> > >>> TypeError: 'int' object is not iterable
> > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> > >>>
> > >>> ---
> > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Gluster-users mailing list
> > >>> Gluster-***@gluster.org
> > >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks and Regards,
> > >>> Kotresh H R
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
> >
>
> - Sunny


Sunny Kumar
2018-07-23 13:14:26 UTC
Permalink
Hi Marcus,

Okay, first, apologies for the wrong pattern there; please run:

# find /usr/ -name libgfchangelog.so

- Sunny
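
[Editor's note] The repeated OSError in the tracebacks quoted below comes from gsyncd's changelog bindings calling ctypes.CDLL with a bare soname, which the dynamic linker can only resolve through the ldconfig cache or LD_LIBRARY_PATH. A minimal sketch of that failure mode follows; the helper name and the bogus soname are hypothetical, and the real loader lives in /usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py:

```python
import ctypes

def load_changelog_lib(soname):
    # gsyncd's libgfchangelog.py binds the library the same way: ctypes hands
    # the bare soname to the dynamic linker, so the load only succeeds if
    # ldconfig's cache (or LD_LIBRARY_PATH) can resolve it.
    try:
        return ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL, use_errno=True)
    except OSError:
        # "cannot open shared object file: No such file or directory"
        return None

# A deliberately bogus soname reproduces the error seen in the logs.
assert load_changelog_lib("libgfchangelog-definitely-missing.so") is None
```

This is why the thread converges on finding where libgfchangelog.so actually lives and refreshing the linker cache with ldconfig.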
On Mon, Jul 23, 2018 at 6:25 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi,
> #find /usr/ -name libglusterfs.so
> Gives nothing.
>
> #find /usr/ -name libglusterfs.so*
> Gives:
> /usr/lib64/libglusterfs.so.0 /usr/lib64/libglusterfs.so.0.0.1
>
> Thanks!
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 23 July 2018 at 14:17, Sunny Kumar <***@redhat.com> wrote:
>
> Hi,
>
> Can you confirm the location for libgfchangelog.so
> by sharing the output of the following command:
> # find /usr/ -name libglusterfs.so
>
> - Sunny
>
> On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Sunny,
> > Here comes a part of gsyncd.log (The same info is repeated over and over again):
> >
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
> > [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
> > [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > res = getattr(self.obj, rmeth)(*in_data[2:])
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > return Changes.cl_init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > from libgfchangelog import Changes as LChanges
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > class Changes(object):
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > use_errno=True)
> > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > self._handle = _dlopen(self._name, mode)
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
> > [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > func(args)
> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > local.service_loop(remote)
> > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > changelog_agent.init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > return self.ins(self.meth, *a)
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >
> > Thanks, Sunny!!
> >
> > Regards
> > Marcus Pedersén
> >
> > ________________________________________
> > From: Sunny Kumar <***@redhat.com>
> > Sent: 23 July 2018 12:53
> > To: Marcus Pedersén
> > Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >
> > Hi Marcus,
> >
> > On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
> > >
> > > Hi Sunny,
> > > ldconfig -p /usr/local/lib | grep libgf
> > > Output:
> > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0 libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0 libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0 libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > >
> > > So that seems to be alright, right?
> > >
> > Yes, this seems right. Can you share the gsyncd.log again?
> > > Best regards
> > > Marcus
> > >
> > > ################
> > > Marcus Pedersén
> > > Systemadministrator
> > > Interbull Centre
> > > ################
> > > Sent from my phone
> > > ################
> > >
> > > On 23 July 2018 at 11:17, Sunny Kumar <***@redhat.com> wrote:
> > >
> > > Hi Marcus,
> > >
> > > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> > > >
> > > > Hi Kotresh,
> > > >
> > > > I ran:
> > > >
> > > > #ldconfig /usr/lib
> > > Can you do:
> > > ldconfig /usr/local/lib
> > >
> > >
> > > Output:
> > >
> > > >
> > > > on all nodes in both clusters but I still get the same error.
> > > >
> > > > What to do?
> > > >
> > > >
> > > > Output for:
> > > >
> > > > # ldconfig -p /usr/lib | grep libgf
> > > >
> > > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > > >
> > > >
> > > > I read somewhere that you can change some settings for geo-replication to speed up sync.
> > > >
> > > > I can not remember where I saw that or which config parameters they were.
> > > >
> > > > When geo-replication works I have 30 TB on the master cluster that has to be synced to the slave nodes,
> > > >
> > > > and that will take a while before the slave nodes have caught up.
> > > >
> > > >
> > > > Thanks and regards
> > > >
> > > > Marcus Pedersén
> > > >
> > > >
> > > > Part of gsyncd.log:
> > > >
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > > res = getattr(self.obj, rmeth)(*in_data[2:])
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > > return Changes.cl_init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > > from libgfchangelog import Changes as LChanges
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > > class Changes(object):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > > use_errno=True)
> > > > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > > self._handle = _dlopen(self._name, mode)
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > > func(args)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > > local.service_loop(remote)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > > changelog_agent.init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > > return self.ins(self.meth, *a)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > I think you will not see this error then.
> > > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >
> > > >
> > > > ________________________________
> > > > From: Kotresh Hiremath Ravishankar <***@redhat.com>
> > > > Sent: 18 July 2018 06:05
> > > > To: Marcus Pedersén
> > > > Cc: gluster-***@gluster.org
> > > > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >
> > > > Hi Marcus,
> > > >
> > > > Well there is nothing wrong in setting up a symlink for gluster binary location, but
> > > > there is a geo-rep command to set it so that gsyncd will search there.
> > > >
> > > > To set on master
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> > > >
> > > > To set on slave
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
> > > >
> > > > Thanks,
> > > > Kotresh HR
> > > >
> > > >
> > > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>
> > > >> Hi Marcus,
> > > >>
> > > >> I am testing out 4.1 myself and I will have some update today.
> > > >> For this particular traceback, gsyncd is not able to find the library.
> > > >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> > > >> Please run the cmd below.
> > > >>
> > > >> #ldconfig /usr/lib
> > > >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> > > >>
> > > >> Geo-rep should be fixed automatically.
> > > >>
> > > >> Thanks,
> > > >> Kotresh HR
> > > >>
> > > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi again,
> > > >>>
> > > >>> I continue to do some testing, but now I have come to a stage where I need help.
> > > >>>
> > > >>>
> > > >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
> > > >>>
> > > >>> After that /usr/local/sbin/glusterfs was missing, so I made a link there as well.
> > > >>>
> > > >>> Both links were done on all slave nodes.
> > > >>>
> > > >>>
> > > >>> Now I have a new error that I can not resolve myself.
> > > >>>
> > > >>> It can not open libgfchangelog.so
> > > >>>
> > > >>>
> > > >>> Many thanks!
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of gsyncd.log:
> > > >>>
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> > > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> > > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > >>> return Changes.cl_init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > >>> from libgfchangelog import Changes as LChanges
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > >>> class Changes(object):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > >>> use_errno=True)
> > > >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > >>> self._handle = _dlopen(self._name, mode)
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> > > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > >>> func(args)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > >>> local.service_loop(remote)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > >>> changelog_agent.init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > >>> return self.ins(self.meth, *a)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > >>> raise res
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>>
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 16 July 2018 21:59
> > > >>> To: ***@redhat.com
> > > >>>
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> > > >>>
> > > >>> /var/log/glusterfs/cli.log
> > > >>>
> > > >>> I have turned SELinux off, and just for testing I changed permissions on /var/log/glusterfs/cli.log so geouser can access it.
> > > >>>
> > > >>> Starting geo-replication after that gives the response successful, but all nodes get status Faulty.
> > > >>>
> > > >>>
> > > >>> If I run: gluster-mountbroker status
> > > >>>
> > > >>> I get:
> > > >>>
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>>
> > > >>>
> > > >>> and those are all nodes on the slave cluster, so mountbroker seems OK.
> > > >>>
> > > >>>
> > > >>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
> > > >>>
> > > >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
> > > >>>
> > > >>> Another error is that SSH between master and slave is broken,
> > > >>>
> > > >>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run:
> > > >>>
> > > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> > > >>>
> > > >>> as geouser and that works, which means that the ssh connection works.
> > > >>>
> > > >>>
> > > >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> > > >>>
> > > >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> > > >>>
> > > >>>
> > > >>> Do I have any options or should I remove current geo-replication and create a new?
> > > >>>
> > > >>> How much do I need to clean up before creating a new geo-replication?
> > > >>>
> > > >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
> > > >>>
> > > >>>
> > > >>> Many thanks in advance!
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part from the gsyncd.log:
> > > >>>
> > > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 13 July 2018 14:50
> > > >>> To: Kotresh Hiremath Ravishankar
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Hi Kotresh,
> > > >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> > > >>> All glusterd are crashing on the master side.
> > > >>> Will send logs tonight.
> > > >>>
> > > >>> Thanks,
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> On 13 July 2018 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> Is the gluster geo-rep version the same on both master and slave?
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> > > >>>
> > > >>> I rebooted all servers but the geo-replication status is still Stopped.
> > > >>>
> > > >>> I tried to start geo-replication; the response was Successful but the status still shows Stopped on all nodes.
> > > >>>
> > > >>> Nothing has been written to the geo-replication logs since I sent the tail of the log.
> > > >>>
> > > >>> So I do not know what info to provide.
> > > >>>
> > > >>>
> > > >>> Please, help me to find a way to solve this.
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 12 July 2018 08:51
> > > >>> To: Kotresh Hiremath Ravishankar
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Thanks Kotresh,
> > > >>> I installed through the official CentOS channel, centos-release-gluster41.
> > > >>> Isn't this fix included in the CentOS install?
> > > >>> I will have a look, test it tonight and come back to you!
> > > >>>
> > > >>> Thanks a lot!
> > > >>>
> > > >>> Regards
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> On 12 July 2018 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> I think the fix [1] is needed in 4.1
> > > >>> Could you please try this out and let us know if it works for you?
> > > >>>
> > > >>> [1] https://review.gluster.org/#/c/20207/
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> I have upgraded from 3.12.9 to 4.1.1 and have been following the upgrade instructions for an offline upgrade.
> > > >>>
> > > >>> I upgraded the geo-replication (slave) side first, 1 x (2+1), and after that the master side, 2 x (2+1).
> > > >>>
> > > >>> Both clusters work the way they should on their own.
> > > >>>
> > > >>> After the upgrade, the status for all geo-replication nodes on the master side is Stopped.
> > > >>>
> > > >>> I tried to start geo-replication from the master node and the response was "started successfully".
> > > >>>
> > > >>> Status again .... Stopped
> > > >>>
> > > >>> I tried to start it again and got the response "started successfully"; after that, glusterd crashed on all master nodes.
> > > >>>
> > > >>> After a restart of all glusterd the master cluster was up again.
> > > >>>
> > > >>> The status for geo-replication is still Stopped, and every attempt to start it after this gives the response Successful, but the status remains Stopped.
> > > >>>
> > > >>>
> > > >>> Please help me get the geo-replication up and running again.
> > > >>>
> > > >>>
> > > >>> Best regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of geo-replication log from master node:
> > > >>>
> > > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> > > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> > > >>> except:
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> > > >>> sys.exit()
> > > >>> TypeError: 'int' object is not iterable
> > > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> > > >>>
> > > >>> ---
> > > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> Gluster-users mailing list
> > > >>> Gluster-***@gluster.org
> > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks and Regards,
> > > >> Kotresh H R
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-***@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> > >
> >
> > - Sunny
>
>
Marcus Pedersén
2018-07-23 13:38:09 UTC
Permalink
Hi again Sunny,
Sorry, I missed the obvious myself!

#find /usr/ -name libgfchangelog.so
Gives nothing

#find /usr/ -name libgfchangelog.so*
Gives:
/usr/lib64/libgfchangelog.so.0
/usr/lib64/libgfchangelog.so.0.0.1
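The find output above shows only the versioned library on disk, while gsyncd dlopen()s the unversioned name `libgfchangelog.so`. One plausible remedy (a sketch, not an official fix from this thread) is to create the unversioned symlink and refresh the linker cache, run as root on each affected node. The helper function below is hypothetical, not part of GlusterFS:

```shell
#!/bin/sh
# Hypothetical helper: create NAME.so -> NAME.so.0 in DIR when only the
# versioned file exists. On the nodes above one would run, as root:
#   ensure_unversioned_symlink /usr/lib64 libgfchangelog && ldconfig
ensure_unversioned_symlink() {
    dir=$1
    name=$2
    if [ -e "$dir/$name.so.0" ] && [ ! -e "$dir/$name.so" ]; then
        # Relative link target, as the distro's -devel packages usually do
        ln -s "$name.so.0" "$dir/$name.so"
    fi
}
```

`ldconfig` must be re-run afterwards so the new name enters the cache that `ldconfig -p` reads; normally the unversioned symlink is shipped by the -devel package, so a missing one may also point at a packaging issue.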

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################


On 23 July 2018 15:15, Sunny Kumar <***@redhat.com> wrote:
Hi Marcus,

Okay, first, apologies for the wrong pattern here; please run

# find /usr/ -name libgfchangelog.so

- Sunny
On Mon, Jul 23, 2018 at 6:25 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi,
> #find /usr/ -name libglusterfs.so
> Gives nothing.
>
> #find /usr/ -name libglusterfs.so*
> Gives:
> /usr/lib64/libglusterfs.so.0
> /usr/lib64/libglusterfs.so.0.0.1
>
> Thanks!
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 23 July 2018 14:17, Sunny Kumar <***@redhat.com> wrote:
>
> Hi,
>
> Can you confirm the location for libgfchangelog.so
> by sharing output of following command -
> # find /usr/ -name libglusterfs.so
>
> - Sunny
>
> On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Sunny,
> > Here comes a part of gsyncd.log (The same info is repeated over and over again):
> >
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
> > [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
> > [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > res = getattr(self.obj, rmeth)(*in_data[2:])
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > return Changes.cl_init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > from libgfchangelog import Changes as LChanges
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > class Changes(object):
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > use_errno=True)
> > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > self._handle = _dlopen(self._name, mode)
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
> > [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > func(args)
> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > local.service_loop(remote)
> > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > changelog_agent.init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > return self.ins(self.meth, *a)
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
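The OSError in the traceback above is raised by ctypes: libgfchangelog.py calls `ctypes.CDLL("libgfchangelog.so", use_errno=True)`, and CDLL hands the name straight to dlopen(3), which resolves it via the linker cache and LD_LIBRARY_PATH. Having only the versioned libgfchangelog.so.0 on disk is therefore not enough. A minimal standalone illustration of this behavior (not GlusterFS code):

```python
import ctypes

def can_dlopen(name):
    """Return True if dlopen(3) can resolve `name` via the ldconfig
    cache, LD_LIBRARY_PATH, or an absolute path; False otherwise."""
    try:
        # Same call shape as libgfchangelog.py uses.
        ctypes.CDLL(name, use_errno=True)
        return True
    except OSError:
        # "cannot open shared object file: No such file or directory"
        return False

# On the affected nodes, can_dlopen("libgfchangelog.so") stays False until
# the unversioned name is made resolvable (e.g. symlink plus ldconfig),
# while the versioned "libgfchangelog.so.0" would already load.
```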
> >
> > Thanks, Sunny!!
> >
> > Regards
> > Marcus Pedersén
> >
> > ________________________________________
> > From: Sunny Kumar <***@redhat.com>
> > Sent: 23 July 2018 12:53
> > To: Marcus Pedersén
> > Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >
> > Hi Marcus,
> >
> > On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
> > >
> > > Hi Sunny,
> > > ldconfig -p /usr/local/lib | grep libgf
> > > Output:
> > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > >
> > > So that seems to be alright, right?
> > >
> > Yes, this seems right. Can you share the gsyncd.log again?
> > > Best regards
> > > Marcus
> > >
> > > ################
> > > Marcus Pedersén
> > > Systemadministrator
> > > Interbull Centre
> > > ################
> > > Sent from my phone
> > > ################
> > >
> > > On 23 July 2018 11:17, Sunny Kumar <***@redhat.com> wrote:
> > >
> > > Hi Marcus,
> > >
> > > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> > > >
> > > > Hi Kotresh,
> > > >
> > > > I ran:
> > > >
> > > > #ldconfig /usr/lib
> > > can you do -
> > > ldconfig /usr/local/lib
> > >
> > >
> > > Output:
> > >
> > > >
> > > > on all nodes in both clusters but I still get the same error.
> > > >
> > > > What to do?
> > > >
> > > >
> > > > Output for:
> > > >
> > > > # ldconfig -p /usr/lib | grep libgf
> > > >
> > > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > > >
> > > >
> > > > I read somewhere that you can change some settings for geo-replication to speed up sync.
> > > >
> > > > I can not remember where I saw that or which config parameters.
> > > >
> > > > Once geo-replication works, I have 30 TB on the master cluster that has to be synced to the slave nodes,
> > > >
> > > > and it will take a while before the slave nodes have caught up.
> > > >
> > > >
> > > > Thanks and regards
> > > >
> > > > Marcus Pedersén
> > > >
> > > >
> > > > Part of gsyncd.log:
> > > >
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > > res = getattr(self.obj, rmeth)(*in_data[2:])
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > > return Changes.cl_init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > > from libgfchangelog import Changes as LChanges
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > > class Changes(object):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > > use_errno=True)
> > > > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > > self._handle = _dlopen(self._name, mode)
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > > func(args)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > > local.service_loop(remote)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > > changelog_agent.init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > > return self.ins(self.meth, *a)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > I think then you will not see this.
> > > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >
> > > >
> > > > ________________________________
> > > > From: Kotresh Hiremath Ravishankar <***@redhat.com>
> > > > Sent: 18 July 2018 06:05
> > > > To: Marcus Pedersén
> > > > Cc: gluster-***@gluster.org
> > > > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >
> > > > Hi Marcus,
> > > >
> > > > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but
> > > > there is a geo-rep command to set it so that gsyncd will search there.
> > > >
> > > > To set on master
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> > > >
> > > > To set on slave
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
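Applied to the session in this thread, the two commands would look roughly as below. This is an illustration only: the master volume (urd-gds-volume), mountbroker user (geouser), and slave host (urd-gds-geo-001) are inferred from the logs elsewhere in the thread, and /usr/sbin/ assumes an RPM install; adjust to where the `gluster` binary actually lives on each side.

```shell
# Illustration only -- names inferred from this thread's logs, not verified:
#   master volume: urd-gds-volume
#   slave spec:    geouser@urd-gds-geo-001::urd-gds-volume
gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume \
    config gluster-command-dir /usr/sbin/

gluster volume geo-replication urd-gds-volume \
    geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir /usr/sbin/
```

This would also address the earlier ENOENT failure, where the slave tried to execute /usr/local/sbin/gluster.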
> > > >
> > > > Thanks,
> > > > Kotresh HR
> > > >
> > > >
> > > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>
> > > >> Hi Marcus,
> > > >>
> > > >> I am testing out 4.1 myself and I will have some update today.
> > > >> For this particular traceback, gsyncd is not able to find the library.
> > > >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> > > >> Please run the cmd below.
> > > >>
> > > >> #ldconfig /usr/lib
> > > >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> > > >>
> > > >> Geo-rep should be fixed automatically.
> > > >>
> > > >> Thanks,
> > > >> Kotresh HR
> > > >>
> > > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi again,
> > > >>>
> > > >>> I have continued to do some testing, but now I have come to a stage where I need help.
> > > >>>
> > > >>>
> > > >>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
> > > >>>
> > > >>> After that, /usr/local/sbin/glusterfs was missing, so I made a link there as well.
> > > >>>
> > > >>> Both links were done on all slave nodes.
> > > >>>
> > > >>>
> > > >>> Now I have a new error that I cannot resolve myself:
> > > >>>
> > > >>> it cannot open libgfchangelog.so.
> > > >>>
> > > >>>
> > > >>> Many thanks!
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of gsyncd.log:
> > > >>>
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> > > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> > > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > >>> return Changes.cl_init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > >>> from libgfchangelog import Changes as LChanges
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > >>> class Changes(object):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > >>> use_errno=True)
> > > >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > >>> self._handle = _dlopen(self._name, mode)
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> > > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > >>> func(args)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > >>> local.service_loop(remote)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > >>> changelog_agent.init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > >>> return self.ins(self.meth, *a)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > >>> raise res
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>>
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 16 July 2018 21:59
> > > >>> To: ***@redhat.com
> > > >>>
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have been testing for a bit, and as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> > > >>>
> > > >>> /var/log/glusterfs/cli.log
> > > >>>
> > > >>> I have turned SELinux off, and just for testing I changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it.
> > > >>>
> > > >>> Starting geo-replication after that gives the response successful, but all nodes get status Faulty.
> > > >>>
> > > >>>
> > > >>> If I run: gluster-mountbroker status
> > > >>>
> > > >>> I get:
> > > >>>
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>>
> > > >>>
> > > >>> and those are all the nodes in the slave cluster, so the mountbroker seems OK.
> > > >>>
> > > >>>
> > > >>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
> > > >>>
> > > >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
> > > >>>
> > > >>> Another error is that SSH between master and slave is broken,
> > > >>>
> > > >>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run:
> > > >>>
> > > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> > > >>>
> > > >>> as geouser, and that works, which means that the SSH connection works.
> > > >>>
> > > >>>
> > > >>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
> > > >>>
> > > >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> > > >>>
> > > >>>
> > > >>> Do I have any options, or should I remove the current geo-replication and create a new one?
> > > >>>
> > > >>> How much do I need to clean up before creating a new geo-replication?
> > > >>>
> > > >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
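For illustration only, the paused catch-up Marcus asks about could be scripted roughly like this. The volume and slave-user names are taken from this thread, the mount points /mnt/master and /mnt/slave are assumptions, and the `run` wrapper only prints each command (a dry run); whether an out-of-band rsync is actually safe here should be confirmed before removing the echo.

```shell
#!/bin/sh
# Hypothetical sketch: pause geo-replication, bulk-copy with rsync,
# then resume. Mount points are assumptions, not from the thread.
MASTERVOL=urd-gds-volume
SLAVE='geouser@urd-gds-geo-001::urd-gds-volume'

run() { echo "+ $*"; }  # dry-run wrapper: prints the command instead of running it

run gluster volume geo-replication "$MASTERVOL" "$SLAVE" pause
# -aHAX keeps hardlinks, ACLs and xattrs; --numeric-ids avoids uid/gid remapping
run rsync -aHAX --numeric-ids /mnt/master/ /mnt/slave/
run gluster volume geo-replication "$MASTERVOL" "$SLAVE" resume
```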
> > > >>>
> > > >>>
> > > >>> Many thanks in advance!
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part from the gsyncd.log:
> > > >>>
> > > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 13 July 2018 14:50
> > > >>> To: Kotresh Hiremath Ravishankar
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Hi Kotresh,
> > > >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> > > >>> All glusterd are crashing on the master side.
> > > >>> Will send logs tonight.
> > > >>>
> > > >>> Thanks,
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> Is the gluster geo-rep version the same on both master and slave?
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> > > >>>
> > > >>> I rebooted all servers but geo-replication status is still Stopped.
> > > >>>
> > > >>> I tried to start geo-replication with response Successful but status still show Stopped on all nodes.
> > > >>>
> > > >>> Nothing has been written to geo-replication logs since I sent the tail of the log.
> > > >>>
> > > >>> So I do not know what info to provide?
> > > >>>
> > > >>>
> > > >>> Please, help me to find a way to solve this.
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> > > >>> Sent: 12 July 2018 08:51
> > > >>> To: Kotresh Hiremath Ravishankar
> > > >>> Cc: gluster-***@gluster.org
> > > >>> Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Thanks Kotresh,
> > > >>> I installed through the official CentOS channel, centos-release-gluster41.
> > > >>> Isn't this fix included in the CentOS packages?
> > > >>> I will have a look, test it tonight and come back to you!
> > > >>>
> > > >>> Thanks a lot!
> > > >>>
> > > >>> Regards
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> I think the fix [1] is needed in 4.1
> > > >>> Could you please try this out and let us know if it works for you?
> > > >>>
> > > >>> [1] https://review.gluster.org/#/c/20207/
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
> > > >>>
> > > >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
> > > >>>
> > > >>> Both clusters works the way they should on their own.
> > > >>>
> > > >>> After upgrade on master side status for all geo-replication nodes is Stopped.
> > > >>>
> > > >>> I tried to start the geo-replication from master node and response back was started successfully.
> > > >>>
> > > >>> Status again .... Stopped
> > > >>>
> > > >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes.
> > > >>>
> > > >>> After a restart of all glusterd the master cluster was up again.
> > > >>>
> > > >>> Status for geo-replication is still Stopped and every try to start it after this gives the response successful but still status Stopped.
> > > >>>
> > > >>>
> > > >>> Please help me get the geo-replication up and running again.
> > > >>>
> > > >>>
> > > >>> Best regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of geo-replication log from master node:
> > > >>>
> > > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> > > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> > > >>> except:
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> > > >>> sys.exit()
> > > >>> TypeError: 'int' object is not iterable
> > > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> > > >>>
> > > >>> ---
> > > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> Gluster-users mailing list
> > > >>> Gluster-***@gluster.org
> > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks and Regards,
> > > >> Kotresh H R
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-***@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> > >
> >
> > - Sunny
>
>


Marcus Pedersén
2018-07-26 18:38:27 UTC
Permalink
Thanks for your help, Sunny and Kotresh!

The geo-replication is working now!
The final step I tried was to make a symlink
ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so

After that everything started working!

Do I need to report the steps I took somewhere? I don't know whether the RPM is built by Gluster or by CentOS.

I started with CentOS 7 and Gluster 3.12.9, installed from the CentOS SIG Gluster repository.
I did the following steps:
- Installed and upgraded to 4.1.1 from the CentOS SIG Gluster repository
- Installed fix https://review.gluster.org/#/c/20207/
- Changed permissions on the file /var/log/glusterfs/cli.log so the geo user could access it
- Made symlinks to /usr/local/sbin/gluster
A better way would have been to change the config:
#gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
#gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
- Made symlink:
ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so
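As a re-runnable version of that symlink step (a sketch: the helper name is mine, and it assumes the versioned library sits in /usr/lib64, as the `find` output earlier in the thread showed):

```shell
#!/bin/sh
# Create the unversioned libgfchangelog.so name that gsyncd dlopen()s,
# but only if the versioned library exists and the link is missing,
# so it is safe to run repeatedly on every node.
link_unversioned() {
    libdir=$1
    if [ -e "$libdir/libgfchangelog.so.0" ] && [ ! -e "$libdir/libgfchangelog.so" ]; then
        ln -s "$libdir/libgfchangelog.so.0" "$libdir/libgfchangelog.so"
    fi
}

# On the real nodes (as root), then refresh the linker cache:
#   link_unversioned /usr/lib64 && ldconfig
```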

Thanks for all help!

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 23 July 2018 at 20:11, Marcus Pedersén <***@slu.se> wrote:
Hi again Sunny,
Sorry, I missed the obvious myself!

#find /usr/ -name libgfchangelog.so
Gives nothing

#find /usr/ -name libgfchangelog.so*
Gives:
/usr/lib64/libgfchangelog.so.0
/usr/lib64/libgfchangelog.so.0.0.1
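That matches the OSError in the tracebacks: only the versioned names exist, while gsyncd dlopen()s the plain libgfchangelog.so. A quick way to check name resolution outside of gsyncd (a diagnostic sketch; the helper name is mine, and it uses python3's ctypes rather than the python2 gsyncd runs under here):

```shell
#!/bin/sh
# can_dlopen NAME: exit 0 if the dynamic linker can resolve NAME, 1 otherwise.
can_dlopen() {
    python3 - "$1" <<'EOF'
import ctypes, sys
try:
    ctypes.CDLL(sys.argv[1])
except OSError:
    sys.exit(1)
EOF
}

can_dlopen libgfchangelog.so.0 && echo "versioned name loads"
can_dlopen libgfchangelog.so || echo "unversioned name not resolvable (matches the gsyncd error)"
```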

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################


On 23 July 2018 at 15:15, Sunny Kumar <***@redhat.com> wrote:
Hi Marcus,

Okay, first, apologies for the wrong pattern here; please run:

# find /usr/ -name libgfchangelog.so

- Sunny
On Mon, Jul 23, 2018 at 6:25 PM Marcus Pedersén <***@slu.se> wrote:
>
> Hi,
> #find /usr/ -name libglusterfs.so
> Gives nothing.
>
> #find /usr/ -name libglusterfs.so*
> Gives:
> /usr/lib64/libglusterfs.so.0
> /usr/lib64/libglusterfs.so.0.0.1
>
> Thanks!
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 23 July 2018 at 14:17, Sunny Kumar <***@redhat.com> wrote:
>
> Hi,
>
> Can you confirm the location for libgfchangelog.so
> by sharing output of following command -
> # find /usr/ -name libglusterfs.so
>
> - Sunny
>
> On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén <***@slu.se> wrote:
> >
> > Hi Sunny,
> > Here comes a part of gsyncd.log (The same info is repeated over and over again):
> >
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.327152] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-23 11:33:20.335777] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2018-07-23 11:33:22.11188] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6752
> > [2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0894
> > [2018-07-23 11:33:23.102168] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > res = getattr(self.obj, rmeth)(*in_data[2:])
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > return Changes.cl_init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > from libgfchangelog import Changes as LChanges
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > class Changes(object):
> > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > use_errno=True)
> > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > self._handle = _dlopen(self._name, mode)
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=29589:140155686246208:1532345603.11 method=init error=OSError
> > [2018-07-23 11:33:23.119708] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > func(args)
> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > local.service_loop(remote)
> > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > changelog_agent.init()
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > return self.ins(self.meth, *a)
> > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > [2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> >
> > Thanks, Sunny!!
> >
> > Regards
> > Marcus Pedersén
> >
> > ________________________________________
> > From: Sunny Kumar <***@redhat.com>
> > Sent: 23 July 2018 12:53
> > To: Marcus Pedersén
> > Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
> > Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> >
> > Hi Marcus,
> >
> > On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <***@slu.se> wrote:
> > >
> > > Hi Sunny,
> > > ldconfig -p /usr/local/lib | grep libgf
> > > Output:
> > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > >
> > > So that seems to be alright, right?
> > >
> > Yes, this seems right. Can you share the gsyncd.log again?
> > > Best regards
> > > Marcus
> > >
> > > ################
> > > Marcus Pedersén
> > > Systemadministrator
> > > Interbull Centre
> > > ################
> > > Sent from my phone
> > > ################
> > >
> > > On 23 July 2018 at 11:17, Sunny Kumar <***@redhat.com> wrote:
> > >
> > > Hi Marcus,
> > >
> > > On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <***@slu.se> wrote:
> > > >
> > > > Hi Kotresh,
> > > >
> > > > I ran:
> > > >
> > > > #ldconfig /usr/lib
> > > can you do -
> > > ldconfig /usr/local/lib
> > >
> > >
> > > Output:
> > >
> > > >
> > > > on all nodes in both clusters but I still get the same error.
> > > >
> > > > What to do?
> > > >
> > > >
> > > > Output for:
> > > >
> > > > # ldconfig -p /usr/lib | grep libgf
> > > >
> > > > libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> > > > libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> > > > libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> > > > libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> > > > libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
> > > >
> > > >
> > > > I read somewhere that you could change some settings for geo-replication to speed up sync.
> > > >
> > > > I can not remember where I saw that and what config parameters.
> > > >
> > > > When geo-replication works I have 30TB on the master cluster that has to be synced to the slave nodes,
> > > >
> > > > and that will take a while before the slave nodes have caught up.
> > > >
> > > >
> > > > Thanks and regards
> > > >
> > > > Marcus Pedersén
> > > >
> > > >
> > > > Part of gsyncd.log:
> > > >
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > > [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > > [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > > [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > > [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > > [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> > > > [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > > [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> > > > [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > > res = getattr(self.obj, rmeth)(*in_data[2:])
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > > return Changes.cl_init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > > from libgfchangelog import Changes as LChanges
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > > class Changes(object):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > > use_errno=True)
> > > > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > > self._handle = _dlopen(self._name, mode)
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError
> > > > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > > Traceback (most recent call last):
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > > func(args)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > > local.service_loop(remote)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > > changelog_agent.init()
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > > return self.ins(self.meth, *a)
> > > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > > raise res
> > > > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > I think you will then no longer see this error.
> > > > [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >
> > > >
> > > > ________________________________
> > > > Från: Kotresh Hiremath Ravishankar <***@redhat.com>
> > > > Skickat: den 18 juli 2018 06:05
> > > > Till: Marcus Pedersén
> > > > Kopia: gluster-***@gluster.org
> > > > Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >
> > > > Hi Marcus,
> > > >
> > > > Well, there is nothing wrong with setting up a symlink for the gluster binary location, but
> > > > there is a geo-rep command to set it so that gsyncd will search there.
> > > >
> > > > To set on master
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>
> > > >
> > > > To set on slave
> > > > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
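Filled in with this thread's names (urd-gds-volume, geouser, urd-gds-geo-001) purely as an illustration; the real volume name, slave spec, and binary directory must match your setup:

```shell
# Illustrative only: substitute your own master volume, slave spec and
# binary directory. On CentOS RPM installs the gluster binaries are
# typically under /usr/sbin, not /usr/local/sbin.
gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config gluster-command-dir /usr/sbin/
gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir /usr/sbin/
```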
> > > >
> > > > Thanks,
> > > > Kotresh HR
> > > >
> > > >
> > > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> > > >>
> > > >> Hi Marcus,
> > > >>
> > > >> I am testing out 4.1 myself and I will have some update today.
> > > >> For this particular traceback, gsyncd is not able to find the library.
> > > >> Is it the rpm install? If so, gluster libraries would be in /usr/lib.
> > > >> Please run the cmd below.
> > > >>
> > > >> #ldconfig /usr/lib
> > > >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so)
> > > >>
> > > >> Geo-rep should be fixed automatically.
> > > >>
> > > >> Thanks,
> > > >> Kotresh HR
> > > >>
> > > >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi again,
> > > >>>
> > > >>> I continue to do some testing, but now I have come to a stage where I need help.
> > > >>>
> > > >>>
> > > >>> gsyncd.log complained that /usr/local/sbin/gluster was missing, so I created a symlink.
> > > >>>
> > > >>> After that /usr/local/sbin/glusterfs was missing, so I created a symlink there as well.
> > > >>>
> > > >>> Both links were done on all slave nodes.
> > > >>>
> > > >>>
> > > >>> Now I have a new error that I can not resolve myself.
> > > >>>
> > > >>> It can not open libgfchangelog.so
> > > >>>
> > > >>>
> > > >>> Many thanks!
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of gsyncd.log:
> > > >>>
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151
> > > >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > > >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
> > > >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > > >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
> > > >>> res = getattr(self.obj, rmeth)(*in_data[2:])
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
> > > >>> return Changes.cl_init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
> > > >>> from libgfchangelog import Changes as LChanges
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
> > > >>> class Changes(object):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
> > > >>> use_errno=True)
> > > >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> > > >>> self._handle = _dlopen(self._name, mode)
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError
> > > >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> > > >>> func(args)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
> > > >>> local.service_loop(remote)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
> > > >>> changelog_agent.init()
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
> > > >>> return self.ins(self.meth, *a)
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> > > >>> raise res
> > > >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> > > >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> > > >>>
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > > >>> Skickat: den 16 juli 2018 21:59
> > > >>> Till: ***@redhat.com
> > > >>>
> > > >>> Kopia: gluster-***@gluster.org
> > > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file:
> > > >>>
> > > >>> /var/log/glusterfs/cli.log
> > > >>>
> > > >>> I have turned SELinux off and, just for testing, changed permissions on /var/log/glusterfs/cli.log so that geouser can access it.
> > > >>>
> > > >>> Starting geo-replication after that returns Successful, but all nodes get status Faulty.
> > > >>>
> > > >>>
> > > >>> If I run: gluster-mountbroker status
> > > >>>
> > > >>> I get:
> > > >>>
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) |
> > > >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
> > > >>>
> > > >>>
> > > >>> and that is all nodes on slave cluster, so mountbroker seems ok.
> > > >>>
> > > >>>
> > > >>> gsyncd.log logs an error saying that /usr/local/sbin/gluster is missing.
> > > >>>
> > > >>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
> > > >>>
> > > >>> Another error is that SSH between master and slave is broken,
> > > >>>
> > > >>> but now that I have changed permissions on /var/log/glusterfs/cli.log I can run:
> > > >>>
> > > >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 ***@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
> > > >>>
> > > >>> as geouser, and that works, which means the SSH connection itself is fine.
> > > >>>
> > > >>>
> > > >>> Is the permissions on /var/log/glusterfs/cli.log changed when geo-replication is setup?
> > > >>>
> > > >>> Is gluster supposed to be in /usr/local/sbin/gluster?
> > > >>>
> > > >>>
> > > >>> Do I have any options, or should I remove the current geo-replication and create a new one?
> > > >>>
> > > >>> How much do I need to clean up before creating a new geo-replication?
> > > >>>
> > > >>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
> > > >>>
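A rough sketch of that pause-and-preseed idea, using this thread's names and placeholder mount points. Note that writing to the slave volume outside geo-replication is not generally recommended and should be verified against the Gluster documentation before use:

```shell
# Sketch only -- not a verified procedure. /mnt/master and /mnt/slave
# are placeholder FUSE mount points; volume, user and host names are
# the ones appearing in this thread.
gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume pause

mkdir -p /mnt/slave
mount -t glusterfs urd-gds-geo-001:/urd-gds-volume /mnt/slave
# Copy from a FUSE mount of the master volume, never from the brick
# path itself (the brick contains internal .glusterfs metadata).
rsync -aAXH --numeric-ids /mnt/master/ /mnt/slave/
umount /mnt/slave

gluster vol geo-rep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume resume
```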
> > > >>>
> > > >>> Many thanks in advance!
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part from the gsyncd.log:
> > > >>>
> > > >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
> > > >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume ***@urd-gds-geo-001::urd-gds-volu\
> > > >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
> > > >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
> > > >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
> > > >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> > > >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > > >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
> > > >>>
> > > >>> ________________________________
> > > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > > >>> Skickat: den 13 juli 2018 14:50
> > > >>> Till: Kotresh Hiremath Ravishankar
> > > >>> Kopia: gluster-***@gluster.org
> > > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Hi Kotresh,
> > > >>> Yes, all nodes have the same version 4.1.1 both master and slave.
> > > >>> All glusterd are crashing on the master side.
> > > >>> Will send logs tonight.
> > > >>>
> > > >>> Thanks,
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> Den 13 juli 2018 11:28 skrev Kotresh Hiremath Ravishankar <***@redhat.com>:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> Is the gluster geo-rep version the same on both master and slave?
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi Kotresh,
> > > >>>
> > > >>> I have replaced both files (gsyncdconfig.py and repce.py) on all nodes, both master and slave.
> > > >>>
> > > >>> I rebooted all servers but geo-replication status is still Stopped.
> > > >>>
> > > >>> I tried to start geo-replication with response Successful, but status still shows Stopped on all nodes.
> > > >>>
> > > >>> Nothing has been written to geo-replication logs since I sent the tail of the log.
> > > >>>
> > > >>> So I do not know what info to provide.
> > > >>>
> > > >>>
> > > >>> Please, help me to find a way to solve this.
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>>
> > > >>> Regards
> > > >>>
> > > >>> Marcus
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
> > > >>> Skickat: den 12 juli 2018 08:51
> > > >>> Till: Kotresh Hiremath Ravishankar
> > > >>> Kopia: gluster-***@gluster.org
> > > >>> Ämne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
> > > >>>
> > > >>> Thanks Kotresh,
> > > >>> I installed through the official centos channel, centos-release-gluster41.
> > > >>> Isn't this fix included in the CentOS packages?
> > > >>> I will have a look, test it tonight and come back to you!
> > > >>>
> > > >>> Thanks a lot!
> > > >>>
> > > >>> Regards
> > > >>> Marcus
> > > >>>
> > > >>> ################
> > > >>> Marcus Pedersén
> > > >>> Systemadministrator
> > > >>> Interbull Centre
> > > >>> ################
> > > >>> Sent from my phone
> > > >>> ################
> > > >>>
> > > >>> Den 12 juli 2018 07:41 skrev Kotresh Hiremath Ravishankar <***@redhat.com>:
> > > >>>
> > > >>> Hi Marcus,
> > > >>>
> > > >>> I think the fix [1] is needed in 4.1
> > > >>> Could you please try this out and let us know if it works for you?
> > > >>>
> > > >>> [1] https://review.gluster.org/#/c/20207/
> > > >>>
> > > >>> Thanks,
> > > >>> Kotresh HR
> > > >>>
> > > >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <***@slu.se> wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade.
> > > >>>
> > > >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1).
> > > >>>
> > > >>> Both clusters works the way they should on their own.
> > > >>>
> > > >>> After upgrade on master side status for all geo-replication nodes is Stopped.
> > > >>>
> > > >>> I tried to start the geo-replication from master node and response back was started successfully.
> > > >>>
> > > >>> Status again .... Stopped
> > > >>>
> > > >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes.
> > > >>>
> > > >>> After a restart of all glusterd the master cluster was up again.
> > > >>>
> > > >>> Status for geo-replication is still Stopped and every try to start it after this gives the response successful but still status Stopped.
> > > >>>
> > > >>>
> > > >>> Please help me get the geo-replication up and running again.
> > > >>>
> > > >>>
> > > >>> Best regards
> > > >>>
> > > >>> Marcus Pedersén
> > > >>>
> > > >>>
> > > >>> Part of geo-replication log from master node:
> > > >>>
> > > >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
> > > >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
> > > >>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
> > > >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
> > > >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock ***@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
> > > >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
> > > >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
> > > >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
> > > >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
> > > >>> elete}
> > > >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
> > > >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
> > > >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
> > > >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
> > > >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
> > > >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://***@urd-gds-geo-000:gluster://localhost:urd-gds-volume
> > > >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
> > > >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
> > > >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
> > > >>> Traceback (most recent call last):
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
> > > >>> except:
> > > >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
> > > >>> sys.exit()
> > > >>> TypeError: 'int' object is not iterable
> > > >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
> > > >>>
> > > >>> ---
> > > >>> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här
> > > >>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> Gluster-users mailing list
> > > >>> Gluster-***@gluster.org
> > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks and Regards,
> > > >>> Kotresh H R
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks and Regards,
> > > >> Kotresh H R
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-***@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> > >
> >
> > - Sunny
>
>


