[Gluster-users] Geo-replication stops after 4-5 hours

Discussion:

Marcus Pedersén

2018-08-01 20:18:30 UTC

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list (some symlinks and so on, handled in another thread), I got geo-replication running.

It ran for 4-5 hours and then stopped. I stopped and restarted geo-replication, and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem: after a start it runs for 4-5 hours and then stops.

After that nothing happens. I have waited for days, but still nothing happens.

I have looked through the logs but cannot find anything obvious.

The geo-replication status is Active for the same two nodes all the time:

MASTER NODE  MASTER VOL      MASTER BRICK       SLAVE USER  SLAVE                                SLAVE NODE       STATUS   CRAWL STATUS   LAST_SYNCED          ENTRY  DATA   META  FAILURES  CHECKPOINT TIME      CHECKPOINT COMPLETED  CHECKPOINT COMPLETION TIME
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
urd-gds-001  urd-gds-volume  /urd-gds/gluster   geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-000  Active   History Crawl  2018-04-16 20:32:09  0      14205  0     0         2018-07-27 21:12:44  No                    N/A
urd-gds-002  urd-gds-volume  /urd-gds/gluster   geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-002  Passive  N/A            N/A                  N/A    N/A    N/A   N/A       N/A                  N/A                   N/A
urd-gds-004  urd-gds-volume  /urd-gds/gluster   geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-002  Passive  N/A            N/A                  N/A    N/A    N/A   N/A       N/A                  N/A                   N/A
urd-gds-003  urd-gds-volume  /urd-gds/gluster   geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-000  Active   History Crawl  2018-05-01 20:58:14  285    4552   0     0         2018-07-27 21:12:44  No                    N/A
urd-gds-000  urd-gds-volume  /urd-gds/gluster1  geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-001  Passive  N/A            N/A                  N/A    N/A    N/A   N/A       N/A                  N/A                   N/A
urd-gds-000  urd-gds-volume  /urd-gds/gluster2  geouser     ***@urd-gds-geo-001::urd-gds-volume  urd-gds-geo-001  Passive  N/A            N/A                  N/A    N/A    N/A   N/A       N/A                  N/A                   N/A
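
For anyone following along, a small filter like the one below can pull just the Active workers out of the status output, which makes a stalled LAST_SYNCED value easier to spot between two runs. This is only a sketch: the function name active_workers is made up for illustration, and the session address in the usage comment is an assumption pieced together from the SLAVE USER and SLAVE columns, since the address is masked in the post.

```shell
# Print only the Active worker rows from geo-replication status output on stdin.
# "Passive" rows do not match because the pattern requires whitespace before "Active".
active_workers() {
    grep -E '[[:space:]]Active[[:space:]]'
}

# Hypothetical usage on a master node (session address is an assumption):
# gluster volume geo-replication urd-gds-volume \
#     geouser@urd-gds-geo-001::urd-gds-volume status detail | active_workers
```

Running it twice, some minutes apart, and comparing the LAST_SYNCED and DATA columns should show whether the Active workers are still making progress.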

Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB

Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB

Parts from gsyncd.logs are enclosed.

Thanks a lot!

Best regards

Marcus Pedersén

---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

Kotresh Hiremath Ravishankar

2018-08-02 04:13:33 UTC

Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

Kotresh Hiremath Ravishankar

2018-08-02 06:07:05 UTC

Could you check whether any rsync processes are hung on the master or slave nodes?
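
One way to look for long-lived rsync processes is to list them together with their elapsed run time. A minimal sketch, assuming a procps-ng ps that supports the etimes column (elapsed seconds). The helper name old_rsync and the one-hour threshold are illustrative choices, not something from this thread.

```shell
# List rsync processes that have been running at least a given number of seconds.
# Reads "pid etimes stat args..." lines on stdin so the filter can be tried without ps.
# Prints: pid, elapsed seconds, process state.
old_rsync() {
    awk -v min="$1" '$2 + 0 >= min && $4 ~ /(^|\/)rsync$/ { print $1, $2, $3 }'
}

# Usage on a master or slave node (3600 s is an arbitrary threshold):
# ps -eo pid=,etimes=,stat=,args= | old_rsync 3600
```

An rsync that has been sleeping (state S) for many hours while the session reports no progress is a candidate for a closer look.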

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se> wrote:

> Hi Kotresh,
> rsync version 3.1.2 protocol version 31
> All nodes run CentOS 7, updated within the last couple of days.
>
> Thanks
> Marcus
>
> ################
> Marcus Pedersén
> System administrator
> Interbull Centre
> ################
> Sent from my phone
> ################

Marcus Pedersén

2018-08-02 06:55:51 UTC

On both active master nodes there is an rsync process, for example:

root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd

There are also ssh tunnels to the slave nodes and gsyncd.py processes.

Regards
Marcus

Kotresh Hiremath Ravishankar

2018-08-02 07:31:46 UTC

Cool. Just check whether they are hung, using the following command:

# strace -f -p 5921

Marcus Pedersén

2018-08-02 08:04:09 UTC

Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811

While the sync is running, I can see an rsync process owned by geouser on the slave node.

Regards
Marcus
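
The strace output above shows a write() to file descriptor 2 (stderr) that has not returned, so the process is blocked in that call. To see what the blocked descriptor is connected to, the /proc filesystem can be inspected. A minimal sketch for Linux: the helper name fd_target is hypothetical, and PID 5921 is the rsync process from this thread.

```shell
# Print what a given file descriptor of a process points to (Linux /proc).
fd_target() {
    pid=$1
    fd=$2
    readlink "/proc/$pid/fd/$fd"
}

# Usage on the master node, as root:
# fd_target 5921 2        # prints something like pipe:[...] if stderr is a pipe
# cat /proc/5921/wchan    # kernel function the process is sleeping in, if exposed
```

If the target is a pipe, a write that never returns usually means the reader at the other end is not draining it. Whether that is what happens here cannot be told from the thread alone.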

Marcus Pedersén

2018-08-06 11:28:51 UTC

Hi,

Is there a way to resolve the problem with rsync and hanging processes?

Do I need to kill all the processes and hope that it starts again, or should I stop and start geo-replication?

If I stop and start geo-replication it will run again; I have tried that before.

Regards

Marcus
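
For reference, the stop/start cycle mentioned in this message looks roughly like the following. A sketch only: the wrapper name restart_georep and the GLUSTER override variable are invented for illustration, and the session address in the usage comment is an assumption based on the status table earlier in the thread (the real address is masked). It restarts the session but does not address why rsync hangs.

```shell
# Restart a geo-replication session by stopping and then starting it.
# GLUSTER can be overridden (for example GLUSTER=echo) for a dry run.
restart_georep() {
    master_vol=$1
    slave=$2   # form: user@slavehost::slavevol
    "${GLUSTER:-gluster}" volume geo-replication "$master_vol" "$slave" stop &&
    "${GLUSTER:-gluster}" volume geo-replication "$master_vol" "$slave" start
}

# Hypothetical usage on a master node (session address is an assumption):
# restart_georep urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume
```

With GLUSTER=echo the function only prints the two commands it would run, which is a cheap way to check the arguments before touching a live session.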

________________________________
Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
Skickat: den 2 augusti 2018 10:04
Till: Kotresh Hiremath Ravishankar
Kopia: gluster-***@gluster.org
Ämne: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811

When sync is running I can see rsync with geouser on the slave node.

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

Den 2 aug. 2018 09:31 skrev Kotresh Hiremath Ravishankar <***@redhat.com>:
Cool, just check whether they are hung by any chance with following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:
On both active master nodes there is an rsync process. As in:

root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd

There is also ssh tunnels to slave nodes and gsyncd.py processes.

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

Den 2 aug. 2018 08:07 skrev Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>:
Could you look of any rsync processes hung in master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:
Hi Kortesh,
rsync version 3.1.2 protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

Den 2 aug. 2018 06:13 skrev Kotresh Hiremath Ravishankar <***@redhat.com<mailto:***@redhat.com>>:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <***@slu.se<mailto:***@slu.se>> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.

I have looked through logs but can not find anything obvious.

Status for geo-replication is active for the two same nodes all the time:

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
urd-gds-001 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-04-16 20:32:09 0 14205 0 0 2018-07-27 21:12:44 No N/A
urd-gds-002 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-004 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-003 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-05-01 20:58:14 285 4552 0 0 2018-07-27 21:12:44 No N/A
urd-gds-000 urd-gds-volume /urd-gds/gluster1 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-000 urd-gds-volume /urd-gds/gluster2 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A

Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB

Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB

Parts from gsyncd.logs are enclosed.

Thanks a lot!

Best regards

Marcus Pedersén

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Marcus Pedersén

2018-08-12 20:18:32 UTC

Hi,

As geo-replication stopped after 4-5 hours, I added a cron job that stops geo-replication, pauses for 2 minutes and starts it again, every 6 hours.

The cron job has been running for 5 days and the changelogs have been catching up.
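
A restart job like the one described above could look roughly like the following. This is a minimal sketch, not the actual cron script used here: the function names are hypothetical, the slave user `geouser` is taken from the SLAVE USER column of the status output, and the `run` and `sleep` parameters exist only so the logic can be exercised without a Gluster installation.

```python
import subprocess
import time


def georep_command(master_vol, slave_user, slave_host, slave_vol, action):
    """Build the gluster CLI argument list for one geo-replication action."""
    slave = f"{slave_user}@{slave_host}::{slave_vol}"
    return ["gluster", "volume", "geo-replication", master_vol, slave, action]


def restart_georep(master_vol, slave_user, slave_host, slave_vol,
                   pause_seconds=120, run=subprocess.run, sleep=time.sleep):
    """Stop the session, wait pause_seconds, then start it again."""
    for action in ("stop", "start"):
        cmd = georep_command(master_vol, slave_user, slave_host, slave_vol, action)
        run(cmd, check=True)  # raises CalledProcessError if gluster fails
        if action == "stop":
            sleep(pause_seconds)  # give workers time to exit before restarting
```

Calling `restart_georep("urd-gds-volume", "geouser", "urd-gds-geo-001", "urd-gds-volume")` from a script scheduled every 6 hours would stop the session, wait 120 seconds and start it again. It works around the stall rather than addressing its cause.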

Now a different behavior has shown up.

On one of the active master nodes I get a Python error.

The other active master node has started to toggle status between active and faulty.

See parts of logs below.

The Troubleshooting Geo-replication documentation suggests that, when sync is not complete, a full sync of the data can be enforced by erasing the index and restarting GlusterFS geo-replication.

There is no explanation of how to erase the index.

Should I enforce a full sync?

How do I erase the index?

Thanks a lot!

Best regards

Marcus Pedersén

Node with python error:

[2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
[2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
[2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
[2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
[2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
[2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
    tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
    po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
    rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
[2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
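
The `AttributeError` in the traceback above is what Python raises when `.split()` is called on `None`, which typically means a string option that the code expects was never set. The sketch below only reproduces that class of failure: the `conf` dictionary, the option name `rsync_options` and both function names are illustrative assumptions, not the actual gsyncd code or its configuration keys.

```python
def build_args_unguarded(conf):
    """Mimic code that assumes a string option is always present."""
    # conf.get() returns None for a missing key, so .split() fails here.
    return ["rsync"] + conf.get("rsync_options").split()


def build_args_guarded(conf):
    """Treat a missing or None option as an empty option string."""
    return ["rsync"] + (conf.get("rsync_options") or "").split()
```

If something similar is happening in the worker, comparing the session's geo-replication configuration against a working session for unset options may be a useful first check.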

Node that toggles status between active and faulty:

[2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
[2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
[2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
[2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
[2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
[2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
[2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
[2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
[2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
[2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
[2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
[2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
[2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
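
Since most of these lines are routine, it can help to filter a gsyncd log down to errors and worker status changes. The following is a minimal sketch under the assumption that every entry follows the `[timestamp] LEVEL [origin] Component: message` layout seen in the excerpts above; the regular expression and the function name are hypothetical and not part of Gluster.

```python
import re

# Layout assumed from the excerpts: [timestamp] LEVEL [origin] Component: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<origin>[^\]]+)\] "
    r"(?P<component>[^:]+): (?P<message>.*)$"
)


def interesting_events(lines):
    """Yield (timestamp, level, message) for errors and worker status changes."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # traceback lines and wrapped fragments do not match
        is_error = match.group("level") == "E"
        is_status = "Worker Status Change" in match.group("message")
        if is_error or is_status:
            yield match.group("ts"), match.group("level"), match.group("message")
```

Passing an open log file to `interesting_events` yields only the `E` entries and the Active/Faulty transitions, which makes the failure loop easier to see.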

________________________________
From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
Sent: 6 August 2018 13:28
To: ***@redhat.com
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi,

Is there a way to resolve the problem with rsync and hanging processes?

Do I need to kill all the processes and hope that it starts again, or should I stop and start geo-replication?

If I stop and start geo-replication it will start again; I have tried that before.

Regards

Marcus

________________________________
From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
Sent: 2 August 2018 10:04
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811

When sync is running I can see rsync with geouser on the slave node.

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 2 Aug 2018 at 09:31, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Cool, just check whether they are hung by any chance with the following command.

# strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <***@slu.se> wrote:
On both active master nodes there is an rsync process. As in:

root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd

There are also SSH tunnels to the slave nodes and gsyncd.py processes.
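
Spotting rsync processes that have been around suspiciously long, like PID 5921 above (started Aug01), can be scripted. This is a minimal sketch, assuming the input is the text printed by `ps -eo pid,etimes,args` (the `etimes` column, elapsed seconds, is a procps-ng feature); the function name and the one-hour threshold are arbitrary choices for illustration.

```python
def long_running_rsync(ps_output, min_seconds=3600):
    """Return (pid, elapsed_seconds) for rsync processes older than min_seconds.

    Expects the text printed by `ps -eo pid,etimes,args`, header line included.
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # pid, elapsed seconds, full command line
        if len(parts) < 3:
            continue
        pid, elapsed, args = parts
        if not (pid.isdigit() and elapsed.isdigit()):
            continue
        executable = args.split()[0].rsplit("/", 1)[-1]
        if executable == "rsync" and int(elapsed) >= min_seconds:
            found.append((int(pid), int(elapsed)))
    return found
```

A long elapsed time alone does not prove a hang, so any PID this reports is only a candidate for the `strace -f -p <pid>` check suggested in this thread.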

Regards
Marcus

On 2 Aug 2018 at 08:07, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Could you look for any hung rsync processes on the master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se> wrote:
Hi Kotresh,
rsync version 3.1.2 protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus

On 2 Aug 2018 at 06:13, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

Marcus Pedersén

2018-08-13 18:39:38 UTC

Hi again,

New changes in behaviour: both active master nodes now toggle to faulty, and the logs repeat the same messages over and over again.

Part of log, node1:

[2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-13 18:24:44.704360] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523907056, 0) entry_stime=None etime=1534184684
[2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523907056, 0)
[2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=5919:140339726538560:1534184685.88 method=entry_ops error=GsyncdError
[2018-08-13 18:24:45.905111] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:24:56.655889] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-13 18:24:56.664628] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-13 18:24:58.347415] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6824
[2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1150
[2018-08-13 18:24:59.464184] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-13 18:25:01.549606] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184701
[2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
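
The `stime` and `etime` values in the "starting history crawl" line are Unix epoch seconds, so the distance between them shows how far behind the crawl is. Below is a minimal sketch that extracts both from such a line; the regular expression and function name are hypothetical helpers, and reading `stime` as the last synced time and `etime` as the crawl end time is an interpretation of the field names, not something confirmed in this thread.

```python
import re
from datetime import datetime, timezone

# Matches the first "stime=(seconds, n)" and the following "etime=seconds".
CRAWL = re.compile(r"stime=\((\d+), \d+\).*?etime=(\d+)")


def crawl_lag(log_line):
    """Return (stime as UTC datetime, lag in seconds), or None for other lines."""
    match = CRAWL.search(log_line)
    if match is None:
        return None
    stime, etime = int(match.group(1)), int(match.group(2))
    return datetime.fromtimestamp(stime, tz=timezone.utc), etime - stime
```

For the node1 crawl line above (`stime=(1523907056, 0)`, `etime=1534184684`) the difference is 10277628 seconds, roughly 119 days of changelogs still to process.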

Part of log, node2:

[2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:25:24.610362] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-13 18:25:24.621551] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-13 18:25:26.164855] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5431
[2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1667
[2018-08-13 18:25:27.335560] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-13 18:25:37.769479] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
[2018-08-13 18:25:37.787317] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-13 18:25:37.789822] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534184737
[2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
[2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0228 data_duration=0.2456 DATA=0 XATT=0
[2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2723 num_changelogs=1 mode=history_changelog
[2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1327 num_files=3 job=3 return_code=23
[2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1133 num_files=8 job=1 return_code=23
[2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.6315 num_files=47 job=2 return_code=23

...............

[2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0988 num_files=18 job=2 return_code=23
[2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2600 num_files=27 job=2 return_code=23
[2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.4090 num_files=67 job=2 return_code=23
[2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=18662:140079998809920:1534184774.58 method=entry_ops error=GsyncdError
[2018-08-13 18:26:14.866166] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
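
Every "Sync Time Taken" line above reports `return_code=23`, which the rsync manual page lists as "Partial transfer due to error". A quick tally of the codes in a log can show whether any sync job finishes cleanly. This is a minimal sketch; the function name is hypothetical and the lookup table covers only a few of the exit values documented by rsync.

```python
import re
from collections import Counter

# A few exit values from the rsync manual page (not the full list).
RSYNC_EXIT_MEANINGS = {
    0: "success",
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
}

RETURN_CODE = re.compile(r"\breturn_code=(\d+)")


def tally_return_codes(lines):
    """Count how often each rsync return code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = RETURN_CODE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

A log where code 23 dominates would be consistent with the `rsync: link_stat ...` message seen in the strace output earlier in this thread, though the tally alone does not say which files failed.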

Help would be appreciated!

Thanks!

Regards

Marcus Pedersén

________________________________
Från: gluster-users-***@gluster.org <gluster-users-***@gluster.org> för Marcus Pedersén <***@slu.se>
Skickat: den 12 augusti 2018 22:18
Till: ***@redhat.com
Kopia: gluster-***@gluster.org
Ämne: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi,

As the geo-replication stopped after 4-5 hours, I added a cron job that stopped, paused for 2 mins and started geo-replication again every 6 hours.

The cron job has been running for 5 days and the changelogs has been catching up.

Now a different behavior has shown up.

In one of the active master nodes I get a python error.

The other active master node has started to toggle status between active and faulty.

See parts of logs below.

When I read Troubleshooting Geo-replication, there is a suggestion when sync is not complete, to enforce a full sync of the data by erasing the index and restarting GlusterFS geo-replication.

There is no explanation of how to erase the index.

Should I enforse a full sync?

How do I erase the index?

Thanks a lot!

Best regards

Marcus Pedersén

Node with python error:

[2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
[2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
[2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
[2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
[2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
[2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
po = self.sync_engine(pb, self.log_err)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
[2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
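
For long logs like the one above, it can help to filter out everything except errors and worker status changes. The following is a minimal sketch, not part of gsyncd: the regular expression is an assumption derived from the line layout visible in these excerpts (`[timestamp] LEVEL [origin] message`), and the names `LINE_RE`, `parse_line` and `interesting` are hypothetical.

```python
import re

# Assumed layout, taken from the excerpts in this thread:
# [timestamp] LEVEL [module(role):lineno:function] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<origin>[^\]]+)\]\s+(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict for a log line, or None for non-matching lines."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None

def interesting(lines):
    """Yield parsed entries that are errors or worker status changes."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # e.g. traceback continuation lines
        if entry["level"] == "E" or "Worker Status Change" in entry["msg"]:
            yield entry

# Sample lines copied from the log above.
SAMPLE = [
    "[2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:",
    "Traceback (most recent call last):",
    "[2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster",
    "[2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty",
]

for e in interesting(SAMPLE):
    print(e["ts"], e["level"], e["msg"])
```

Feeding the lines of a real gsyncd.log into `interesting()` should leave only the entries that mark a failure or a transition between Active and Faulty.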

Node that toggles status between active and faulty:

[2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
[2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
[2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
[2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
[2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
[2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
[2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
[2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
[2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
[2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
[2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
[2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
[2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
[2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
[2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
[2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
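
The stime and etime values in these log lines look like Unix epoch seconds: "Register time time=1534102415" is logged at 19:33:35, which is exactly what that number gives in UTC. Assuming the first element of the stime tuple is seconds and the second a sub-second part, a small sketch like the following turns them into readable dates. The helper name `stime_to_utc` is hypothetical.

```python
from datetime import datetime, timezone

def stime_to_utc(stime):
    """Convert an (epoch_seconds, sub_seconds) tuple to a UTC datetime.

    Assumption: the first element is seconds since the Unix epoch;
    the second element is ignored here.
    """
    sec, _ = stime
    return datetime.fromtimestamp(sec, tz=timezone.utc)

# Values copied from the two history crawl lines above.
print(stime_to_utc((1523906126, 0)).isoformat())  # 2018-04-16T19:15:26+00:00
print(stime_to_utc((1525290650, 0)).isoformat())  # 2018-05-02T19:50:50+00:00
print(stime_to_utc((1534102415, 0)).isoformat())  # 2018-08-12T19:33:35+00:00
```

Read this way, the history crawl on the second node starts from early May and runs up to the register time in August.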

________________________________
From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
Sent: 6 August 2018 13:28
To: ***@redhat.com
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi,

Is there a way to resolve the problem with rsync and hanging processes?

Do I need to kill all the processes and hope that it starts again or stop/start geo-replication?

If I stop/start geo-replication it will start again, I have tried it before.

Regards

Marcus

________________________________
From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
Sent: 2 August 2018 10:04
To: Kotresh Hiremath Ravishankar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811
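
The strace output shows rsync sitting in a write to file descriptor 2 (stderr). One general way a process can block there, which is not confirmed as the cause in this case, is that stderr is a pipe and the parent is not reading from it, so the pipe buffer fills up. The sketch below illustrates that pattern with Python's subprocess module; the child command is a hypothetical stand-in for a chatty rsync, and `run_and_drain` is an illustrative name.

```python
import subprocess
import sys

# Hypothetical child that writes far more to stderr than a pipe buffer holds.
CHILD = "import sys; sys.stderr.write('x' * 200000)"

def run_and_drain():
    """Run the child and read its pipes to EOF so it cannot block."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() drains stdout and stderr until EOF. Calling
    # proc.wait() here without reading could hang once the stderr
    # pipe is full, leaving the child stuck in write(2, ...).
    out, err = proc.communicate()
    return proc.returncode, len(err)

rc, nbytes = run_and_drain()
print(rc, nbytes)
```

If the parent only waited for the exit status and never read the pipe, the child would stay in the sleeping state indefinitely, which would look much like the idle rsync process in the ps listing.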

When sync is running I can see rsync with geouser on the slave node.

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Cool, just check whether they are hung by any chance with the following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <***@slu.se> wrote:
On both active master nodes there is an rsync process. As in:

root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd

There are also ssh tunnels to slave nodes and gsyncd.py processes.

Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Could you look for any rsync processes hung on the master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se> wrote:
Hi Kotresh,
rsync version 3.1.2 protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <***@slu.se> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.

I have looked through logs but can not find anything obvious.

Status for geo-replication is active for the two same nodes all the time:

MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
urd-gds-001 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-04-16 20:32:09 0 14205 0 0 2018-07-27 21:12:44 No N/A
urd-gds-002 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-004 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-003 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-05-01 20:58:14 285 4552 0 0 2018-07-27 21:12:44 No N/A
urd-gds-000 urd-gds-volume /urd-gds/gluster1 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
urd-gds-000 urd-gds-volume /urd-gds/gluster2 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A

Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB

Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB

Parts from gsyncd.logs are enclosed.

Thanks a lot!

Best regards

Marcus Pedersén

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Thanks and Regards,
Kotresh H R

Sunny Kumar

2018-08-13 19:49:51 UTC

Hi Marcus,

Can you please share the mount log from the slave? You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log".

- Sunny
On Tue, Aug 14, 2018 at 12:48 AM Marcus Pedersén <***@slu.se> wrote:
>
> Hi again,
>
> New change in behaviour: both active master nodes now toggle to faulty, and the logs repeat the same thing over and over again.
>
>
> Part of log, node1:
>
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:24:44.704360] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523907056, 0) entry_stime=None etime=1534184684
> [2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523907056, 0)
> [2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=5919:140339726538560:1534184685.88 method=entry_ops error=GsyncdError
> [2018-08-13 18:24:45.905111] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655889] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:24:56.664628] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:24:58.347415] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6824
> [2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1150
> [2018-08-13 18:24:59.464184] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:01.549606] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184701
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
>
>
> Part of log, node2:
>
> [2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.610362] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:25:24.621551] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:25:26.164855] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5431
> [2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1667
> [2018-08-13 18:25:27.335560] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:37.769479] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
> [2018-08-13 18:25:37.787317] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:25:37.789822] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534184737
> [2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0228 data_duration=0.2456 DATA=0 XATT=0
> [2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2723 num_changelogs=1 mode=history_changelog
> [2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1327 num_files=3 job=3 return_code=23
> [2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1133 num_files=8 job=1 return_code=23
> [2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.6315 num_files=47 job=2 return_code=23
>
>
> ...............
>
>
> [2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0988 num_files=18 job=2 return_code=23
> [2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2600 num_files=27 job=2 return_code=23
> [2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.4090 num_files=67 job=2 return_code=23
> [2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=18662:140079998809920:1534184774.58 method=entry_ops error=GsyncdError
> [2018-08-13 18:26:14.866166] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
>
> Help would be appreciated!
>
> Thanks!
>
>
> Regards
>
> Marcus Pedersén
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 12 August 2018 22:18
> To: ***@redhat.com
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
>
> Hi,
>
> As the geo-replication stopped after 4-5 hours, I added a cron job that stops geo-replication, pauses for 2 minutes and starts it again, every 6 hours.
>
> The cron job has been running for 5 days and the changelogs have been catching up.
>
>
> Now a different behavior has shown up.
>
> In one of the active master nodes I get a python error.
>
> The other active master node has started to toggle status between active and faulty.
>
> See parts of logs below.
>
>
> When I read Troubleshooting Geo-replication, there is a suggestion, for when sync is not complete, to enforce a full sync of the data by erasing the index and restarting GlusterFS geo-replication.
>
> There is no explanation of how to erase the index.
>
> Should I enforce a full sync?
>
> How do I erase the index?
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
> Node with python error:
>
> [2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
> [2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
> [2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
> [2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
> [2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
> [2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
> tf(*aargs)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
> po = self.sync_engine(pb, self.log_err)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
> rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
> [2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>
>
>
>
> Node that toggles status between active and faulty:
>
> [2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
> [2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
> [2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
> [2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
> [2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
> [2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
> [2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
> [2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
> [2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
> [2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
> [2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
> [2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 6 August 2018 13:28
> To: ***@redhat.com
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
>
> Hi,
>
> Is there a way to resolve the problem with rsync and hanging processes?
>
> Do I need to kill all the processes and hope that it starts again or stop/start geo-replication?
>
>
> If I stop/start geo-replication it will start again; I have tried that before.
>
>
> Regards
>
> Marcus
>
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 2 August 2018 10:04
> To: Kotresh Hiremath Ravishankar
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
> Hi Kotresh,
>
> I get the following and then it hangs:
>
> strace: Process 5921 attached
> write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811
>
>
> When sync is running I can see rsync with geouser on the slave node.
>
> Regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>
> Cool, just check whether they are hung by any chance with the following command.
>
> # strace -f -p 5921
>
> On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <***@slu.se> wrote:
>
> On both active master nodes there is an rsync process. As in:
>
> root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd
>
> There are also ssh tunnels to the slave nodes and gsyncd.py processes.
>
> Regards
> Marcus
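
One way to spot long-lived rsync processes like the one listed above is to parse `ps` output and flag any rsync older than a threshold. This is only a minimal sketch and not part of gsyncd: the function names and the one-hour default are assumptions made for illustration. Age alone does not prove a hang, so attaching strace as suggested in this thread remains the real check.

```python
import os
import subprocess


def parse_ps(output):
    """Parse header-less ps output with the columns pid, stat, etimes, args."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        pid, state, elapsed, command = fields
        rows.append((int(pid), state, int(elapsed), command))
    return rows


def stale_rsync(rows, min_age_seconds=3600):
    """Return rsync processes that have existed for at least min_age_seconds.

    The threshold is an arbitrary assumption; tune it to the workload.
    """
    return [row for row in rows
            if os.path.basename(row[3].split()[0]) == "rsync"
            and row[2] >= min_age_seconds]


def list_processes():
    """Run ps (procps-ng syntax) with empty headers and parse the result."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "etimes=", "-o", "args="],
        capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)
```

Calling `stale_rsync(list_processes())` on a master node returns the candidate PIDs, which can then be inspected with `strace -f -p <pid>`.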
>
> On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> Could you look for any rsync processes hung on the master or slave?
>
> On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se> wrote:
>
> Hi Kotresh,
> rsync version 3.1.2 protocol version 31
> All nodes run CentOS 7, updated the last couple of days.
>
> Thanks
> Marcus
>
>
> On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>
> Hi Marcus,
>
> What's the rsync version being used?
>
> Thanks,
> Kotresh HR
>
> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <***@slu.se> wrote:
>
> Hi all!
>
> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>
> With help from the list, some symlinks and so on (handled in another thread), I got geo-replication running.
>
> It ran for 4-5 hours and then stopped. I stopped and started geo-replication, and it ran for another 4-5 hours.
>
> 4.1.2 was released and I updated, hoping this would solve the problem.
>
> I still have the same problem: after a start it runs for 4-5 hours and then stops.
>
> After that nothing happens. I have waited for days, but still nothing happens.
>
>
> I have looked through the logs but cannot find anything obvious.
>
>
> The geo-replication status is Active for the same two nodes all the time:
>
>
> MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> urd-gds-001 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-04-16 20:32:09 0 14205 0 0 2018-07-27 21:12:44 No N/A
> urd-gds-002 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-004 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-003 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-05-01 20:58:14 285 4552 0 0 2018-07-27 21:12:44 No N/A
> urd-gds-000 urd-gds-volume /urd-gds/gluster1 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-000 urd-gds-volume /urd-gds/gluster2 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
>
>
> Master cluster is Distribute-Replicate
>
> 2 x (2 + 1)
>
> Used space 30TB
>
>
> Slave cluster is Replicate
>
> 1 x (2 + 1)
>
> Used space 9TB
>
>
> Parts from gsyncd.logs are enclosed.
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
> ---
> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-***@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R

Marcus Pedersén

2018-08-13 20:45:47 UTC

Hi Sunny,
Please find enclosed the mount logs for the two active master nodes.
I cut them down to today's logs.

Thanks!

Marcus
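
Cutting a log down to a single day, as described above, can be sketched in a few lines of Python. This is an illustration only: the function name is made up, and it assumes every entry starts with the `[YYYY-MM-DD HH:MM:SS.ffffff] L` prefix seen in the gsyncd excerpts in this thread, where `L` is a one-letter severity.

```python
import re

# Leading "[date time] SEVERITY " of a log entry, e.g.
# "[2018-08-13 18:24:45.904958] E ".
LOG_PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) [\d:.]+\] ([A-Z]) ")


def filter_log(lines, day, levels=None):
    """Yield the lines logged on `day` (YYYY-MM-DD).

    If `levels` is given (for example {"E", "W"}), only entries with one of
    those severities are kept. Lines without a timestamp, such as traceback
    lines, follow the decision made for the entry they belong to.
    """
    keep = False
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            keep = match.group(1) == day and (
                levels is None or match.group(2) in levels)
        if keep:
            yield line
```

Passing an open file object as `lines` and writing the result with `sys.stdout.writelines(...)` gives a trimmed copy that is small enough to attach to a mail.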

________________________________________
From: Sunny Kumar <***@redhat.com>
Sent: 13 August 2018 21:49
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Marcus,

Can you please share the mount log from the slave? You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log".

- Sunny
On Tue, Aug 14, 2018 at 12:48 AM Marcus Pedersén <***@slu.se> wrote:
>
> Hi again,
>
> There is a new change in behaviour: both active master nodes now toggle to Faulty, and the logs repeat the same entries over and over again.
>
>
> Part of log, node1:
>
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:24:44.704360] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523907056, 0) entry_stime=None etime=1534184684
> [2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523907056, 0)
> [2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=5919:140339726538560:1534184685.88 method=entry_ops error=GsyncdError
> [2018-08-13 18:24:45.905111] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655889] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:24:56.664628] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:24:58.347415] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6824
> [2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1150
> [2018-08-13 18:24:59.464184] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:01.549606] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184701
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
>
>
> Part of log, node2:
>
> [2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.610362] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:25:24.621551] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:25:26.164855] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5431
> [2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1667
> [2018-08-13 18:25:27.335560] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:37.769479] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
> [2018-08-13 18:25:37.787317] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:25:37.789822] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534184737
> [2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0228 data_duration=0.2456 DATA=0 XATT=0
> [2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2723 num_changelogs=1 mode=history_changelog
> [2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1327 num_files=3 job=3 return_code=23
> [2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1133 num_files=8 job=1 return_code=23
> [2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.6315 num_files=47 job=2 return_code=23
>
>
> ...............
>
>
> [2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0988 num_files=18 job=2 return_code=23
> [2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2600 num_files=27 job=2 return_code=23
> [2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.4090 num_files=67 job=2 return_code=23
> [2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=18662:140079998809920:1534184774.58 method=entry_ops error=GsyncdError
> [2018-08-13 18:26:14.866166] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
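
The Active/Faulty toggling in these excerpts can be summarised by pairing each change to Faulty with the last error logged before it. The sketch below assumes the line layout shown above; the function name and regular expressions are illustrative and not part of gsyncd.

```python
import re

# "Worker Status Change status=..." entries written by gsyncdstatus.
STATUS = re.compile(
    r"^\[(?P<ts>[^\]]+)\] I \[gsyncdstatus\(.*"
    r"Worker Status Change\s+status=(?P<status>\w+)")
# Any entry with severity E; `msg` is everything after the source location.
ERROR = re.compile(r"^\[(?P<ts>[^\]]+)\] E \[[^\]]*\] (?P<msg>.*)$")


def faulty_transitions(lines):
    """Return (timestamp, last_error) for every status change to Faulty.

    `last_error` is the most recent E-level message since the previous
    status change, or None if no error was logged in between.
    """
    last_error = None
    transitions = []
    for line in lines:
        error = ERROR.match(line)
        if error:
            last_error = error.group("msg").strip()
            continue
        status = STATUS.match(line)
        if status:
            if status.group("status") == "Faulty":
                transitions.append((status.group("ts"), last_error))
            last_error = None  # start a fresh window after each change
    return transitions
```

Run over the node2 excerpt, this would pair the change to Faulty at 18:26:15 with the `execution of "gluster" failed with ENOENT` message logged one second earlier.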
>
>
> Help would be appreciated!
>
> Thanks!
>
>
> Regards
>
> Marcus Pedersén
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 12 August 2018 22:18
> To: ***@redhat.com
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
>
> Hi,
>
> As geo-replication stopped after 4-5 hours, I added a cron job that, every 6 hours, stops geo-replication, pauses for 2 minutes and starts it again.
>
> The cron job has been running for 5 days and the changelogs have been catching up.
>
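
The stop, pause, start cycle described above could look roughly like this when driven from Python. It is a sketch under assumptions: the function names are made up, and the slave specification used in the usage note below is inferred from the status table (the user part is masked in this archive). Only the `gluster volume geo-replication <master-vol> <slave> start|stop|status` command form is taken from the Gluster CLI.

```python
import subprocess
import time


def georep_command(master_vol, slave, action):
    """Build the gluster CLI argument list for one geo-replication action."""
    if action not in ("start", "stop", "status"):
        raise ValueError("unsupported action: %s" % action)
    return ["gluster", "volume", "geo-replication", master_vol, slave, action]


def restart_georep(master_vol, slave, pause_seconds=120,
                   run=subprocess.run, sleep=time.sleep):
    """Stop the session, wait, then start it again.

    `run` and `sleep` are injectable so the sequence can be exercised
    without touching a real cluster.
    """
    run(georep_command(master_vol, slave, "stop"), check=True)
    sleep(pause_seconds)
    run(georep_command(master_vol, slave, "start"), check=True)
```

A call such as `restart_georep("urd-gds-volume", "geouser@urd-gds-geo-001::urd-gds-volume")`, scheduled every 6 hours, would mirror the workaround. It restarts the session but does not address why the workers stall.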
>
> Now a different behavior has shown up.
>
> On one of the active master nodes I get a Python error.
>
> The other active master node has started to toggle status between Active and Faulty.
>
> See parts of logs below.
>
>
> The Troubleshooting Geo-replication guide suggests that, when sync is not complete, a full sync of the data can be enforced by erasing the index and restarting GlusterFS geo-replication.
>
> There is no explanation of how to erase the index.
>
> Should I enforce a full sync?
>
> How do I erase the index?
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
> Node with python error:
>
> [2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
> [2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
> [2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
> [2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
> [2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
> [2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
>     tf(*aargs)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
>     po = self.sync_engine(pb, self.log_err)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
>     rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
> [2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>
>
>
>
> Node that toggles status between active and faulty:
>
> [2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
> [2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
> [2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
> [2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
> [2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
> [2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
> [2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
> [2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
> [2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
> [2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
> [2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
> [2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
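
Every `Sync Time Taken` entry above carries `return_code=23`, which the rsync manual describes as "Partial transfer due to error". The thread does not establish whether this is related to the stalls, but tallying the codes shows quickly how widespread they are. The helper below is a hypothetical sketch that assumes the `key=value` layout of the lines shown here.

```python
import re
from collections import Counter


def sync_summary(lines):
    """Tally rsync return codes from 'Sync Time Taken' log entries.

    Returns two Counters keyed by return code: the number of sync jobs
    and the number of files those jobs covered.
    """
    jobs = Counter()
    files = Counter()
    for line in lines:
        if "Sync Time Taken" not in line:
            continue
        # Collect the key=value pairs that follow the message text.
        tail = line.split("Sync Time Taken", 1)[1]
        fields = dict(re.findall(r"(\w+)=(\S+)", tail))
        if "return_code" not in fields:
            continue
        code = int(fields["return_code"])
        jobs[code] += 1
        files[code] += int(fields.get("num_files", 0))
    return jobs, files
```

A log in which code 0 dominates and 23 appears only occasionally would point to a few problem files, whereas 23 on every job, as in these excerpts, suggests something systematic.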
---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

Marcus Pedersén

2018-08-15 18:47:39 UTC

Hi again Sunny,
I am just curious whether you found anything useful in the logs that could help me get geo-replication running.

Many thanks in advance!

Regards
Marcus
________________________________________
From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
Sent: 13 August 2018 22:45
To: Sunny Kumar
Cc: gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Sunny,
Please find enclosed the mount logs for the two active master nodes.
I cut them down to today's logs.

Thanks!

Marcus

________________________________________
From: Sunny Kumar <***@redhat.com>
Sent: 13 August 2018 21:49
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-***@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Marcus,

Can you please share the mount log from the slave? You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log".

- Sunny
On Tue, Aug 14, 2018 at 12:48 AM Marcus Pedersén <***@slu.se> wrote:
>
> Hi again,
>
> The behaviour has changed again: both active master nodes now toggle to Faulty, and the logs repeat the same entries over and over.
>
>
> Part of log, node1:
>
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:24:44.704360] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523907056, 0) entry_stime=None etime=1534184684
> [2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523907056, 0)
> [2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=5919:140339726538560:1534184685.88 method=entry_ops error=GsyncdError
> [2018-08-13 18:24:45.905111] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655889] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:24:56.664628] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:24:58.347415] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6824
> [2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1150
> [2018-08-13 18:24:59.464184] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:01.549606] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184701
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
>
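The stime and etime values in the "starting history crawl" line look like Unix epoch seconds (the etime matches the log timestamp). Assuming that reading is right, a minimal Python sketch like the one below shows how far back the crawl starts. The function name crawl_window is hypothetical and not part of gsyncd.

```python
import re
from datetime import datetime, timezone

# Sample line copied from the node1 log above.
LINE = (
    "[2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] "
    "_GMaster: starting history crawl turns=1 stime=(1523907056, 0) "
    "entry_stime=None etime=1534184684"
)


def crawl_window(line):
    """Return (stime, etime) from a 'starting history crawl' line as UTC datetimes.

    Assumption: both values are Unix epoch seconds.
    """
    # \b keeps "stime" from matching inside "entry_stime" ("_" is a word character).
    stime = int(re.search(r"\bstime=\((\d+),", line).group(1))
    etime = int(re.search(r"\betime=(\d+)", line).group(1))
    return (
        datetime.fromtimestamp(stime, tz=timezone.utc),
        datetime.fromtimestamp(etime, tz=timezone.utc),
    )


start, end = crawl_window(LINE)
print(start.isoformat(), "->", end.isoformat())
# prints: 2018-04-16T19:30:56+00:00 -> 2018-08-13T18:24:44+00:00
```

Read this way, the worker restarts its history crawl from mid-April 2018 every time it comes back up, which would fit the session never catching up.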
>
> Part of log, node2:
>
> [2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.610362] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:25:24.621551] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:25:26.164855] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5431
> [2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1667
> [2018-08-13 18:25:27.335560] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:37.769479] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
> [2018-08-13 18:25:37.787317] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:25:37.789822] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534184737
> [2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0228 data_duration=0.2456 DATA=0 XATT=0
> [2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2723 num_changelogs=1 mode=history_changelog
> [2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1327 num_files=3 job=3 return_code=23
> [2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1133 num_files=8 job=1 return_code=23
> [2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.6315 num_files=47 job=2 return_code=23
>
>
> ...............
>
>
> [2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0988 num_files=18 job=2 return_code=23
> [2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2600 num_files=27 job=2 return_code=23
> [2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.4090 num_files=67 job=2 return_code=23
> [2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=18662:140079998809920:1534184774.58 method=entry_ops error=GsyncdError
> [2018-08-13 18:26:14.866166] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
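To see how often a worker flips between Active and Faulty, the "Worker Status Change" lines can be pulled out of gsyncd.log. The following is only a sketch based on the line format visible in the excerpts above; the regular expression and the function name status_changes are assumptions, not part of Gluster.

```python
import re
from collections import Counter

# Matches lines such as:
# [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
STATUS_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] [A-Z] \[[^\]]+\] GeorepStatus: "
    r"Worker Status Change\s+status=(?P<status>\w+)"
)

# Lines copied from the node1 excerpt, including the mail quote prefix.
SAMPLE = """\
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
"""


def status_changes(lines):
    """Return a list of (timestamp, status) for every worker status change."""
    changes = []
    for line in lines:
        # Drop a leading "> " quote prefix so pasted mail excerpts also parse.
        match = STATUS_RE.match(line.lstrip("> ").rstrip())
        if match:
            changes.append((match.group("ts"), match.group("status")))
    return changes


changes = status_changes(SAMPLE.splitlines())
counts = Counter(status for _, status in changes)
for ts, status in changes:
    print(ts, status)
print(dict(counts))
```

On a real node the same function could be fed an open log file, for example status_changes(open(path)), where path points at the session's gsyncd.log.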
>
> Help would be appreciated!
>
> Thanks!
>
>
> Regards
>
> Marcus Pedersén
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 12 August 2018 22:18
> To: ***@redhat.com
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
>
> Hi,
>
> Since geo-replication stopped after 4-5 hours, I added a cron job that stops geo-replication, waits 2 minutes and starts it again every 6 hours.
>
> The cron job has been running for 5 days and the changelogs have been catching up.
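The script behind the cron job is not shown in the thread. A minimal sketch of such a restart helper, assuming it simply wraps the standard "gluster volume geo-replication <mastervol> <slave> stop|start" commands, might look like this. The function names, the script path in the crontab comment and the slave string in the demo are hypothetical.

```python
import subprocess
import time

# Hypothetical crontab entry, every 6 hours:
#   0 */6 * * * /usr/local/bin/georep_restart.py


def georep_cmd(master_vol, slave, action):
    """Build the gluster CLI call for one geo-replication action."""
    return ["gluster", "volume", "geo-replication", master_vol, slave, action]


def restart(master_vol, slave, pause_seconds=120,
            run=subprocess.run, sleep=time.sleep):
    """Stop the session, wait, then start it again.

    run and sleep are injectable so the flow can be exercised without gluster.
    """
    run(georep_cmd(master_vol, slave, "stop"), check=True)
    sleep(pause_seconds)
    run(georep_cmd(master_vol, slave, "start"), check=True)


# Dry run: record the commands instead of executing them.
recorded = []
restart(
    "urd-gds-volume",
    "user@slave-host::slave-vol",  # placeholder, not the real slave URL
    pause_seconds=0,
    run=lambda cmd, check: recorded.append(cmd),
    sleep=lambda seconds: None,
)
for cmd in recorded:
    print(" ".join(cmd))
```

This only papers over the stall; it does not address why the worker stops syncing.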
>
>
> Now a different behavior has shown up.
>
> In one of the active master nodes I get a python error.
>
> The other active master node has started to toggle status between active and faulty.
>
> See parts of logs below.
>
>
> The Troubleshooting Geo-replication guide suggests that, when a sync is not complete, you can enforce a full sync of the data by erasing the index and restarting GlusterFS geo-replication.
>
> It does not explain how to erase the index.
>
> Should I enforce a full sync?
>
> How do I erase the index?
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
> Node with python error:
>
> [2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
> [2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
> [2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
> [2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
> [2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
> [2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
> tf(*aargs)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
> po = self.sync_engine(pb, self.log_err)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
> rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
> [2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>
>
>
>
> Node that toggles status between active and faulty:
>
> [2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
> [2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
> [2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
> [2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
> [2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
> [2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
> [2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
> [2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
> [2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
> [2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
> [2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
> [2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 6 August 2018 13:28
> To: ***@redhat.com
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
>
> Hi,
>
> Is there a way to resolve the problem with rsync and hanging processes?
>
> Do I need to kill all the processes and hope that it starts again or stop/start geo-replication?
>
>
> If I stop/start geo-replication it will start again, I have tried it before.
>
>
> Regards
>
> Marcus
>
>
>
> ________________________________
> From: gluster-users-***@gluster.org <gluster-users-***@gluster.org> on behalf of Marcus Pedersén <***@slu.se>
> Sent: 2 August 2018 10:04
> To: Kotresh Hiremath Ravishankar
> Cc: gluster-***@gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
> Hi Kotresh,
>
> I get the following and then it hangs:
>
> strace: Process 5921 attached
> write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811
>
>
> When sync is running I can see rsync with geouser on the slave node.
>
> Regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 2 Aug 2018 at 09:31, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>
> Cool. Just check whether they are hung by any chance, using the following command:
>
> # strace -f -p 5921
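Before attaching strace, it can help to list which rsync processes have been alive unusually long. The sketch below parses the output of "ps -eo pid,etimes,args" (the etimes column is assumed to be available, as in procps-ng). The function names and the one-hour threshold are hypothetical. A long-lived rsync is only a candidate; strace is still needed to confirm that it is stuck.

```python
import subprocess


def parse_ps(output, min_age_seconds):
    """Return (pid, age_seconds, command) for rsync processes older than the threshold."""
    found = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, etimes, args = parts
        if args.split()[0].endswith("rsync") and int(etimes) >= min_age_seconds:
            found.append((int(pid), int(etimes), args))
    return found


def old_rsync_processes(min_age_seconds=3600):
    """Run ps and return rsync processes that have been alive for a long time."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, min_age_seconds)


# Made-up ps output for illustration; only PID 5921 comes from the thread.
SAMPLE = """\
  PID ELAPSED COMMAND
 5921   43200 rsync -aR0 --inplace --files-from=- .
 6001      12 rsync -aR0 --inplace --files-from=- .
 6100   90000 /usr/sbin/sshd -D
"""

candidates = parse_ps(SAMPLE, 3600)
for pid, age, cmd in candidates:
    print(pid, age, cmd)
```

Each PID it reports can then be checked with strace -f -p <pid> as suggested above.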
>
> On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <***@slu.se> wrote:
>
> On both active master nodes there is an rsync process. As in:
>
> root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock ***@urd-gds-geo-001:/proc/13077/cwd
>
> There are also SSH tunnels to the slave nodes and gsyncd.py processes.
>
> Regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
> On 2 Aug 2018 at 08:07, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
> Could you check for any hung rsync processes on the master or slave?
>
> On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <***@slu.se> wrote:
>
> Hi Kotresh,
> rsync version 3.1.2 protocol version 31
> All nodes run CentOS 7, updated the last couple of days.
>
> Thanks
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################
>
>
> On 2 Aug 2018 at 06:13, Kotresh Hiremath Ravishankar <***@redhat.com> wrote:
>
> Hi Marcus,
>
> What's the rsync version being used?
>
> Thanks,
> Kotresh HR
>
> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <***@slu.se> wrote:
>
> Hi all!
>
> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>
> With help from the list with some sym links and so on (handled in another thread)
>
> I got the geo-replication running.
>
> It ran for 4-5 hours and then stopped, I stopped and started geo-replication and it ran for another 4-5 hours.
>
> 4.1.2 was released and I updated, hoping this would solve the problem.
>
> I still have the same problem, at start it runs for 4-5 hours and then it stops.
>
> After that nothing happens, I have waited for days but still nothing happens.
>
>
> I have looked through logs but can not find anything obvious.
>
>
> Status for geo-replication is active for the two same nodes all the time:
>
>
> MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> urd-gds-001 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-04-16 20:32:09 0 14205 0 0 2018-07-27 21:12:44 No N/A
> urd-gds-002 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-004 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-002 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-003 urd-gds-volume /urd-gds/gluster geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-000 Active History Crawl 2018-05-01 20:58:14 285 4552 0 0 2018-07-27 21:12:44 No N/A
> urd-gds-000 urd-gds-volume /urd-gds/gluster1 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
> urd-gds-000 urd-gds-volume /urd-gds/gluster2 geouser ***@urd-gds-geo-001::urd-gds-volume urd-gds-geo-001 Passive N/A N/A N/A N/A N/A N/A N/A N/A N/A
>
>
> Master cluster is Distribute-Replicate
>
> 2 x (2 + 1)
>
> Used space 30TB
>
>
> Slave cluster is Replicate
>
> 1 x (2 + 1)
>
> Used space 9TB
>
>
> Parts from gsyncd.logs are enclosed.
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>