Discussion:
[Gluster-users] Failures during rebalance on gluster distributed disperse volume
Mauro Tridici
2018-09-12 13:54:30 UTC
Dear All,

I recently added 3 servers (each with 12 bricks) to an existing Gluster distributed disperse volume.
The volume expansion completed without errors, and I have already run the rebalance procedure with the fix-layout option without any problem.
I have now launched the rebalance procedure without the fix-layout option but, as you can see in the output below, some failures have been detected.

[***@s01 glusterfs]# gluster v rebalance tier2 status
     Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost             71176   3.2MB  2137557   1530391     8128  in progress           13:59:05
  s02-stg                 0  0Bytes        0         0        0    completed           11:53:28
  s03-stg                 0  0Bytes        0         0        0    completed           11:53:32
  s04-stg                 0  0Bytes        0         0        0    completed            0:00:06
  s05-stg                15  0Bytes    17055         0       18    completed           10:48:01
  s06-stg                 0  0Bytes        0         0        0    completed            0:00:06
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success
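For a sense of scale, here is a minimal Python sketch (a hypothetical helper, not part of Gluster) that parses the status rows above and computes, per node, the fraction of scanned files that failed. It assumes whitespace-separated columns in the order printed, with a status that may span two words ("in progress").

```python
# Hypothetical helper, not part of Gluster: parse the rows printed by
# "gluster v rebalance tier2 status" and compute per-node failure ratios.
# Assumption: whitespace-separated columns in the order shown above; the
# status column may span two words ("in progress").

STATUS_OUTPUT = """\
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 71176 3.2MB 2137557 1530391 8128 in progress 13:59:05
s02-stg 0 0Bytes 0 0 0 completed 11:53:28
s03-stg 0 0Bytes 0 0 0 completed 11:53:32
s04-stg 0 0Bytes 0 0 0 completed 0:00:06
s05-stg 15 0Bytes 17055 0 18 completed 10:48:01
s06-stg 0 0Bytes 0 0 0 completed 0:00:06
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success
"""


def parse_status_rows(text):
    """Return one dict per node row, skipping headers and summary lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows have at least 8 fields and a numeric second column.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        node, files, size, scanned, failures, skipped = fields[:6]
        rows.append({
            "node": node,
            "rebalanced": int(files),
            "size": size,
            "scanned": int(scanned),
            "failures": int(failures),
            "skipped": int(skipped),
            "status": " ".join(fields[6:-1]),  # "in progress" or "completed"
            "runtime": fields[-1],
        })
    return rows


def failure_ratio(row):
    """Fraction of scanned files that failed (0.0 when nothing was scanned)."""
    return row["failures"] / row["scanned"] if row["scanned"] else 0.0


if __name__ == "__main__":
    for row in parse_status_rows(STATUS_OUTPUT):
        print(f"{row['node']:>9}  failures={row['failures']:>7}  "
              f"ratio={failure_ratio(row):6.1%}  status={row['status']}")
```

On these figures roughly 72% of the files scanned on localhost failed, while s05-stg reported no failures.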

In the volume rebalance log file, I found many error messages similar to the following:

[2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
[2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
[2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
[2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
[2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
[2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
[2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
[2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
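A minimal sketch for checking whether the failures share the fallocate cause: it tallies, per disperse subvolume, the "Create dst failed" lines and the "fallocate failed ... (Operation not supported)" lines. The regular expressions are assumptions derived only from the messages quoted above, and the function names are hypothetical; on a real node you would pass the open rebalance log file instead of the embedded sample.

```python
import re
from collections import Counter

# Hypothetical helper: regexes built only from the rebalance log lines quoted above.
CREATE_DST_RE = re.compile(r"Create dst failed on - (\S+) for file - (\S+)")
FALLOCATE_RE = re.compile(r"fallocate failed for (\S+) on (\S+) \(([^)]*)\)")

DIR = "/CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/"
SAMPLE = [  # message parts of the log lines above
    f"Create dst failed on - tier2-disperse-6 for file - {DIR}sps_200508_003.cam.h0.2005-12_grid.nc",
    f"fallocate failed for {DIR}sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)",
    f"Create dst failed on - tier2-disperse-9 for file - {DIR}sps_200508_003.cam.h0.2005-09_grid.nc",
    f"fallocate failed for {DIR}sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)",
    f"Create dst failed on - tier2-disperse-10 for file - {DIR}sps_200508_003.cam.h0.2006-01_grid.nc",
]


def summarize(lines):
    """Count create-dst failures and fallocate ENOTSUP errors per subvolume."""
    create_failures = Counter()
    fallocate_enotsup = Counter()
    for line in lines:
        m = FALLOCATE_RE.search(line)
        if m and m.group(3) == "Operation not supported":
            fallocate_enotsup[m.group(2)] += 1
            continue
        m = CREATE_DST_RE.search(line)
        if m:
            create_failures[m.group(1)] += 1
    return create_failures, fallocate_enotsup


if __name__ == "__main__":
    # For a real node, pass an open file object instead: summarize(open(path)).
    created, enotsup = summarize(SAMPLE)
    for subvol in sorted(created):
        print(f"{subvol}: create-dst failures={created[subvol]}, "
              f"fallocate ENOTSUP={enotsup[subvol]}")
```

The excerpt starts mid-stream, so the fallocate line for tier2-disperse-6, if there is one, is not included in the sample.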

Could you please help me understand what is happening and how to solve it?

Our cluster runs Gluster 3.10.5.

Thank you in advance,
Mauro
Nithya Balachandran
2018-09-13 11:38:54 UTC
This looks like an issue caused by rebalance switching to fallocate, which EC
(the erasure-coding translator behind disperse volumes) had not implemented at that point.

@Pranith, @Ashish, which version of Gluster added fallocate support to EC?


Regards,
Nithya
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
Mauro Tridici
2018-09-13 13:04:55 UTC
Hi Nithya,

thank you for involving the EC team.
I will wait for your suggestions.

Regards,
Mauro
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
c/o Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Nithya Balachandran
2018-09-14 14:59:40 UTC
Hi Mauro,


The rebalance code started using fallocate in 3.10.5
(https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
replicated volumes. However, we neglected to test this with EC volumes on
3.10. Once we discovered the issue, the EC fallocate implementation was
made available in 3.11.

At this point, I'm afraid the only option I see is to upgrade to at least
3.12.
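The version window described here can be expressed as a small check. A minimal sketch, using only the two facts stated in this thread (rebalance uses fallocate from 3.10.5; EC supports fallocate from 3.11); it assumes no later 3.10.x point release or backport changes the picture, and the function names are hypothetical.

```python
import re

# Version facts taken from this thread only (assumption: no later 3.10.x
# point release or backport changes the picture):
#   - rebalance started using fallocate in 3.10.5 (bug 1473132)
#   - EC/disperse volumes gained fallocate support in 3.11 (bug 1454686)
REBALANCE_USES_FALLOCATE_FROM = (3, 10, 5)
EC_SUPPORTS_FALLOCATE_FROM = (3, 11, 0)


def parse_version(text):
    """'3.10.5' -> (3, 10, 5); '3.12' -> (3, 12, 0); package suffixes ignored."""
    nums = [int(n) for n in re.findall(r"\d+", text)][:3]
    return tuple(nums + [0] * (3 - len(nums)))


def ec_rebalance_affected(version):
    """True if rebalance would call fallocate on an EC volume that cannot serve it."""
    v = parse_version(version)
    return REBALANCE_USES_FALLOCATE_FROM <= v < EC_SUPPORTS_FALLOCATE_FROM


if __name__ == "__main__":
    for ver in ("3.10.4", "3.10.5", "3.11.0", "3.12"):
        print(ver, "affected" if ec_rebalance_affected(ver) else "not affected")
```

Under these assumptions 3.12 falls outside the affected window, consistent with the advice to upgrade.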

@Sunil, do you have anything to add?

Regards,
Nithya
Mauro Tridici
2018-09-14 22:12:12 UTC
Hi Nithya,

thank you very much for your answer.
I will wait for @Sunil's opinion too before starting the upgrade procedure.

Since this will be the first upgrade of our Gluster cluster, I would like to know whether it could be a potentially dangerous procedure and whether there is a risk of losing data :-)
Unfortunately, I can’t make a backup copy of the volume data in another location beforehand.
If possible, could you please describe the steps needed to upgrade from 3.10.5 to 3.12?

Thank you again, Nithya.
Thank you to all of you for the help!

Regards,
Mauro
Sunil Kumar Heggodu Gopala Acharya
2018-09-15 09:57:41 UTC
Hi Mauro,

As Nithya highlighted, FALLOCATE support for EC volumes went into 3.11 as part
of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence, upgrading to
3.12, as suggested before, is the right move.

Here is the documentation for upgrading to 3.12:
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
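The linked guide is the authoritative procedure. As an extra safeguard during a rolling upgrade, it seems prudent to confirm that self-heal has nothing pending before moving on to the next server. A minimal sketch that sums the "Number of entries:" lines printed by "gluster volume heal <VOLNAME> info"; the parsing rules and the sample output (including the brick paths) are illustrative assumptions, not captured from this cluster, and the function names are hypothetical.

```python
import re

# Hypothetical helper: sum the "Number of entries:" lines printed by
# "gluster volume heal <VOLNAME> info". Assumption: one such line per brick;
# a non-numeric value (e.g. when a brick is unreachable) is treated as unknown.
ENTRIES_RE = re.compile(r"^Number of entries:[ \t]*(\S+)", re.MULTILINE)

# Illustrative output with made-up brick paths, not captured from this cluster.
SAMPLE_HEAL_INFO = """\
Brick s01-stg:/gluster/mnt1/brick
Number of entries: 0

Brick s02-stg:/gluster/mnt1/brick
Number of entries: 0
"""


def pending_heal_entries(text):
    """Total pending entries, or None if no count was found or one is not a number."""
    values = ENTRIES_RE.findall(text)
    if not values or not all(v.isdigit() for v in values):
        return None
    return sum(int(v) for v in values)


def safe_to_upgrade_next_server(text):
    """Proceed only when every brick reports zero pending entries."""
    return pending_heal_entries(text) == 0


if __name__ == "__main__":
    print("pending entries:", pending_heal_entries(SAMPLE_HEAL_INFO))
    print("safe to continue:", safe_to_upgrade_next_server(SAMPLE_HEAL_INFO))
```

Returning None rather than 0 for unparseable output keeps the check conservative: anything it cannot read is treated as "not safe yet".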

Regards,

Sunil Kumar Acharya

Senior Software Engineer

Red Hat

T: +91-8067935170
Mauro Tridici
2018-09-15 17:25:50 UTC
Hi Sunil,

many thanks to you too.
I will follow your suggestions and the guide for upgrading to 3.12.

Fingers crossed :-)
Regards,
Mauro
Post by Nithya Balachandran
Hi Mauro,
As Nithya highlighted FALLOCATE support for EC volumes went in 3.11 as part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686 <https://bugzilla.redhat.com/show_bug.cgi?id=1454686>. Hence, upgrading to 3.12 as suggested before would be a right move.
Here is the documentation for upgrading to 3.12: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/ <https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/>
Regards,
SUNIL KUMAR ACHARYA
SENIOR SOFTWARE ENGINEER
Red Hat
<https://www.redhat.com/>
T: +91-8067935170 <http://redhatemailsignature-marketing.itos.redhat.com/>
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
Hi Nithya,
thank you very much for your answer.
Since it will be the first upgrade of our Gluster cluster, I would like to know if it could be a “virtually dangerous" procedure and if it will be the risk of losing data :-)
Unfortunately, I can’t do a preventive copy of the volume data in another location.
If it is possible, could you please illustrate the right steps needed to complete the upgrade procedure from the 3.10.5 to the 3.12 version?
Thank you again, Nithya.
Thank you to all of you for the help!
Regards,
Mauro
Post by Nithya Balachandran
Hi Mauro,
The rebalance code started using fallocate in 3.10.5 (https://bugzilla.redhat.com/show_bug.cgi?id=1473132 <https://bugzilla.redhat.com/show_bug.cgi?id=1473132>) which works fine on replicated volumes. However, we neglected to test this with EC volumes on 3.10. Once we discovered the issue, the EC fallocate implementation was made available in 3.11.
At this point, I'm afraid the only option I see is to upgrade to at least 3.12.
@Sunil, do you have anything to add?
Regards,
Nithya
Hi Nithya,
thank you for involving EC group.
I will wait for your suggestions.
Regards,
Mauro
This looks like an issue because rebalance switched to using fallocate which EC did not have implemented at that point.
@Pranith, @Ashish, which version of gluster had support for fallocate in EC?
Regards,
Nithya
Dear All,
I recently added 3 servers (each one with 12 bricks) to an existing Gluster Distributed Disperse Volume.
Volume extension has been completed without error and I already executed the rebalance procedure with fix-layout option with no problem.
I just launched the rebalance procedure without fix-layout option, but, as you can see in the output below, I noticed that some failures have been detected.
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 71176 3.2MB 2137557 1530391 8128 in progress 13:59:05
s02-stg 0 0Bytes 0 0 0 completed 11:53:28
s03-stg 0 0Bytes 0 0 0 completed 11:53:32
s04-stg 0 0Bytes 0 0 0 completed 0:00:06
s05-stg 15 0Bytes 17055 0 18 completed 10:48:01
s06-stg 0 0Bytes 0 0 0 completed 0:00:06
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success
[2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc <http://sps_200508_003.cam.h0.2005-12_grid.nc/>
[2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc <http://sps_200508_003.cam.h0.2005-12_grid.nc/>
[2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc <http://sps_200508_003.cam.h0.2005-09_grid.nc/> on tier2-disperse-9 (Operation not supported)
[2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc <http://sps_200508_003.cam.h0.2005-09_grid.nc/>
[2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc <http://sps_200508_003.cam.h0.2005-09_grid.nc/>
[2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc <http://sps_200508_003.cam.h0.2006-01_grid.nc/> on tier2-disperse-10 (Operation not supported)
[2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc <http://sps_200508_003.cam.h0.2006-01_grid.nc/>
[2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc <http://sps_200508_003.cam.h0.2006-01_grid.nc/>
Could you please help me to understand what is happening and how to solve it?
Our Gluster implementation is based on Gluster v.3.10.5
Thank you in advance,
Mauro
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
Nithya Balachandran
2018-09-16 04:07:13 UTC
Hi Mauro,

Please stop the rebalance before you upgrade.
Thanks,
Nithya
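Before upgrading, it is worth confirming that no node still reports "in progress" after running `gluster volume rebalance tier2 stop`. The Python sketch below is a hypothetical helper, not part of Gluster: it parses status output in the column layout Mauro posted from 3.10.5 (node, rebalanced files, size, scanned, failures, skipped, status, run time), and the names `parse_status` and `nodes_still_running` are illustrative only. Other releases may print different columns, so treat the regular expression as an assumption.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical parser for `gluster volume rebalance <VOL> status` output.
# Assumption: the 3.10.5 column layout posted in this thread; status may be
# one word ("completed") or two ("in progress"), and run time is h:m:s.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)\s+"
    r"(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)\s+"
    r"(?P<runtime>\d+:\d{2}:\d{2})$"
)


@dataclass
class NodeStatus:
    node: str
    scanned: int
    failures: int
    status: str


def parse_status(text: str) -> List[NodeStatus]:
    """Return one NodeStatus per node row; header, separator and summary lines do not match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(NodeStatus(m["node"], int(m["scanned"]),
                                   int(m["failures"]), m["status"]))
    return rows


def nodes_still_running(rows: List[NodeStatus]) -> List[str]:
    """Nodes that still report an active rebalance and should be stopped before upgrading."""
    return [r.node for r in rows if r.status == "in progress"]


# Status table as posted at the start of this thread.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 71176 3.2MB 2137557 1530391 8128 in progress 13:59:05
s02-stg 0 0Bytes 0 0 0 completed 11:53:28
s03-stg 0 0Bytes 0 0 0 completed 11:53:32
s04-stg 0 0Bytes 0 0 0 completed 0:00:06
s05-stg 15 0Bytes 17055 0 18 completed 10:48:01
s06-stg 0 0Bytes 0 0 0 completed 0:00:06
Estimated time left for rebalance to complete : 0:46:53
volume rebalance: tier2: success
"""

rows = parse_status(SAMPLE)
print(nodes_still_running(rows))  # ['localhost']
```

On a live cluster the input would come from the CLI rather than a pasted string; a non-empty list means those nodes still report an active rebalance.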
Post by Mauro Tridici
Hi Sunil,
many thanks to you too.
I will follow your suggestions and the guide for upgrading to 3.12.
Crossing fingers :-)
Regards,
Mauro
On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya wrote:
Hi Mauro,
As Nithya highlighted, FALLOCATE support for EC volumes went into 3.11 as
part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence,
upgrading to 3.12, as suggested before, would be the right move.
Here is the documentation for upgrading to 3.12: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
Regards,
Sunil kumar Acharya
Senior Software Engineer
Red Hat
T: +91-8067935170
Post by Mauro Tridici
Hi Nithya,
thank you very much for your answer.
Since this will be the first upgrade of our Gluster cluster, I would like
to know whether it could be a "virtually dangerous" procedure and whether there is
a risk of losing data :-)
Unfortunately, I can't make a precautionary copy of the volume data to another location.
If possible, could you please outline the steps needed to
upgrade from 3.10.5 to 3.12?
Thank you again, Nithya.
Thank you to all of you for the help!
Regards,
Mauro
On 14 Sep 2018, at 16:59, Nithya Balachandran wrote:
Hi Mauro,
The rebalance code started using fallocate in 3.10.5
(https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
replicated volumes. However, we neglected to test this with EC volumes on
3.10. Once we discovered the issue, the EC fallocate implementation was
made available in 3.11.
At this point, I'm afraid the only option I see is to upgrade to at least 3.12.
@Sunil, do you have anything to add?
Regards,
Nithya
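The explanation above defines a simple version window for this failure: rebalance calls fallocate from 3.10.5, EC volumes implement it only from 3.11, and the suggested target is at least 3.12. The Python sketch below encodes that window as a hypothetical pre-flight check; the thresholds come from this thread rather than from Gluster's release notes, the function names are illustrative, and it assumes (unverified) that no 3.10.x release after 3.10.5 backported EC fallocate support.

```python
from typing import Tuple

# Version thresholds as stated in this thread (hypothetical check, not a Gluster API).
REBALANCE_USES_FALLOCATE = (3, 10, 5)  # rebalance switched to fallocate (bug 1473132)
EC_SUPPORTS_FALLOCATE = (3, 11, 0)     # EC fallocate implementation (bug 1454686)
RECOMMENDED_MINIMUM = (3, 12, 0)       # upgrade target suggested in this thread


def parse_version(text: str) -> Tuple[int, int, int]:
    """'3.10.5' or 'v.3.10.5' -> (3, 10, 5); missing components count as 0."""
    parts = [int(p) for p in text.strip().lstrip("v").lstrip(".").split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def rebalance_hits_ec_enotsup(version: str) -> bool:
    """True if rebalance would call fallocate on EC subvolumes that do not implement it.

    Assumes no 3.10.x release after 3.10.5 backported EC fallocate.
    """
    return REBALANCE_USES_FALLOCATE <= parse_version(version) < EC_SUPPORTS_FALLOCATE


def meets_recommendation(version: str) -> bool:
    """True if the version is at least the 3.12 target suggested in the thread."""
    return parse_version(version) >= RECOMMENDED_MINIMUM


for v in ("v.3.10.5", "3.11", "3.12"):
    print(v, rebalance_hits_ec_enotsup(v), meets_recommendation(v))
# v.3.10.5 True False
# 3.11 False False
# 3.12 False True
```

Mauro's cluster (3.10.5) falls inside the window, which matches the "Operation not supported" errors in his rebalance log.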
Post by Mauro Tridici
Hi Nithya,
thank you for involving the EC group.
I will wait for your suggestions.
Regards,
Mauro
On 13 Sep 2018, at 13:38, Nithya Balachandran wrote:
This looks like an issue caused by rebalance switching to fallocate,
which EC had not implemented at that point.
@Pranith, @Ashish, which version of gluster had support for fallocate in EC?
Regards,
Nithya
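This diagnosis can be checked against the rebalance log: if fallocate is the culprit, the "Create dst failed" errors should be accompanied by "fallocate failed ... (Operation not supported)" on the disperse subvolumes. The Python sketch below tallies both messages. The regular expressions are written against the 3.10.5 log lines quoted in this thread and are assumptions for any other release; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Tuple

# Patterns written against the 3.10.5 rebalance log lines quoted in this thread.
# ".*?" also skips the "<http://...>" text some mail clients appended to file names.
FALLOCATE_FAILED = re.compile(
    r"fallocate failed for (?P<path>\S+).*? on (?P<subvol>[\w-]+) \((?P<reason>[^)]*)\)"
)
CREATE_DST_FAILED = re.compile(
    r"Create dst failed on - (?P<subvol>\S+) for file - (?P<path>\S+)"
)


def summarize(log_text: str) -> Tuple[Counter, Counter]:
    """Count fallocate failures per error string and 'Create dst failed' per target subvolume."""
    reasons, targets = Counter(), Counter()
    for line in log_text.splitlines():
        m = FALLOCATE_FAILED.search(line)
        if m:
            reasons[m["reason"]] += 1
        m = CREATE_DST_FAILED.search(line)
        if m:
            targets[m["subvol"]] += 1
    return reasons, targets


# Log lines as posted in this thread.
SAMPLE_LOG = "\n".join([
    "[2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc",
    "[2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)",
    "[2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc",
    "[2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)",
    "[2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc",
])

reasons, targets = summarize(SAMPLE_LOG)
print(dict(reasons))  # {'Operation not supported': 2}
print(dict(targets))  # {'tier2-disperse-6': 1, 'tier2-disperse-9': 1, 'tier2-disperse-10': 1}
```

Every fallocate failure in this excerpt reports "Operation not supported" on a tier2-disperse-N subvolume, which is consistent with EC lacking fallocate in 3.10.5; a different error string would suggest a different cause.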
Mauro Tridici
2018-09-17 08:40:10 UTC
Hi Nithya,

thank you.
The rebalance finished (with a lot of failures) a few days ago.
Because of some I/O errors during write operations on the Gluster volume, I had to re-launch the fix-layout rebalance.

Now the fix-layout rebalance has completed and I can write data without I/O errors.
I will start the upgrade procedure as soon as possible.

Thank you again for your support.
Regards,
Mauro
