Discussion:
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Mauro Tridici
2018-07-02 08:51:24 UTC
Dear Users,

I just noticed that, after some data deletions inside the "/tier2/CSP/ans004" folder, the amount of used disk space reported by the quota command no longer matches the value reported by the du command.
From what I found on the web, this looks like a bug in previous versions of GlusterFS that has already been fixed.
Unfortunately, in my case the problem still seems to be present.

How can I solve this issue? Is it possible to do so without scheduling any downtime?

Thank you very much in advance,
Mauro

[***@s01 ~]# glusterfs -V
glusterfs 3.10.5
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

[***@s01 ~]# gluster volume quota tier2 list /CSP/ans004
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/CSP/ans004 1.0TB 99%(1013.8GB) 3.9TB 0Bytes Yes Yes

[***@s01 ~]# du -hs /tier2/CSP/ans004/
295G /tier2/CSP/ans004/
Sanoj Unnikrishnan
2018-07-03 09:34:04 UTC
Hi Mauro,

This may be an issue with the update of the backend xattrs.
To root-cause (RCA) this further and provide a resolution, could you provide me with the logs generated by running the following fsck script?
https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py

Try running the script and reply with the logs it generates.
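For reference, the invocation used later in this thread (the mount path and sub-directory are specific to this cluster, so adjust them to your own setup) looks like:

./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster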

Thanks,
Sanoj
Mauro Tridici
2018-07-03 10:39:11 UTC
Dear Sanoj,

thank you very much for your support.
I just downloaded and executed the script you suggested.

This is the full command I executed:

./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster

Attached you can find the logs generated by the script.
What can I do now?

Thank you very much for your patience.
Mauro
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Sanoj Unnikrishnan
2018-07-05 07:08:35 UTC
Hi Mauro,

A script issue meant that not all the necessary xattrs were captured.
Could you provide the xattrs with:
find /tier2/CSP/ans004 | xargs getfattr -d -m. -e hex
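(Note: as comes up later in the thread, these trusted.* quota xattrs are kept on the backend bricks rather than exposed through the client mount, so the equivalent check against a single brick would look something like the line below; the /gluster/mntN/brick layout is taken from the df output shown further down and may differ on your setup.)

find /gluster/mnt1/brick/CSP/ans004 | xargs getfattr -d -m. -e hex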

Meanwhile, if you are being impacted, you could do the following:
- back up the quota limits
- disable quota
- enable quota
- freshly set the limits

Please capture the xattr values first, so that we can work out what went wrong.
Regards,
Sanoj
Mauro Tridici
2018-07-05 07:56:40 UTC
Hi Sanoj,

unfortunately the command did not return any output.

[***@s01 ~]# find /tier2/CSP/ans004 | xargs getfattr -d -m. -e hex
[***@s01 ~]#

Do you have any other idea on how to detect the cause of the issue?

Thank you again,
Mauro
Mauro Tridici
2018-07-09 15:31:20 UTC
Hi Sanoj,

could you give me the command I need to back up all the quota limits?
If there is no solution for this kind of problem, I would like to try to follow your "backup" suggestion.

Do you think I should contact the Gluster developers too?

Thank you very much.
Regards,
Mauro
Hari Gowtham
2018-07-10 09:02:38 UTC
Hi,

As far as I understand, there is no explicit command to back up all the quota limits; I need to look into this further.
But you can do the following to back them up and set them again.
"gluster volume quota <volname> list" will print all the quota limits set on that particular volume.
You will have to make a note of the directories and their respective limits.
Once noted down, you can disable quota on the volume and then enable it.
Once enabled, you will have to set each limit explicitly on the volume.
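A minimal sketch of that cycle for the volume in this thread (the backup file name is just an example, and the limit-usage line has to be repeated for every path/limit pair noted from the list output):

gluster volume quota tier2 list > quota_limits_backup.txt
gluster volume quota tier2 disable
gluster volume quota tier2 enable
gluster volume quota tier2 limit-usage /CSP/ans004 1TB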

Before doing this, we suggest you try running the script mentioned above with the backend brick path instead of the mount path.
You need to run it on the machines where the backend bricks are located, not on the mount.
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-07-10 09:24:12 UTC
Hi Hari,

thank you very much for your answer.
I will try to use the script mentioned above, pointing it at each backend brick.

So, if I understand correctly, since I have a Gluster cluster composed of 3 nodes (with 12 bricks on each node), I have to execute the script 36 times. Right?

Below you can find the "df" command output from one of the cluster nodes:

/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,7T 3,4T 63% /gluster/mnt8
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 65% /gluster/mnt10
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,7T 3,4T 64% /gluster/mnt6
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,7T 60% /gluster/mnt11
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,7T 60% /gluster/mnt12
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,7T 3,4T 64% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,7T 3,4T 63% /gluster/mnt7
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 65% /gluster/mnt9

I will execute the following command and post the output here.

./quota_fsck_new.py --full-logs --sub-dir /gluster/mnt{1..12}

Thank you again for your support.
Regards,
Mauro
Hari Gowtham
2018-07-10 10:12:54 UTC
Hi Mauro,

Can you send the gluster v status command output?

Is it a single volume that is spread across all 3 nodes and has 36 bricks?
If yes, you will have to run on all the bricks.

In the command, use the --sub-dir option if you are running it only for the directory where the limit is set; if you are running it on the brick mount path, you can omit it.
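For example, using this cluster's brick layout and the script name used elsewhere in the thread, the two forms would look roughly like this (only an illustration, one brick shown):

./quota_fsck_new.py --full-logs --sub-dir CSP/ans004 /gluster/mnt1/brick
./quota_fsck_new.py --full-logs /gluster/mnt1/brick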

The --full-logs option will consume a lot of space, as it records the xattrs for every entry inside the path we run it on. This data is needed to cross-check and verify quota's marker functionality.

To reduce resource consumption you can run it on one replica set alone (if it is a replicate volume), but it is better to run it on all the bricks if possible and if the space consumed is acceptable to you.

Make sure you run it using the script from the link Sanoj provided above (patch set 6).
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-07-10 13:06:53 UTC
Hi Hari,

sorry for the late reply.
Yes, the Gluster volume is a single volume spread across all 3 nodes, with 36 bricks.

Attached you can find a tar.gz file containing:

- gluster volume status command output;
- gluster volume info command output;
- the output of the following script execution (it generated one log file per server: s01.log, s02.log, s03.log).

This is the “check.sh” script that has been executed on each server (servers are s01, s02, s03).

#!/bin/bash

#set -xv

host=$(hostname)

for i in {1..12}
do
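# run the quota fsck script against this brick's copy of CSP/ans004 and append the output to a per-host log file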
./quota_fsck_new-6.py --full-logs --sub-dir CSP/ans004 /gluster/mnt$i/brick >> $host.log
done

Many thanks,
Mauro
Hari Gowtham
2018-07-11 07:16:31 UTC
Hi,

There was an accounting issue in your setup.
The directories ans004/ftp/CMCC-CM2-VHR4-CTR/atm/hist and ans004/ftp/CMCC-CM2-VHR4
had wrong size values on them.

To fix it, you will have to set the dirty xattr (an internal Gluster xattr) on these
directories, which will mark them so that their values are calculated again.
Then do a du on the mount after setting the xattrs; this triggers a stat that
recalculates and updates the right values.

To set the dirty xattr:
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 <path to the directory>
This has to be done for both directories, one after the other, on each brick.
Once done for all the bricks, issue the du command.
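Putting it together, a sketch of the fix for this case (the full on-brick paths below are my reconstruction from the CSP/ans004 sub-dir and the /gluster/mntN/brick layout used earlier in the thread; the loop has to be run on every node that hosts bricks, i.e. s01, s02 and s03):

for i in {1..12}
do
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/CSP/ans004/ftp/CMCC-CM2-VHR4-CTR/atm/hist
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/CSP/ans004/ftp/CMCC-CM2-VHR4
done

# then, from a client, trigger the recalculation:
du /tier2/CSP/ans004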

Thanks to Sanoj for the guidance
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-07-11 08:23:25 UTC
Hi Hari, Hi Sanoj,

thank you very much for your patience and your support!
The problem has been solved following your instructions :-)

N.B.: in order to reduce the running time, I executed the “du” command as follows:

for i in {1..12}
do
du /gluster/mnt$i/brick/CSP/ans004/ftp
done

and not at the top "/gluster/mnt$i/brick" level of each brick.

I hope that was a correct approach :-)

Thank you again for helping me to solve this issue.
Have a good day.
Mauro
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Mauro Tridici
2018-09-08 18:46:21 UTC
Hi Hari, Hi Sanoj,

sorry to disturb you again, but I have a problem similar to the one described (and solved) above.
As you can see from the following output, there is a mismatch between the used disk size reported by quota and the value returned by the du command.

[***@s01 qc]# df -hT /tier2/ASC
Filesystem     Type            Size  Used Avail Use% Mounted on
s01-stg:tier2 fuse.glusterfs 10T 4,5T 5,6T 45% /tier2

[***@s01 qc]# gluster volume quota tier2 list /ASC
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 4.4TB 5.6TB No No

[***@s01 qc]# du -hs /tier2/ASC
1,9T /tier2/ASC

I have already executed the "quota_fsck_new-6.py" script to identify the folder that needs to be fixed.
Attached you can find the output produced by executing a customized "check" script on each Gluster server (s01, s02 and s03 are the names of the servers).

check script:

#!/bin/bash

#set -xv

host=$(hostname)

for i in {1..12}
do
./quota_fsck_new-6.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done

I don’t know how to read the log files in order to detect the critical folder.
Anyway, since the mismatch was detected after some files had been deleted from the /tier2/ASC/primavera/cam directory, I thought that the problem should be there.
So, I tried to execute the following customized fix script:

fix script:

#!/bin/bash

#set -xv

host=$(hostname)

for i in {1..12}
do
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera/cam
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC

#after the setfattr procedure, please comment the setfattr lines, uncomment the du lines and execute the script again

#du /gluster/mnt$i/brick/ASC/primavera/cam
#du /gluster/mnt$i/brick/ASC/primavera
#du /gluster/mnt$i/brick/ASC

done

Unfortunately, the problem still seems to be there.
I also tried running the quota_fsck script with the --fix-issues option, but nothing changed.
Could you please help me solve this issue?
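For what it's worth, following the earlier getfattr suggestion, I could also dump the quota xattrs of the suspect directory on every brick and compare them with an on-brick du, along these lines (just a sketch, assuming the same /gluster/mntN/brick layout as above):

for i in {1..12}
do
getfattr -d -m. -e hex /gluster/mnt$i/brick/ASC/primavera/cam
du -sh /gluster/mnt$i/brick/ASC/primavera/cam
done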

Thank you very much in advance,
Mauro
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/CSP/ans004 1.0TB 99%(1013.8GB) 3.9TB 0Bytes Yes Yes
295G /tier2/CSP/ans004/
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Regards,
Hari Gowtham.
--
Regards,
Hari Gowtham.
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-09-08 18:55:11 UTC
Permalink
Oops, sorry.
The attached ZIP file was quarantined by the mail security system policies (it seems that .zip and .docm files are not allowed).

So, I am sending you the log files in tar.gz format instead.

Thank you,
Mauro
Post by Mauro Tridici
Hi Hari, Hi Sanoj,
sorry if I disturb you again, but I have a problem similar to the one described (and solved) below.
As you can see from the following output there is a mismatch between the used disk size reported by quota and the value returned by du command.
File system Tipo Dim. Usati Dispon. Uso% Montato su
s01-stg:tier2 fuse.glusterfs 10T 4,5T 5,6T 45% /tier2
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 4.4TB 5.6TB No No
1,9T /tier2/ASC
I already executed the "quota_fsck_new-6.py” script to identify the folder to be fixed.
In attachment you can find the output produced executing a customized “check” script on each gluster server (s01, s02, s03 are the names of the servers).
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_new-6.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
I don’t know how to read the log files in order to detect the critical folder.
Anyway, since the mismatch problem has been detected after some files have been deleted from /tier2/ASC/primavera/cam directory, I thought that the problem should be there.
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera/cam
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC
#after setfattr procedure, please comment setaffr lines and uncomment du lines and execute the script again
#du /gluster/mnt$i/brick/ASC/primavera/cam
#du /gluster/mnt$i/brick/ASC/primavera
#du /gluster/mnt$i/brick/ASC
done
~
Unfortunately, the problem seems to be still here.
I also tried to use the quota-fsck script using —fix-issues option, but nothing changed.
Could you please help me to try to solve this issue?
Thank you very much in advance,
Mauro
<logs.zip>
Post by Mauro Tridici
Hi Hari, Hi Sanoj,
thank you very much for your patience and your support!
The problem has been solved following your instructions :-)
for i in {1..12}
do
du /gluster/mnt$i/brick/CSP/ans004/ftp
done
and not on each brick at "/gluster/mnt$i/brick" tree level.
I hope it was a correct idea :-)
Thank you again for helping me to solve this issue.
Have a good day.
Mauro
Post by Hari Gowtham
Hi,
There was a accounting issue in your setup.
The directory ans004/ftp/CMCC-CM2-VHR4-CTR/atm/hist and ans004/ftp/CMCC-CM2-VHR4
had wrong size value on them.
To fix it, you will have to set dirty xattr (an internal gluster
xattr) on these directories
which will mark it for calculating the values again for the directory.
And then do a du on the mount after setting the xattrs. This will do a
stat that will
calculate and update the right values.
setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 <path to the directory>
This has to be done for both the directories one after the other on each brick.
Once done for all the bricks issue the du command.
Thanks to Sanoj for the guidance
Post by Mauro Tridici
Hi Hari,
sorry for the late.
Yes, the gluster volume is a single volume that is spread between all the 3 node and has 36 bricks
- gluster volume status command output;
- gluster volume info command output;
- the output of the following script execution (it generated 3 files per server: s01.log, s02.log, s03.log).
This is the “check.sh” script that has been executed on each server (servers are s01, s02, s03).
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_new-6.py --full-logs --sub-dir CSP/ans004 /gluster/mnt$i/brick >> $host.log
done
Many thanks,
Mauro
Hi Mauro,
Can you send the gluster v status command output?
Is it a single volume that is spread between all the 3 node and has 36 bricks?
If yes, you will have to run on all the bricks.
In the command use sub-dir option if you are running only for the
directory where limit is set. else if you are
running on the brick mount path you can remove it.
The full-log will consume a lot of space as its going to record the
xattrs for each entry inside the path we are
running it. This data is needed to cross check and verify quota's
marker functionality.
To reduce resource consumption you can run it on one replica set alone
(if its replicate volume)
But its better if you can run it on all the brick if possible and if
the size consumed is fine with you.
Make sure you run it with the script link provided above by Sanoj. (patch set 6)
Hi Hari,
thank you very much for your answer.
I will try to use the script mentioned above pointing to each backend bricks.
So, if I understand, since I have a gluster cluster composed by 3 nodes (with 12 bricks on each node), I have to execute the script 36 times. Right?
/dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
/dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
/dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
/dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,7T 3,4T 63% /gluster/mnt8
/dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
/dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 65% /gluster/mnt10
/dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,7T 3,4T 64% /gluster/mnt6
/dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
/dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,7T 60% /gluster/mnt11
/dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,7T 60% /gluster/mnt12
/dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,7T 3,4T 64% /gluster/mnt5
/dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,7T 3,4T 63% /gluster/mnt7
/dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 65% /gluster/mnt9
I will execute the following command and I will put here the output.
./quota_fsck_new.py --full-logs --sub-dir /gluster/mnt{1..12}
Thank you again for your support.
Regards,
Mauro
Hi,
There is no explicit command to backup all the quota limits as per my
understanding. need to look further about this.
But you can do the following to backup and set it.
Gluster volume quota volname list which will print all the quota
limits on that particular volume.
You will have to make a note of the directories with their respective limit set.
Once noted down, you can disable quota on the volume and then enable it.
Once enabled, you will have to set each limit explicitly on the volume.
Before doing this we suggest you can to try running the script
mentioned above with the backend brick path instead of the mount path.
you need to run this on the machines where the backend bricks are
located and not on the mount.
Hi Sanoj,
could you provide me the command that I need in order to backup all quota limits?
If there is no solution for this kind of problem, I would like to try to follow your “backup” suggestion.
Do you think that I should contact gluster developers too?
Thank you very much.
Regards,
Mauro
Hi Sanoj,
unfortunately the output of the command execution was not helpful.
Do you have some other idea in order to detect the cause of the issue?
Thank you again,
Mauro
Hi Mauro,
A script issue did not capture all necessary xattr.
Could you provide the xattrs with..
find /tier2/CSP/ans004 | xargs getfattr -d -m. -e hex
Meanwhile, If you are being impacted, you could do the following
back up quota limits
disable quota
enable quota
freshly set the limits.
Please capture the xattr values first, so that we can get to know what went wrong.
Regards,
Sanoj
Dear Sanoj,
thank you very much for your support.
I just downloaded and executed the script you suggested.
./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster
In attachment, you can find the logs generated by the script.
What can I do now?
Thank you very much for your patience.
Mauro
Hi Mauro,
This may be an issue with update of backend xattrs.
To RCA further and provide resolution could you provide me with the logs by running the following fsck script.
https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py <https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py>
Try running the script and revert with the logs generated.
Thanks,
Sanoj
Dear Users,
I just noticed that, after some data deletions executed inside "/tier2/CSP/ans004” folder, the amount of used disk reported by quota command doesn’t reflect the value indicated by du command.
Surfing on the web, it seems that it is a bug of previous versions of Gluster FS and it was already fixed.
In my case, the problem seems unfortunately still here.
How can I solve this issue? Is it possible to do it without starting a downtime period?
Thank you very much in advance,
Mauro
glusterfs 3.10.5
Repository revision: git://git.gluster.org/glusterfs.git <git://git.gluster.org/glusterfs.git>
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/ <https://www.gluster.org/>>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/CSP/ans004 1.0TB 99%(1013.8GB) 3.9TB 0Bytes Yes Yes
295G /tier2/CSP/ans004/
_______________________________________________
Gluster-users mailing list
http://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Regards,
Hari Gowtham.
--
Regards,
Hari Gowtham.
--
Regards,
Hari Gowtham.
Hari Gowtham
2018-09-10 07:17:00 UTC
Permalink
Hi Mauro,

The problem might be somewhere else, so setting the xattr and
doing the lookup might not have fixed the issue.

To resolve this we need to read the log file produced by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=").
These two should be the same. If they mismatch, then we have to find
the topmost dir which has the mismatch.
On this topmost directory you have to set the dirty xattr and then
do a lookup.

If there are two different directories without a common top directory,
then both of these have to undergo the above process.

The fsck script should work fine. Can you try "--fix-issue" with
the latest script instead of the 6th patch set used above?
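As a rough illustration (a sketch only; the exact log format may differ slightly from what is assumed here, and <mismatching-dir> below is just a placeholder), you could pull the relevant lines out of one of the logs and, once the mismatching directory is identified, mark it dirty on every brick and look it up from the mount:

# sketch: show the lines carrying the two sizes, for manual comparison
# (assumes the log contains "SIZE:" and "st_size=" as described above)
grep -nE 'SIZE:|st_size=' s01.log | less

# once the mismatching directory is known, on every brick:
for i in {1..12}
do
    setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/<mismatching-dir>
done
# and then trigger the recalculation with a lookup (stat) from the client mount:
du /tier2/<mismatching-dir>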
Mauro Tridici
2018-09-10 07:46:49 UTC
Permalink
Hi Hari,

thank you very much for your help.
I will try to use the latest available version of the quota_fsck script and I will give you feedback as soon as possible.

Thank you again for the detailed explanation.
Regards,
Mauro
Post by Sanoj Unnikrishnan
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
Hari Gowtham
2018-09-10 08:51:30 UTC
Permalink
Hi,

Looking at the logs, I can see that the following paths:

/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam

have a mismatch.

You can try setting the dirty xattr on these and then doing a du on them.

A few corrections to my comments above:
the contri size in the xattr and the aggregated size are the values that have to be compared.
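For example (a sketch only, assuming these paths are relative to the ASC quota directory on each brick), for /primavera/cam this would be, with the other three paths handled the same way:

for i in {1..12}
do
    # mark the directory dirty so its contribution is recalculated
    setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera/cam
done
# then do the du (lookup) from the client mount:
du /tier2/ASC/primavera/cam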
Post by Mauro Tridici
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
Post by Mauro Tridici
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
Post by Mauro Tridici
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-09-10 09:42:52 UTC
Permalink
Dear Hari,

I followed your suggestions, but, unfortunately, nothing has changed.
I tried both running the quota-fsck script with the --fix-issues option and running the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100" command against the files and directories you mentioned (on each available brick).
The disk quota assigned to the /tier2/ASC directory still reports about 2.6 TB used, but the “real and current” situation is the following (I deleted all files in the primavera directory):

[***@s03 qc]# du -hsc /tier2/ASC/*
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale

So, I think that the problem should be only in the “orientgate” or “primavera” directory, right?
For this reason, in order to collect some fresh logs, I executed the check script again starting from the top-level directory “ASC”, using the following bash script (named hari-20180910) based on the new version of quota_fsck (rel. 9):

hari-20180910 script:

#!/bin/bash

#set -xv

host=$(hostname)

for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done

In attachment, you can find the log files generated by the script.

SOME IMPORTANT NOTES:

- in the new log files, the “primavera” directory is no longer present

Is there anything else I can do?

Thank you very much for your patience.
Regards,
Mauro
Post by Hari Gowtham
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Post by Mauro Tridici
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
Post by Mauro Tridici
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
Post by Mauro Tridici
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Hari Gowtham
2018-09-10 10:27:13 UTC
Permalink
Post by Mauro Tridici
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with the fix-issue option in the script. As the directories
with the accounting mismatch were found, it's better to set the dirty xattr
and then do a du (this way it's easy and should resolve the issue). The
script is useful when we don't know where the issue is.
If the files have been deleted, then the state of the log file from the script
is outdated. The folders I suggested are as per the old log file, so
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
Post by Mauro Tridici
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.

We need to look from the beginning, as the above suggestions may not be
true anymore.

I find that the log files are edited; a few lines are missing. Can you
send the actual log file from running the script?
I would also recommend that you run the script only after all the files are
deleted (or other major modifications are done),
so that we can fix everything once at the end.

If the fix-issue argument of the script doesn't work on the directory or
subdirectory where you find the mismatch, then you can send the whole
file.
I will check the log and let you know where you need to do the lookup.
Post by Mauro Tridici
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-09-10 11:28:00 UTC
Permalink
Dear Hari,

the log files that I attached to my last mail were generated by running the quota-fsck script after deleting the files.
The quota-fsck script version that I used is the one in the following link https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
I didn’t edit the log files, but during the execution I forgot to redirect the stderr and stdout to the same log file, sorry, mea culpa!
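(For reference, this is the same loop with stderr merged into the per-host log, so that no lines go missing in the next run:)

#!/bin/bash
host=$(hostname)
for i in {1..12}
do
    # 2>&1 sends stderr to the same file as stdout
    ./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log 2>&1
done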

Anyway, as you suggested, I executed the quota-fsck script again with the --fix-issues option.
At the end of the script execution, I launched the du command, but the problem is still there.

[***@s02 auto]# df -hT /tier2/ASC/
File system Tipo Dim. Usati Dispon. Uso% Montato su
s02-stg:tier2 fuse.glusterfs 10T 2,6T 7,5T 26% /tier2

I’m sorry to bother you so much.
Last time I used the script everything went smoothly, but this time it seems to be more difficult.

In attachment you can find the new log files.

Thank you,
Mauro
Post by Hari Gowtham
Post by Mauro Tridici
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with fix-issue in the script. As the directories
with accounting mismatch awre found its better to set the dirty xattr
and then do a du(this way its wasy and has to resolve the issue). The
script can be used when we dont know where the issue is.
If the files are deleted, then state of the log file from the script
is outdated. The folders I suggested are as per the old log file, So
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
Post by Mauro Tridici
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.
Need to look from the beginning, as the above suggestions may not be
true anymore.
I find that the log files are edited. A few lines are missing. Can you
send the actual log file from running the script
And i would recommend you to run the script after all the files are
deleted (or other major modifications are done).
So that we can fix once at the end.
If the fix-issue argument on script doesn't work on the directory/
subdirectory where you find mismatch, then you can send the whole
file.
Will check the log and let you know where you need to do the lookup.
Post by Mauro Tridici
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Hari Gowtham
2018-09-10 14:02:04 UTC
Permalink
Hi Mauro,

I went through the log files you have shared and
I don't find any mismatch.

This can be for various reasons:
1) the accounting that was wrong is now fine; but, as per your comment
above, if this is the case,
then the crawl should still be happening, which is why it is not yet
reflected (it will be reflected after a while).
2) the fix-issue part of the script might be wrong.
3) the final script that we used might be wrong.

You can wait for a while (the time will vary based on the number of
files) and then see if the accounting is fine.
If it's not fine even after a while, then we will have to run the
script without "fix-issue" (the 6th patch set has worked, so it can be reused).
This will give us the mismatches in the log file, which I can read and let
you know where the lookup has to be done.
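For reference, a minimal sketch of that run (the same per-brick loop you used before, pointing at the patch-set 6 script and leaving out the fix option):

#!/bin/bash
host=$(hostname)
for i in {1..12}
do
    # check only, no --fix-issues; keep stderr in the same log
    ./quota_fsck_new-6.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log 2>&1
done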
Post by Mauro Tridici
Dear Hari,
the log files that I attached to my last mail have been generated running quota-fsck script after deleting the files.
The quota-fsck script version that I used is the one in the following link https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
I didn’t edit the log files, but during the execution I forgot to redirect the stderr and stdout to the same log file, sorry, mea culpa!
Anyway, as you suggested, I executed again the quota-fsck script with option —fix-issues.
At the end of script execution, I launched the du command, but the problem is still there.
File system Tipo Dim. Usati Dispon. Uso% Montato su
s02-stg:tier2 fuse.glusterfs 10T 2,6T 7,5T 26% /tier2
I’m sorry to bother you so much.
Last time I used the script everything went smoothly, but this time it seems to be more difficult.
In attachment you can find the new log files.
Thank you,
Mauro
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with fix-issue in the script. As the directories
with accounting mismatch awre found its better to set the dirty xattr
and then do a du(this way its wasy and has to resolve the issue). The
script can be used when we dont know where the issue is.
If the files are deleted, then state of the log file from the script
is outdated. The folders I suggested are as per the old log file, So
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.
Need to look from the beginning, as the above suggestions may not be
true anymore.
I find that the log files are edited. A few lines are missing. Can you
send the actual log file from running the script
And i would recommend you to run the script after all the files are
deleted (or other major modifications are done).
So that we can fix once at the end.
If the fix-issue argument on script doesn't work on the directory/
subdirectory where you find mismatch, then you can send the whole
file.
Will check the log and let you know where you need to do the lookup.
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
Mauro Tridici
2018-09-10 14:08:49 UTC
Permalink
Hi Hari,

thank you very much for your support.
I will do everything you suggested and I will contact you as soon as all the steps are completed.

Thank you,
Mauro
Post by Sanoj Unnikrishnan
Hi Mauro,
I went through the log file you have shared.
I don't find any mismatch.
1) the accounting which was wrong is now fine. but as per your comment
above if this is the case,
then the crawl should still be happening which is why the its not yet
reflected. (will reflect after a while)
2) the fix-issue part of the script might be wrong.
3) or the final script that we use might be wrong.
You can wait for a while (based on the number of files the time will
vary) and then see if the accounting is fine.
If its not fine even after a while, then we will have to run the
script (6th patch set has worked so can be reused) without "fix-issue"
This will give us the mismatch in log file, which i can read and let
you know where the lookup has to be done.
Post by Mauro Tridici
Dear Hari,
the log files that I attached to my last mail have been generated running quota-fsck script after deleting the files.
The quota-fsck script version that I used is the one in the following link https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
I didn’t edit the log files, but during the execution I forgot to redirect the stderr and stdout to the same log file, sorry, mea culpa!
Anyway, as you suggested, I executed again the quota-fsck script with option —fix-issues.
At the end of script execution, I launched the du command, but the problem is still there.
File system Tipo Dim. Usati Dispon. Uso% Montato su
s02-stg:tier2 fuse.glusterfs 10T 2,6T 7,5T 26% /tier2
I’m sorry to bother you so much.
Last time I used the script everything went smoothly, but this time it seems to be more difficult.
In attachment you can find the new log files.
Thank you,
Mauro
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with fix-issue in the script. As the directories
with accounting mismatch awre found its better to set the dirty xattr
and then do a du(this way its wasy and has to resolve the issue). The
script can be used when we dont know where the issue is.
If the files are deleted, then state of the log file from the script
is outdated. The folders I suggested are as per the old log file, So
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.
Need to look from the beginning, as the above suggestions may not be
true anymore.
I find that the log files are edited. A few lines are missing. Can you
send the actual log file from running the script
And i would recommend you to run the script after all the files are
deleted (or other major modifications are done).
So that we can fix once at the end.
If the fix-issue argument on script doesn't work on the directory/
subdirectory where you find mismatch, then you can send the whole
file.
Will check the log and let you know where you need to do the lookup.
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Mauro Tridici
2018-09-10 14:32:55 UTC
Permalink
Hi Hari,

good news for us!

A few seconds ago, I submitted the gluster quota list command in order to save the current quota status.

[***@s01 auto]# gluster volume quota tier2 list /ASC
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 2.6TB 7.4TB No No

At the same time, I was asking myself how I could trigger a sort of directory “scan” in order to refresh the quota value without waiting for the automatic one.
So, I decided to start a “du -hs /tier2/ASC” session (without specifying each single brick path, as I usually do after a quota-fsck script execution).

[***@s01 auto]# du -hs /tier2/ASC
22G /tier2/ASC

Now, magically, the quota value reflects the real disk space usage info provided by the “du” command.

[***@s01 auto]# gluster volume quota tier2 list /ASC
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 21.8GB 10.0TB No No

Do you think I was just lucky, or is there a particular reason why everything is now working?

Thank you,
Mauro
Post by Mauro Tridici
Hi Hari,
thank you very much for your support.
I will do everything you suggested and I will contact you as soon as all the steps will be completed.
Thank you,
Mauro
Post by Sanoj Unnikrishnan
Hi Mauro,
I went through the log file you have shared.
I don't find any mismatch.
1) the accounting which was wrong is now fine. but as per your comment
above if this is the case,
then the crawl should still be happening which is why the its not yet
reflected. (will reflect after a while)
2) the fix-issue part of the script might be wrong.
3) or the final script that we use might be wrong.
You can wait for a while (based on the number of files the time will
vary) and then see if the accounting is fine.
If its not fine even after a while, then we will have to run the
script (6th patch set has worked so can be reused) without "fix-issue"
This will give us the mismatch in log file, which i can read and let
you know where the lookup has to be done.
Post by Mauro Tridici
Dear Hari,
the log files that I attached to my last mail have been generated running quota-fsck script after deleting the files.
The quota-fsck script version that I used is the one in the following link https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py <https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py>
I didn’t edit the log files, but during the execution I forgot to redirect the stderr and stdout to the same log file, sorry, mea culpa!
Anyway, as you suggested, I executed again the quota-fsck script with option —fix-issues.
At the end of script execution, I launched the du command, but the problem is still there.
File system Tipo Dim. Usati Dispon. Uso% Montato su
s02-stg:tier2 fuse.glusterfs 10T 2,6T 7,5T 26% /tier2
I’m sorry to bother you so much.
Last time I used the script everything went smoothly, but this time it seems to be more difficult.
In attachment you can find the new log files.
Thank you,
Mauro
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with fix-issue in the script. As the directories
with accounting mismatch awre found its better to set the dirty xattr
and then do a du(this way its wasy and has to resolve the issue). The
script can be used when we dont know where the issue is.
If the files are deleted, then state of the log file from the script
is outdated. The folders I suggested are as per the old log file, So
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.
Need to look from the beginning, as the above suggestions may not be
true anymore.
I find that the log files are edited. A few lines are missing. Can you
send the actual log file from running the script
And i would recommend you to run the script after all the files are
deleted (or other major modifications are done).
So that we can fix once at the end.
If the fix-issue argument on script doesn't work on the directory/
subdirectory where you find mismatch, then you can send the whole
file.
Will check the log and let you know where you need to do the lookup.
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
has mismatch.
You can try setting dirty for this and then do a du on it.
A few corrections for my above comments.
The contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of quota_fsck script and I will provide you a feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, So setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=" ).
The contri size in the xattr and the aggregated size have to be checked
These two should be the same. If they mismatch, then we have to find
the top most dir which has the mismatch.
Bottom most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top most directory you have to do a set dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both these have to undergo the above process.
The fsck script should work fine. can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it <http://www.cmcc.it/>
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it <http://www.cmcc.it/>
mobile: (+39) 327 5630841
-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: ***@cmcc.it
Hari Gowtham
2018-09-11 08:49:10 UTC
Permalink
Hi Mauro,

It was because the quota crawl takes some time and it was still working on it.
When we ran fix-issues, it made changes to the backend and did a lookup.
It takes time for the whole thing to be reflected in the quota list command.
Earlier, it wasn't reflected because the crawl was still running, so this is the
same as the first of the three possible situations I mentioned above.
This is the expected behavior.
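If you want to watch the crawl catch up, one simple option (just a suggestion, not required) is to re-run the list command periodically, e.g.:

watch -n 300 'gluster volume quota tier2 list /ASC'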

Regards,
Hari.
Post by Mauro Tridici
Hi Hari,
good news for us!
A few seconds ago, I submitted the gluster quota list command in order to save the current quota status.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 2.6TB 7.4TB No No
In the same time, I was asking myself how I can stimulate a sort of directory “scan” in order to refresh the quota value without waiting for the automatic scan.
So, I decided to start a “du -hs /tier2/ASC” session (without specify each single brick path as I usually do after quota-fsck script execution).
22G /tier2/ASC
Now, magically, the quota value reflects the real disk space usage info provided by the “du” command.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 21.8GB 10.0TB No No
Do you think that I was only lucky or is there a particular reason why everything is now working?
Thank you,
Mauro
Hi Hari,
thank you very much for your support.
I will do everything you suggested and I will contact you as soon as all the steps will be completed.
Thank you,
Mauro
Hi Mauro,
I went through the log file you have shared.
I don't find any mismatch.
1) the accounting which was wrong is now fine. but as per your comment
above if this is the case,
then the crawl should still be happening which is why the its not yet
reflected. (will reflect after a while)
2) the fix-issue part of the script might be wrong.
3) or the final script that we use might be wrong.
You can wait for a while (based on the number of files the time will
vary) and then see if the accounting is fine.
If its not fine even after a while, then we will have to run the
script (6th patch set has worked so can be reused) without "fix-issue"
This will give us the mismatch in log file, which i can read and let
you know where the lookup has to be done.
Dear Hari,
the log files that I attached to my last mail have been generated running quota-fsck script after deleting the files.
The quota-fsck script version that I used is the one in the following link https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
I didn’t edit the log files, but during the execution I forgot to redirect the stderr and stdout to the same log file, sorry, mea culpa!
Anyway, as you suggested, I executed again the quota-fsck script with option —fix-issues.
At the end of script execution, I launched the du command, but the problem is still there.
File system Tipo Dim. Usati Dispon. Uso% Montato su
s02-stg:tier2 fuse.glusterfs 10T 2,6T 7,5T 26% /tier2
I’m sorry to bother you so much.
Last time I used the script everything went smoothly, but this time it seems to be more difficult.
In attachment you can find the new log files.
Thank you,
Mauro
Dear Hari,
I followed you suggestions, but, unfortunately, nothing is changed.
I tried to execute both the quota-fsck script with —fix-issues options both the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100” command against the files and directory mentioned by you (on each available brick).
There can be an issue with fix-issue in the script. As the directories
with accounting mismatch awre found its better to set the dirty xattr
and then do a du(this way its wasy and has to resolve the issue). The
script can be used when we dont know where the issue is.
If the files are deleted, then state of the log file from the script
is outdated. The folders I suggested are as per the old log file, So
setting the dirty xattr and then doing a lookup (du on that dir) might
not help.
22G /tier2/ASC/orientgate
26K /tier2/ASC/primavera
22G totale
So, I think that the problem should be only in "orientgate” or in “primavera” directory, right!?
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
done
~
In attachment, you can find the log files generated by the script.
- in the new log files, “primavera” directory is no more present
Is there something more that I can do?
As there were files that were deleted, the accounting would have changed again.
We need to look from the beginning, as the above suggestions may not be
true anymore.
I find that the log files are edited; a few lines are missing. Can you
send the actual log file from running the script?
And I would recommend you run the script after all the files are
deleted (or other major modifications are done),
so that we can fix everything once at the end.
If the fix-issue argument of the script doesn't work on the directory/
subdirectory where you find the mismatch, then you can send the whole
file.
I will check the log and let you know where you need to do the lookup.
Thank you very much for your patience.
Regards,
Mauro
Hi,
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
/orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
/primavera/cam
have a mismatch.
You can try setting the dirty xattr on these and then doing a du on them.
A few corrections to my comments above:
the contri size in the xattr and the aggregated size have to be checked.
Hi Hari,
thank you very much for your help.
I will try to use the latest available version of the quota_fsck script and I will provide you with feedback as soon as possible.
Thank you again for the detailed explanation.
Regards,
Mauro
Hi Mauro,
The problem might be at some other place, so setting the xattr and
doing the lookup might not have fixed the issue.
To resolve this we need to read the log file reported by the fsck
script. In this log file we need to look for the size reported by the
xattr (the value "SIZE:" in the log file) and the size reported by the
stat on the file (the value after "st_size=").
The contri size in the xattr and the aggregated size have to be checked.
These two should be the same. If they mismatch, then we have to find
the top-most dir which has the mismatch.
The bottom-most dir/file has to be found. Replace top with bottom in the
following places as well.
On this top-most directory you have to set the dirty xattr and then
do a lookup.
If there are two different directories without a common top directory,
then both of these have to undergo the above process.
The fsck script should work fine. Can you try the "--fix-issue" with
the latest script instead of the 6th patch used above?
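As a rough aid for that log reading, here is a sketch that flags entries where the xattr size and the stat size disagree; it assumes each log line carries both a "SIZE:" field and an "st_size=" field (the exact log format may differ between patch sets, so adjust the parsing accordingly):

# scan the per-host fsck logs for lines where the quota xattr SIZE and st_size differ
for log in *.log
do
    awk -v f="$log" '/SIZE:/ && /st_size=/ {
        xsize = ""; ssize = "";
        for (i = 1; i <= NF; i++) {
            if ($i == "SIZE:") xsize = $(i + 1);                                # size from the quota xattr
            if ($i ~ /^st_size=/) { ssize = $i; sub(/^st_size=/, "", ssize); }  # size from stat
        }
        if (xsize != "" && ssize != "" && xsize + 0 != ssize + 0)
            print f ": " $0;                                                    # potential accounting mismatch
    }' "$log"
done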
--
Regards,
Hari Gowtham.
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
Mauro Tridici
2018-09-11 09:52:08 UTC
Permalink
Hi Hari,
thank you very much for the explanation and for your important support.
Best regards,
Mauro
Post by Sanoj Unnikrishnan
Hi Mauro,
It was because the quota crawl takes some time and it was still working on it.
When we ran fix-issues, it made changes to the backend and did a lookup.
It takes time for the whole thing to be reflected in the quota list command.
Earlier, it didn't reflect the changes because it was still crawling, so this is the
same as the first of the three situations I mentioned above that could have happened.
This is the expected behavior.
Regards,
Hari.
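In other words, after a fix the accounting can be nudged along rather than waited for: a lookup from the client side (for example a du on the FUSE mount) drives the crawl, and the quota list output catches up shortly afterwards. A small sketch, assuming the volume is tier2 mounted on /tier2 as elsewhere in this thread:

# walk the directory from the client side so every file gets a lookup
du -hs /tier2/ASC
# give the quota accounting a little time to aggregate, then re-check
sleep 60
gluster volume quota tier2 list /ASC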
Post by Mauro Tridici
Hi Hari,
good news for us!
A few seconds ago, I submitted the gluster quota list command in order to save the current quota status.
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC 10.0TB 99%(9.9TB) 2.6TB 7.4TB No No
At the same time, I was wondering how I could trigger a sort of directory “scan” in order to refresh the quota value without waiting for the automatic one.
So, I decided to start a “du -hs /tier2/ASC” session (without specifying each single brick path, as I usually do after running the quota-fsck script).
22G /tier2/ASC
-------------------------
Mauro Tridici
Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it
mobile: (+39) 327 5630841
email: ***@cmcc.it