Discussion:
question about sync replicate volume after rebooting one node
songxin
2016-02-16 10:29:50 UTC
Hi,
I have a question about how to sync the volume between two bricks after one node is rebooted.


There are two nodes, node A and node B. Node A's IP is 128.124.10.1 and node B's IP is 128.124.10.2.


Operation steps on node A are as below:
1. gluster peer probe 128.124.10.2
2. mkdir -p /data/brick/gv0
3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0 128.124.10.2:/data/brick/gv1 force
4. gluster volume start gv0
5. mount -t glusterfs 128.124.10.1:/gv0 gluster


Operation steps on node B are as below:
1. mkdir -p /data/brick/gv0
2. mount -t glusterfs 128.124.10.1:/gv0 gluster


After all the steps above, there are some gluster service processes, including glusterd, glusterfs and glusterfsd, running on both node A and node B.
I can see these services with the command ps aux | grep gluster and the command gluster volume status.
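For example, checks like these (a sketch; exact output will vary) show the peer state and the per-node gluster processes:

gluster peer status          # the other node should show as "Peer in Cluster (Connected)"
gluster volume status        # lists the brick (glusterfsd), NFS server and self-heal daemon with their PIDs
ps aux | grep gluster        # shows glusterd, the brick glusterfsd and the glusterfs client/self-heal processes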


Now reboot node B. After B reboots, there are no gluster services running on node B.
After I run systemctl start glusterd, there is just the glusterd service but not glusterfs and glusterfsd on node B.
Because glusterfs and glusterfsd are not running, I can't run gluster volume heal gv0 full.


I want to know why glusterd doesn't start glusterfs and glusterfsd.
How do I restart these services on node B?
How do I sync the replicate volume after one node reboots?


Thanks,
Xin
Anuradha Talur
2016-02-16 10:53:03 UTC
----- Original Message -----
Sent: Tuesday, February 16, 2016 3:59:50 PM
Subject: [Gluster-users] question about sync replicate volume after rebooting one node
Hi,
I have a question about how to sync the volume between two bricks after one node is rebooted.
There are two nodes, node A and node B. Node A's IP is 128.124.10.1 and node B's IP is 128.124.10.2.
Operation steps on node A are as below:
1. gluster peer probe 128.124.10.2
2. mkdir -p /data/brick/gv0
3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0 128.124.10.2:/data/brick/gv1 force
4. gluster volume start gv0
5. mount -t glusterfs 128.124.10.1:/gv0 gluster
Operation steps on node B are as below:
1. mkdir -p /data/brick/gv0
2. mount -t glusterfs 128.124.10.1:/gv0 gluster
After all the steps above, there are some gluster service processes, including glusterd, glusterfs and glusterfsd, running on both node A and node B.
I can see these services with the command ps aux | grep gluster and the command gluster volume status.
Now reboot node B. After B reboots, there are no gluster services running on node B.
After I run systemctl start glusterd, there is just the glusterd service but not glusterfs and glusterfsd on node B.
Because glusterfs and glusterfsd are not running, I can't run gluster volume heal gv0 full.
I want to know why glusterd doesn't start glusterfs and glusterfsd.
On starting glusterd, glusterfsd should have started by itself.
Could you share the glusterd and brick logs (on node B) so that we know why glusterfsd didn't start?
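If it helps, the default locations (assuming logging was not redirected) are usually:

ls /var/log/glusterfs/           # glusterd log, typically etc-glusterfs-glusterd.vol.log
ls /var/log/glusterfs/bricks/    # one log file per brick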

Do you still see glusterfsd service running on node A? You can try running "gluster v start <VOLNAME> force"
on one of the nodes and check if all the brick processes started.

gluster volume status <VOLNAME> should be able to provide you with gluster process status.

On restarting the node, the glusterfs process for the mount won't start by itself. You will have to run step 2 on node B again for it.
How do I restart these services on B node?
How do I sync the replicate volume after one node reboot?
Once the glusterfsd process starts on node B too, glustershd -- the self-heal daemon -- for the replicate volume should start healing/syncing the files that need to be synced. This daemon does periodic syncing of files.

If you want to trigger a heal explicitly, you can run gluster volume heal <VOLNAME> on one of the servers.
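Putting that together, a recovery sequence on node B might look like this (volume name and mount point taken from your steps above; adjust to your paths):

gluster volume start gv0 force                  # respawn any brick process (glusterfsd) that is down
gluster volume status gv0                       # confirm the brick on node B shows Online = Y
mount -t glusterfs 128.124.10.1:/gv0 gluster    # redo the client mount (your step 2 on node B)
gluster volume heal gv0                         # optionally trigger a heal explicitly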
Thanks,
Xin
--
Thanks,
Anuradha.
songxin
2016-02-17 02:53:41 UTC
Hi,
Thank you for your prompt and detailed reply. I have a few more questions about glusterfs.
A node IP is 128.224.162.163.
B node IP is 128.224.162.250.
1. After rebooting node B and starting the glusterd service, the glusterd log is as below.
...
[2015-12-07 07:54:55.743966] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-12-07 07:54:55.744026] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-12-07 07:54:55.744280] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
[2015-12-07 07:54:55.773606] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
[2015-12-07 07:54:55.777994] E [MSGID: 101076] [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not lookup hostname of 128.224.162.163 : Temporary failure in name resolution
[2015-12-07 07:54:55.778290] E [MSGID: 106010] [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management: Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum = 4087388312 on peer 128.224.162.163
[2015-12-07 07:54:55.778384] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 128.224.162.163 (0), ret: 0
[2015-12-07 07:54:55.928774] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host: 128.224.162.163, port: 0
...
When I run gluster peer status on node B, it shows the following.
Number of Peers: 1


Hostname: 128.224.162.163
Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
State: Peer Rejected (Connected)


When I run "gluster volume status" on A node it show as below.

Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 128.224.162.163:/home/wrsadmin/work/t
mp/data/brick/gv0 49152 0 Y 13019
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 13045

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


It looks like the glusterfsd service is OK on node A.


Is it because the peer state is Rejected that glusterd didn't start glusterfsd? What causes this problem?




2. Is glustershd (the self-heal daemon) the process shown below?
root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/gluster ..


If it is, I want to know whether glustershd is also the glusterfsd binary, just like glusterd and glusterfs.
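(For reference, one way to check this on a typical install, assuming the binaries are under /usr/sbin, is:)

ls -l /usr/sbin/glusterd /usr/sbin/glusterfs /usr/sbin/glusterfsd   # on many installs glusterd and glusterfs are symlinks to glusterfsd
ps aux | grep glustershd                                            # glustershd runs as a glusterfs process with --volfile-id gluster/glustershd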


Thanks,
Xin
Atin Mukherjee
2016-02-17 04:01:37 UTC
Post by songxin
Hi,
Thank you for your prompt and detailed reply. I have a few more questions about glusterfs.
A node IP is 128.224.162.163.
B node IP is 128.224.162.250.
1. After rebooting node B and starting the glusterd service, the glusterd log is as below.
...
[2015-12-07 07:54:55.743966] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-12-07 07:54:55.744026] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-12-07 07:54:55.744280] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
[2015-12-07 07:54:55.773606] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
[2015-12-07 07:54:55.777994] E [MSGID: 101076] [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not lookup hostname of 128.224.162.163 : Temporary failure in name resolution
[2015-12-07 07:54:55.778290] E [MSGID: 106010] [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management: Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum = 4087388312 on peer 128.224.162.163
The above log entry is the reason for the peer rejection; most probably it is due to a compatibility issue. I believe the gluster versions on the two nodes are different (please share the gluster versions from both nodes) and you might have hit a bug.

Can you share the delta of /var/lib/glusterd/vols/gv0/info file from
both the nodes?
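For example, something like this run on node B (assuming ssh access to node A at 128.224.162.163) would show the difference, if any:

ssh 128.224.162.163 cat /var/lib/glusterd/vols/gv0/info > /tmp/gv0-info-A
diff /tmp/gv0-info-A /var/lib/glusterd/vols/gv0/info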


~Atin
songxin
2016-02-17 06:14:14 UTC
Hi,
The version of glusterfs on node A and node B is 3.7.6 on both.
The time on node B is the same after rebooting because node B has no RTC. Does that cause the problem?


If I run "gluster volume start gv0 force", glusterfsd can be started, but "gluster volume start gv0" doesn't work.


The file /var/lib/glusterd/vols/gv0/info on node B is as below.
...
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0


The file /var/lib/glusterd/vols/gv0/info on node A is as below.


***@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
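Since the error in the glusterd log is about the volume checksum rather than the info contents, maybe it is also worth comparing the stored checksum on both nodes, assuming the usual /var/lib/glusterd layout:

cat /var/lib/glusterd/vols/gv0/cksum    # run on each node; the two values should match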


Thanks,
Xin
Anuradha Talur
2016-02-17 06:30:21 UTC
----- Original Message -----
Sent: Wednesday, February 17, 2016 11:44:14 AM
Subject: Re: Re: [Gluster-users] question about sync replicate volume after rebooting one node
Hi,
The version of glusterfs on node A and node B is 3.7.6 on both.
The time on node B is the same after rebooting because node B has no RTC. Does that cause the problem?
If I run "gluster volume start gv0 force", glusterfsd can be started, but "gluster volume start gv0" doesn't work.
Yes, there is a difference between volume start and volume start force.
When a volume is already in the "Started" state, gluster volume start gv0 won't do anything (meaning it doesn't bring up the dead bricks). When you say start force, the status of the glusterfsd processes is checked and any glusterfsd that is not running is spawned, which is the case in the setup you have.
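Concretely, on this setup something like the following is expected (a sketch, given the volume is already in the Started state):

gluster volume start gv0          # reports that the volume is already started; dead bricks stay down
gluster volume start gv0 force    # rechecks the brick processes and respawns any glusterfsd that is not running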
--
Thanks,
Anuradha.
songxin
2016-02-17 06:38:51 UTC
Hi,
But I still don't know why glusterfsd can't be started by glusterd after node B rebooted. The version of glusterfs on node A and node B is 3.7.6 on both. Can you explain this for me, please?


Thanks,
Xin
Atin Mukherjee
2016-02-17 06:53:48 UTC
Post by songxin
Hi,
But I still don't know why glusterfsd can't be started by glusterd after node B rebooted. The version of glusterfs on node A and node B is 3.7.6 on both. Can you explain this for me, please?
It's because GlusterD has failed to start on node B. I've already asked you in another mail to provide the delta of gv0's info file to get to the root cause.
Atin Mukherjee
2016-02-17 06:54:57 UTC
Post by Atin Mukherjee
Post by songxin
Hi,
But I still don't know why glusterfsd can't be started by glusterd after node B rebooted. The version of glusterfs on node A and node B is 3.7.6 on both. Can you explain this for me, please?
It's because GlusterD has failed to start on node B. I've already asked you in another mail to provide the delta of gv0's info file to get to the root cause.
Please ignore this mail as I didn't read your previous reply!
Atin Mukherjee
2016-02-17 06:59:21 UTC
Post by songxin
Hi,
The version of glusterfs on node A and node B is 3.7.6 on both.
The time on node B is the same after rebooting because node B has no RTC. Does that cause the problem?
If I run "gluster volume start gv0 force", glusterfsd can be started, but "gluster volume start gv0" doesn't work.
The file /var/lib/glusterd/vols/gv0/info on B node as below.
...
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
The file /var/lib/glusterd/vols/gv0/info on A node as below.
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
The contents look similar. But the log says they differ, and that can't happen. Are you sure they are the same? As a workaround, can you delete that same info file from the disk, restart the glusterd instance, and see whether the problem persists?
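A rough sketch of that workaround on node B (back the file up first; paths assume the default /var/lib/glusterd location):

cp /var/lib/glusterd/vols/gv0/info /tmp/gv0-info.bak    # keep a copy in case it is needed again
rm /var/lib/glusterd/vols/gv0/info                      # remove the suspect info file
systemctl restart glusterd                              # restart the glusterd instance
gluster peer status                                     # check whether the peer is still rejected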
Post by songxin
Thanks,
Xin
Post by Atin Mukherjee
Post by songxin
Hi,
Thank you for your immediate and detailed reply.And I have a few more
question about glusterfs.
A node IP is 128.224.162.163.
B node IP is 128.224.162.250.
1.After reboot B node and start the glusterd service the glusterd log is as blow.
...
[2015-12-07 07:54:55.743966] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2015-12-07 07:54:55.744026] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2015-12-07 07:54:55.744280] I [MSGID: 106163]
[glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 30706
[2015-12-07 07:54:55.773606] I [MSGID: 106490]
[glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
[2015-12-07 07:54:55.777994] E [MSGID: 101076]
[common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
lookup hostname of 128.224.162.163 : Temporary failure in name resolution
[2015-12-07 07:54:55.778290] E [MSGID: 106010]
Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
4087388312 on peer 128.224.162.163
The above log entry is the reason for the rejection of the peer; most
probably it's due to a compatibility issue. I believe the gluster
versions are different on the two nodes (share the gluster versions from
both nodes) and you might have hit a bug.
Can you share the delta of /var/lib/glusterd/vols/gv0/info file from
both the nodes?
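To confirm whether the two info files are truly byte-identical (a difference is easy to miss by eye), one hypothetical check is to compare a checksum of the file, and of the cksum file gluster keeps next to it, on each node:
  md5sum /var/lib/glusterd/vols/gv0/info
  cat /var/lib/glusterd/vols/gv0/cksum
Run both commands on node A and on node B and compare the outputs.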
~Atin
Post by songxin
[2015-12-07 07:54:55.778384] I [MSGID: 106493]
Responded to 128.224.162.163 (0), ret: 0
[2015-12-07 07:54:55.928774] I [MSGID: 106493]
[glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
128.224.162.163, port: 0
...
When I run gluster peer status on B node it shows as below.
Number of Peers: 1
Hostname: 128.224.162.163
Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
State: Peer Rejected (Connected)
When I run "gluster volume status" on A node it show as below.
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 128.224.162.163:/home/wrsadmin/work/t
mp/data/brick/gv0 49152 0 Y
13019
NFS Server on localhost N/A N/A N
N/A
Self-heal Daemon on localhost N/A N/A Y
13045
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
It looks like the glusterfsd service is ok on A node.
Is it because the peer state is Rejected that glusterd didn't start the
glusterfsd? What causes this problem?
2. Is glustershd(self-heal-daemon) the process as below?
root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/gluster ..
If it is, I want to know whether glustershd is also the glusterfsd binary,
just like glusterd and glusterfs.
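One quick way to check this on a typical install (paths may differ by distribution; treat it as a sketch) is to look at the installed binaries:
  ls -l /usr/sbin/glusterd /usr/sbin/glusterfs /usr/sbin/glusterfsd
  # glusterd and glusterfs are normally symlinks to the same glusterfsd binary;
  # glustershd is simply a glusterfs process started with --volfile-id gluster/glustershd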
Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, February 16, 2016 3:59:50 PM
Subject: [Gluster-users] question about sync replicate volume after rebooting one node
Hi,
I have a question about how to sync volume between two bricks after one node
is reboot.
There are two node, A node and B node.A node ip is 128.124.10.1 and B node ip
is 128.124.10.2.
operation steps on A node as below
1. gluster peer probe 128.124.10.2
2. mkdir -p /data/brick/gv0
3.gluster volume create gv0 replica 2 128.124.10.1 :/data/brick/gv0
128.124.10.2 :/data/brick/gv1 force
4. gluster volume start gv0
5.mount -t glusterfs 128.124.10.1 :/gv0 gluster
operation steps on B node as below
1 . mkdir -p /data/brick/gv0
2.mount -t glusterfs 128.124.10.1 :/gv0 gluster
After all steps above , there a some gluster service process, including
glusterd, glusterfs and glusterfsd, running on both A and B node.
I can see these servic by command ps aux | grep gluster and command gluster
volume status.
Now reboot the B node.After B reboot , there are no gluster service running
on B node.
After I systemctl start glusterd , there is just glusterd service but not
glusterfs and glusterfsd on B node.
Because glusterfs and glusterfsd are not running so I can't gluster volume
heal gv0 full.
I want to know why glusterd don't start glusterfs and glusterfsd.
On starting glusterd, glusterfsd should have started by itself.
Could you share glusterd and brick log (on node B) so that we know why glusterfsd
didn't start?
Do you still see glusterfsd service running on node A? You can try running "gluster v start <VOLNAME> force"
on one of the nodes and check if all the brick processes started.
gluster volume status <VOLNAME> should be able to provide you with gluster process status.
On restarting the node, glusterfs process for mount won't start by itself. You will have to run
step 2 on node B again for it.
How do I restart these services on B node?
How do I sync the replicate volume after one node reboot?
Once the glusterfsd process starts on node B too, glustershd -- self-heal-daemon -- for replicate volume
should start healing/syncing files that need to be synced. This deamon does periodic syncing of files.
If you want to trigger a heal explicitly, you can run gluster volume heal <VOLNAME> on one of the servers.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-17 12:44:13 UTC
Permalink
Do you mean that I should delete the info file on B node and then start glusterd? Or copy it from A node to B node?

Sent from my iPhone
Post by Atin Mukherjee
Post by songxin
Hi,
The version of glusterfs on A node and B node are both 3.7.6.
The time on B node is same after rebooting because B node hasn't RTC.
Does it cause the problem?
If I run " gluster volume start gv0 force " the glusterfsd can be
started but "gluster volume start gv0" don't work.
The file /var/lib/glusterd/vols/gv0/info on B node as below.
...
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
The file /var/lib/glusterd/vols/gv0/info on A node as below.
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
Contents look similar. But the log says different and that can't
happen. Are you sure they are same? As a workaround can you delete the
same info file from the disk and restart glusterd instance and see
whether the problem persists?
Post by songxin
Thanks,
Xin
Post by Atin Mukherjee
Post by songxin
Hi,
Thank you for your immediate and detailed reply.And I have a few more
question about glusterfs.
A node IP is 128.224.162.163.
B node IP is 128.224.162.250.
1.After reboot B node and start the glusterd service the glusterd log is as blow.
...
[2015-12-07 07:54:55.743966] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2015-12-07 07:54:55.744026] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2015-12-07 07:54:55.744280] I [MSGID: 106163]
[glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 30706
[2015-12-07 07:54:55.773606] I [MSGID: 106490]
[glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
[2015-12-07 07:54:55.777994] E [MSGID: 101076]
[common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
lookup hostname of 128.224.162.163 : Temporary failure in name resolution
[2015-12-07 07:54:55.778290] E [MSGID: 106010]
Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
4087388312 on peer 128.224.162.163
The above log entry is the reason of the rejection of the peer, most
probably its due to the compatibility issue. I believe the gluster
versions are different (share gluster versions from both the nodes) in
two nodes and you might have hit a bug.
Can you share the delta of /var/lib/glusterd/vols/gv0/info file from
both the nodes?
~Atin
Post by songxin
[2015-12-07 07:54:55.778384] I [MSGID: 106493]
Responded to 128.224.162.163 (0), ret: 0
[2015-12-07 07:54:55.928774] I [MSGID: 106493]
[glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
128.224.162.163, port: 0
...
When I run gluster peer status on B node it show as below.
Number of Peers: 1
Hostname: 128.224.162.163
Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
State: Peer Rejected (Connected)
When I run "gluster volume status" on A node it show as below.
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 128.224.162.163:/home/wrsadmin/work/t
mp/data/brick/gv0 49152 0 Y
13019
NFS Server on localhost N/A N/A N
N/A
Self-heal Daemon on localhost N/A N/A Y
13045
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
It looks like the glusterfsd service is ok on A node.
If because the peer state is Rejected so gluterd didn't start the
glusterfsd? What causes this problem?
2. Is glustershd(self-heal-daemon) the process as below?
root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/gluster ..
If it is, I want to know if the glustershd is also the bin glusterfsd,
just like glusterd and glusterfs.
Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, February 16, 2016 3:59:50 PM
Subject: [Gluster-users] question about sync replicate volume after rebooting one node
Hi,
I have a question about how to sync volume between two bricks after one node
is reboot.
There are two node, A node and B node.A node ip is 128.124.10.1 and B node ip
is 128.124.10.2.
operation steps on A node as below
1. gluster peer probe 128.124.10.2
2. mkdir -p /data/brick/gv0
3.gluster volume create gv0 replica 2 128.124.10.1 :/data/brick/gv0
128.124.10.2 :/data/brick/gv1 force
4. gluster volume start gv0
5.mount -t glusterfs 128.124.10.1 :/gv0 gluster
operation steps on B node as below
1 . mkdir -p /data/brick/gv0
2.mount -t glusterfs 128.124.10.1 :/gv0 gluster
After all steps above , there a some gluster service process, including
glusterd, glusterfs and glusterfsd, running on both A and B node.
I can see these servic by command ps aux | grep gluster and command gluster
volume status.
Now reboot the B node.After B reboot , there are no gluster service running
on B node.
After I systemctl start glusterd , there is just glusterd service but not
glusterfs and glusterfsd on B node.
Because glusterfs and glusterfsd are not running so I can't gluster volume
heal gv0 full.
I want to know why glusterd don't start glusterfs and glusterfsd.
On starting glusterd, glusterfsd should have started by itself.
Could you share glusterd and brick log (on node B) so that we know why glusterfsd
didn't start?
Do you still see glusterfsd service running on node A? You can try running "gluster v start <VOLNAME> force"
on one of the nodes and check if all the brick processes started.
gluster volume status <VOLNAME> should be able to provide you with gluster process status.
On restarting the node, glusterfs process for mount won't start by itself. You will have to run
step 2 on node B again for it.
How do I restart these services on B node?
How do I sync the replicate volume after one node reboot?
Once the glusterfsd process starts on node B too, glustershd -- self-heal-daemon -- for replicate volume
should start healing/syncing files that need to be synced. This deamon does periodic syncing of files.
If you want to trigger a heal explicitly, you can run gluster volume heal <VOLNAME> on one of the servers.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Atin Mukherjee
2016-02-17 15:47:08 UTC
Permalink
Post by songxin
Do you mean that I should delete the info file on B node and then start glusterd? Or copy it from A node to B node?
Either one of them, and then a restart of GlusterD on B.
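A minimal sketch of the copy approach (IP taken from this thread; run on node B, and adjust paths to your setup):
  systemctl stop glusterd
  scp root@128.224.162.163:/var/lib/glusterd/vols/gv0/info /var/lib/glusterd/vols/gv0/info
  systemctl start glusterd
If the cksum mismatch was the only problem, "gluster peer status" on both nodes should then report "Peer in Cluster (Connected)" instead of "Peer Rejected".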
Post by songxin
Sent from my iPhone
Post by Atin Mukherjee
Post by songxin
Hi,
The version of glusterfs on A node and B node are both 3.7.6.
The time on B node is same after rebooting because B node hasn't RTC.
Does it cause the problem?
If I run " gluster volume start gv0 force " the glusterfsd can be
started but "gluster volume start gv0" don't work.
The file /var/lib/glusterd/vols/gv0/info on B node as below.
...
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
The file /var/lib/glusterd/vols/gv0/info on A node as below.
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=2
transport-type=0
volume-id=c4197371-6d01-4477-8cb2-384cda569c27
username=62e009ea-47c4-46b4-8e74-47cd9c199d94
password=ef600dcd-42c5-48fc-8004-d13a3102616b
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
brick-0=128.224.162.255:-data-brick-gv0
brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
Contents look similar. But the log says different and that can't
happen. Are you sure they are same? As a workaround can you delete the
same info file from the disk and restart glusterd instance and see
whether the problem persists?
Post by songxin
Thanks,
Xin
Post by Atin Mukherjee
Post by songxin
Hi,
Thank you for your immediate and detailed reply.And I have a few more
question about glusterfs.
A node IP is 128.224.162.163.
B node IP is 128.224.162.250.
1.After reboot B node and start the glusterd service the glusterd log is as blow.
...
[2015-12-07 07:54:55.743966] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2015-12-07 07:54:55.744026] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2015-12-07 07:54:55.744280] I [MSGID: 106163]
[glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 30706
[2015-12-07 07:54:55.773606] I [MSGID: 106490]
[glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
[2015-12-07 07:54:55.777994] E [MSGID: 101076]
[common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
lookup hostname of 128.224.162.163 : Temporary failure in name resolution
[2015-12-07 07:54:55.778290] E [MSGID: 106010]
Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
4087388312 on peer 128.224.162.163
The above log entry is the reason of the rejection of the peer, most
probably its due to the compatibility issue. I believe the gluster
versions are different (share gluster versions from both the nodes) in
two nodes and you might have hit a bug.
Can you share the delta of /var/lib/glusterd/vols/gv0/info file from
both the nodes?
~Atin
Post by songxin
[2015-12-07 07:54:55.778384] I [MSGID: 106493]
Responded to 128.224.162.163 (0), ret: 0
[2015-12-07 07:54:55.928774] I [MSGID: 106493]
[glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
128.224.162.163, port: 0
...
When I run gluster peer status on B node it show as below.
Number of Peers: 1
Hostname: 128.224.162.163
Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
State: Peer Rejected (Connected)
When I run "gluster volume status" on A node it show as below.
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 128.224.162.163:/home/wrsadmin/work/t
mp/data/brick/gv0 49152 0 Y
13019
NFS Server on localhost N/A N/A N
N/A
Self-heal Daemon on localhost N/A N/A Y
13045
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
It looks like the glusterfsd service is ok on A node.
If because the peer state is Rejected so gluterd didn't start the
glusterfsd? What causes this problem?
2. Is glustershd(self-heal-daemon) the process as below?
root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/lib/glusterd/glustershd/run/gluster ..
If it is, I want to know if the glustershd is also the bin glusterfsd,
just like glusterd and glusterfs.
Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, February 16, 2016 3:59:50 PM
Subject: [Gluster-users] question about sync replicate volume after rebooting one node
Hi,
I have a question about how to sync volume between two bricks after one node
is reboot.
There are two node, A node and B node.A node ip is 128.124.10.1 and B node ip
is 128.124.10.2.
operation steps on A node as below
1. gluster peer probe 128.124.10.2
2. mkdir -p /data/brick/gv0
3.gluster volume create gv0 replica 2 128.124.10.1 :/data/brick/gv0
128.124.10.2 :/data/brick/gv1 force
4. gluster volume start gv0
5.mount -t glusterfs 128.124.10.1 :/gv0 gluster
operation steps on B node as below
1 . mkdir -p /data/brick/gv0
2.mount -t glusterfs 128.124.10.1 :/gv0 gluster
After all steps above , there a some gluster service process, including
glusterd, glusterfs and glusterfsd, running on both A and B node.
I can see these servic by command ps aux | grep gluster and command gluster
volume status.
Now reboot the B node.After B reboot , there are no gluster service running
on B node.
After I systemctl start glusterd , there is just glusterd service but not
glusterfs and glusterfsd on B node.
Because glusterfs and glusterfsd are not running so I can't gluster volume
heal gv0 full.
I want to know why glusterd don't start glusterfs and glusterfsd.
On starting glusterd, glusterfsd should have started by itself.
Could you share glusterd and brick log (on node B) so that we know why glusterfsd
didn't start?
Do you still see glusterfsd service running on node A? You can try running "gluster v start <VOLNAME> force"
on one of the nodes and check if all the brick processes started.
gluster volume status <VOLNAME> should be able to provide you with gluster process status.
On restarting the node, glusterfs process for mount won't start by itself. You will have to run
step 2 on node B again for it.
How do I restart these services on B node?
How do I sync the replicate volume after one node reboot?
Once the glusterfsd process starts on node B too, glustershd -- self-heal-daemon -- for replicate volume
should start healing/syncing files that need to be synced. This deamon does periodic syncing of files.
If you want to trigger a heal explicitly, you can run gluster volume heal <VOLNAME> on one of the servers.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Marcos Renato da Silva Junior
2016-02-19 18:50:24 UTC
Permalink
Hi,


One of my nodes shows this for "gluster peer status":


Number of Peers: 3

Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Peer in Cluster (Connected)

Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Establishing Connection (Connected)
Other names:
node1

Hostname: servpos4
Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
State: Peer in Cluster (Connected)


Same address, but one shows "State: Peer in Cluster" and the other "Establishing
Connection (Connected)".

It works ok.
--
Marcos Renato da Silva Junior
Universidade Estadual Paulista - Unesp
Faculdade de Engenharia de Ilha Solteira - FEIS
Departamento de Engenharia Elétrica
15385-000 - Ilha Solteira/SP
(18) 3743-1164
Atin Mukherjee
2016-02-22 03:36:12 UTC
Permalink
Could you attach glusterd.log along with cmd_history.log file of all the
nodes? Output of gluster volume status & gluster volume info would be
also be helpful here.

~Atin
Post by Marcos Renato da Silva Junior
Hi,
Number of Peers: 3
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Peer in Cluster (Connected)
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Establishing Connection (Connected)
node1
Hostname: servpos4
Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
State: Peer in Cluster (Connected)
Same address but one "State: Peer in Cluster" another "Establishing
Connection (Connected)".
Its works ok.
Marcos Renato da Silva Junior
2016-02-22 17:22:13 UTC
Permalink
Hi,

Solved using the link:

http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
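For readers who cannot reach that page, the procedure it describes is roughly the following, run on the rejected node (a sketch only; double-check against the current documentation):
  systemctl stop glusterd
  # in /var/lib/glusterd, remove everything except glusterd.info (the node's own UUID)
  systemctl start glusterd
  gluster peer probe <a node that is still healthy>
  systemctl restart glusterd
  gluster peer status
The probe plus restart lets the rejected node re-import the peer and volume configuration from a healthy peer.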

Thanks.
Post by Atin Mukherjee
Could you attach glusterd.log along with cmd_history.log file of all the
nodes? Output of gluster volume status & gluster volume info would be
also be helpful here.
~Atin
Post by Marcos Renato da Silva Junior
Hi,
Number of Peers: 3
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Peer in Cluster (Connected)
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Establishing Connection (Connected)
node1
Hostname: servpos4
Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
State: Peer in Cluster (Connected)
Same address but one "State: Peer in Cluster" another "Establishing
Connection (Connected)".
Its works ok.
--
Marcos Renato da Silva Junior
Universidade Estadual Paulista - Unesp
Faculdade de Engenharia de Ilha Solteira - FEIS
Departamento de Engenharia Elétrica
15385-000 - Ilha Solteira/SP
(18) 3743-1164
Atin Mukherjee
2016-02-22 17:32:07 UTC
Permalink
-Atin
Sent from one plus one
On 22-Feb-2016 10:52 pm, "Marcos Renato da Silva Junior" <
Post by Marcos Renato da Silva Junior
Hi,
http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
Great to know that; however, I was wondering how you ended up in that
state :)
Post by Marcos Renato da Silva Junior
Thanks.
Post by Atin Mukherjee
Could you attach glusterd.log along with cmd_history.log file of all the
nodes? Output of gluster volume status & gluster volume info would be
also be helpful here.
~Atin
Post by Marcos Renato da Silva Junior
Hi,
Number of Peers: 3
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Peer in Cluster (Connected)
Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Establishing Connection (Connected)
node1
Hostname: servpos4
Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
State: Peer in Cluster (Connected)
Same address but one "State: Peer in Cluster" another "Establishing
Connection (Connected)".
Its works ok.
--
Marcos Renato da Silva Junior
Universidade Estadual Paulista - Unesp
Faculdade de Engenharia de Ilha Solteira - FEIS
Departamento de Engenharia Elétrica
15385-000 - Ilha Solteira/SP
(18) 3743-1164
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Atin Mukherjee
2016-02-19 14:17:18 UTC
Permalink
Abhilash has already raised a concern and Gaurav is looking into it.

-Atin
Sent from one plus one
Hi,
I create a replicate volume with 2 bricks. And I frequently reboot my two nodes and frequently run “peer detach”, “peer detach”, “add-brick” and "remove-brick".
A board ip: 10.32.0.48
B board ip: 10.32.1.144
After that, I run "gluster peer status" on A board and it show as below.
Number of Peers: 2
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
I don't understand why 10.32.0.48 has two peers which are both 10.32.1.144.
Does glusterd not check for duplicate ip addresses?
Can anyone help me to answer my question?
Thanks
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Atin Mukherjee
2016-02-19 14:18:32 UTC
Permalink
-Atin
Sent from one plus one
Post by Atin Mukherjee
Abhilash has already raised a concern and Gaurav is looking into it.
My bad, he is Abhishek!
Post by Atin Mukherjee
-Atin
Sent from one plus one
Hi,
I create a replicate volume with 2 brick.And I frequently reboot my two
nodes and frequently run “peer detach” “peer detach” “add-brick”
"remove-brick".
Post by Atin Mukherjee
A borad ip: 10.32.0.48
B borad ip: 10.32.1.144
After that, I run "gluster peer status" on A board and it show as below.
Number of Peers: 2
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
I don't understand why the 10.32.0.48 has two peers which are both 10.32.1.144.
Does glusterd not check duplicate ip addr
Any can help me to answer my quesion?
Thanks
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-19 23:26:12 UTC
Permalink
Hi Gaurav,
Thank you for your reply. I will do these tests as you said.
I face this issue on glusterd version 3.7.6. Do you know if this issue has been fixed in the latest version, 3.7.8?

Thanks,
Xin

Sent from my iPhone
Hi xin,
Thanks for bringing up your Gluster issue.
Abhishek (another Gluster community member) also faced the same issue. I asked for the things below for further analysis of this issue. Could you provide me the following information?
Did you perform any manual operation on the GlusterFS configuration files which reside in the /var/lib/glusterd/* folder?
Can you provide the output of "ls /var/lib/glusterd/peers" from both of your nodes (see the sketch below)?
Can you provide the output of the #gluster volume info command?
Could you provide the output of the #gluster peer status command when the 2nd node is down?
Bring down glusterd on both nodes, bring glusterd back up one by one on both nodes, and provide me the output of the #gluster peer status command.
Can you provide the full logs (cmd_history.log and etc-glusterfs-glusterd.vol.log) from both nodes?
These things will be very useful for analysing this issue.
You can restart your glusterd for now as a workaround, but we need to analyse this issue further.
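A hypothetical way to gather the peer-store details in one go (run on each node; paths assume a default install):
  ls -la /var/lib/glusterd/peers/
  grep -H . /var/lib/glusterd/peers/*    # prints the uuid/state/hostname entries of every peer file
Two peer files carrying the same hostname or uuid would confirm the duplicate entry seen in "gluster peer status".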
Thanks,
~Gaurav
----- Original Message -----
Sent: Friday, February 19, 2016 7:07:48 PM
Subject: [Gluster-users] two same ip addr in peer list
Hi,
I create a replicate volume with 2 brick.And I frequently reboot my two nodes and frequently run “peer detach” “peer detach” “add-brick” "remove-brick".
A borad ip: 10.32.0.48
B borad ip: 10.32.1.144
After that, I run "gluster peer status" on A board and it show as below.
Number of Peers: 2
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
I don't understand why the 10.32.0.48 has two peers which are both 10.32.1.144.
Does glusterd not check duplicate ip addr?
Any can help me to answer my quesion?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Gaurav Garg
2016-02-20 02:50:36 UTC
Permalink
Hi Xin,

I haven't heard about this issue in Gluster v3.7.6/3.7.8 or any other version. After checking all the logs I can say whether it's an issue or something else.

Thanks,
Gaurav

----- Original Message -----
From: "songxin" <***@126.com>
To: "Gaurav Garg" <***@redhat.com>
Cc: gluster-***@gluster.org
Sent: Saturday, February 20, 2016 4:56:12 AM
Subject: Re: [Gluster-users] two same ip addr in peer list

Hi Gaurav,
Thank you for your reply. I will do these test as you said.
I face this issue on glusterd version is 3.7.6. Do you know if this issue has been fixed on latest version 3.7.8.

Thanks,
Xin

Sent from my iPhone
Hi xin,
Thanks for bringing up your Gluster issue.
Abhishek (another Gluster community member) also faced the same issue. I asked for the things below for further analysis of this issue. Could you provide me the following information?
Did you perform any manual operation on the GlusterFS configuration files which reside in the /var/lib/glusterd/* folder?
Can you provide the output of "ls /var/lib/glusterd/peers" from both of your nodes?
Can you provide the output of the #gluster volume info command?
Could you provide the output of the #gluster peer status command when the 2nd node is down?
Bring down glusterd on both nodes, bring glusterd back up one by one on both nodes, and provide me the output of the #gluster peer status command.
Can you provide the full logs (cmd_history.log and etc-glusterfs-glusterd.vol.log) from both nodes?
These things will be very useful for analysing this issue.
You can restart your glusterd for now as a workaround, but we need to analyse this issue further.
Thanks,
~Gaurav
----- Original Message -----
Sent: Friday, February 19, 2016 7:07:48 PM
Subject: [Gluster-users] two same ip addr in peer list
Hi,
I create a replicate volume with 2 brick.And I frequently reboot my two nodes and frequently run “peer detach” “peer detach” “add-brick” "remove-brick".
A borad ip: 10.32.0.48
B borad ip: 10.32.1.144
After that, I run "gluster peer status" on A board and it show as below.
Number of Peers: 2
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
Hostname: 10.32.1.144
Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
State: Peer in Cluster (Connected)
I don't understand why the 10.32.0.48 has two peers which are both 10.32.1.144.
Does glusterd not check duplicate ip addr?
Any can help me to answer my quesion?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-24 01:46:02 UTC
Permalink
Hi all,
I have a question about replicate volume as below.


precondition:
1.A node ip: 128.224.162.163
2.B node ip:128.224.162.255
3.A node brick:/data/brick/gv0
4.B node brick:/data/brick/gv0


reproduce step:
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7.create some file(d,e,f) in directory gluster //run on A node
8.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on B node
9.ls gluster //run on B node


My question is as below.


After step 6, the volume type is changed from distribute to replicate.
The file (a,b,c) is created when the volume type is distribute.
The file (d,e,f) is created when the volume type is replicate.


After step 6, will the volume replicate the files (a,b,c) to both bricks? Or will it replicate only the files (d,e,f)?
If I run "gluster volume heal gv0 full", will the volume replicate the files (a,b,c) to both bricks?


Thanks,
Xin
Gaurav Garg
2016-02-24 03:58:05 UTC
Permalink
Hi songxin,

please find comment inline.

----- Original Message -----
From: "songxin" <***@126.com>
To: gluster-***@gluster.org
Cc: gluster-***@gluster.org
Sent: Wednesday, February 24, 2016 7:16:02 AM
Subject: [Gluster-users] question about replicate volume

Hi all,
I have a question about replicate volume as below.

precondition:
1.A node ip: 128.224.162.163
2.B node ip:128.224.162.255
3.A node brick:/data/brick/gv0
4. B node brick:/data/brick/gv0

reproduce step:
1. gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163 : /data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4. mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6. gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. create some file(d,e,f) in directory gluster //run on A node
8. mount -t glusterfs 128.224.162.163:/gv0 gluster //run on B node
9.ls gluster //run on B node

My question is as below.

After step 6, the volume type is change from distribute to replicate.
The file (a,b,c) is created when the volume type is distribute.
The file (d,e,f) is created when the volume type is replicate.
Post by songxin
After step 6, does the volume will replicate the file (a,b,c) in two brick?Or it just replicate the file(d,e,f) in two brick?
If I run "gluster volume heal gv0 full", does the volume will replicate the file (a,b,c) in two brick?


After step 6 the volume has been converted to a replicate volume. So if you create files from the mount point, they will be replicated to the whole replica set. In your case, after step 6 only the files (d,e,f) will be replicated, because before step 6 the volume was distributed. To replicate all the files created before step 6 you need to run #gluster volume heal <volname> full. After executing this command the files on both bricks of the replica set should be the same.
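As a concrete sketch using the names from this thread (illustrative only, not a prescription):
  gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force
  gluster volume heal gv0 full     # crawl the volume and sync the files created before the add-brick
  gluster volume heal gv0 info     # should eventually report zero pending entries on both bricks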

Thanks,

~Gaurav

Thanks,
Xin






_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Joe Julian
2016-02-24 04:00:18 UTC
Permalink
Post by Gaurav Garg
Hi songxin,
please find comment inline.
----- Original Message -----
Sent: Wednesday, February 24, 2016 7:16:02 AM
Subject: [Gluster-users] question about replicate volume
Hi all,
I have a question about replicate volume as below.
precondition:
1.A node ip: 128.224.162.163
2.B node ip:128.224.162.255
3.A node brick:/data/brick/gv0
4. B node brick:/data/brick/gv0
1. gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163 : /data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4. mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6. gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. create some file(d,e,f) in directory gluster //run on A node
8. mount -t glusterfs 128.224.162.163:/gv0 gluster //run on B node
9.ls gluster //run on B node
My question is as below.
After step 6, the volume type is change from distribute to replicate.
The file (a,b,c) is created when the volume type is distribute.
The file (d,e,f) is created when the volume type is replicate.
Post by songxin
After step 6, does the volume will replicate the file (a,b,c) in two brick?Or it just replicate the file(d,e,f) in two brick?
If I run "gluster volume heal gv0 full", does the volume will replicate the file (a,b,c) in two brick?
After step 6 volume have converted to replicate volume. So if you create file from mount point it will replicate these file to all replica set. In your case after step 6 it will replicate only file(d,e,f) because before step 6 volume was distributed. For replicating all the file (before step 6) you need to run #gluster volume heal <volname> full. After executing this command file in both replica set should be same.
Did that change? It used to trigger the heal crawl automatically when
you changed the replica count.
Post by Gaurav Garg
Thanks,
~Gaurav
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Ravishankar N
2016-02-24 04:21:00 UTC
Permalink
Post by Joe Julian
Post by Gaurav Garg
After step 6 volume have converted to replicate volume. So if you
create file from mount point it will replicate these file to all
replica set. In your case after step 6 it will replicate only
file(d,e,f) because before step 6 volume was distributed. For
replicating all the file (before step 6) you need to run #gluster
volume heal <volname> full. After executing this command file in both
replica set should be same.
Did that change? It used to trigger the heal crawl automatically when
you changed the replica count.
You needed to manually run full heal for both add-brick and
replace-brick scenarios even in afr-v1. In afr-v2 there is a bug (BZ
1112158) in heal-full but Anuradha's patches are aimed at doing away
with running full heal in both these scenarios as noted in the last 2
comments of the BZ.
Post by Joe Julian
Post by Gaurav Garg
Thanks,
songxin
2016-02-25 10:18:30 UTC
Permalink
Hi,
I have a problem as below when I start gluster after rebooting a board.

precondition:
I use two boards do this test.
The version of glusterfs is 3.7.6.

A board ip:128.224.162.255
B board ip:128.224.95.140


reproduce steps:

1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0 128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board. It is a development board, so it has a reset button similar to the reset button on a pc (A board)
7.run the command "systemctl start glusterd" after the A board reboots. The command failed because of the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx (local board).
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032] [store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0] [store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200] [glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196] [glusterd-store.c:3427:glusterd_store_retrieve_snap] 0-management: Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043] [glusterd-store.c:3589:glusterd_store_retrieve_snaps] 0-management: Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0] [glusterd-store.c:3607:glusterd_store_retrieve_snaps] 0-management: Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0] [glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed

8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9.run the command "systemctl start glusterd" again, and it succeeds.
10.at this point the peer status is Peer in Cluster (Connected) and all processes are online.

If a node resets abnormally, must I remove /var/lib/glusterd/snaps/.nfsxxxxxx before starting glusterd?

I want to know if this is normal.


Thanks,
Xin
Atin Mukherjee
2016-02-25 10:25:50 UTC
Permalink
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check whether the content of /var/lib/glusterd/* on
board B (post reboot and before starting glusterd) matches
/var/lib/glusterd/* from board A?

~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-25 10:42:10 UTC
Permalink
Thanks for your reply.


Do I need to check all files in /var/lib/glusterd/*?
Must all files be the same on A node and B node?


I found that the size of the file /var/lib/glusterd/snaps/.nfs0000000001722f4000000002 is 0 bytes after the A board reboots.
So glusterd can't restore from this snap file on A node.
Is that right?
Post by Atin Mukherjee
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check the content of /var/lib/glusterd/* in
board B (post reboot and before starting glusterd) matches with
/var/lib/glusterd/* from board A?
~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all
process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Atin Mukherjee
2016-02-25 11:05:02 UTC
Permalink
+ Rajesh , Avra
Post by songxin
Thanks for your reply.
Do I need check all files in /var/lib/glusterd/*?
Must all files be same in A node and B node?
Yes, they should be identical.
Post by songxin
I found that the size of
file /var/lib/glusterd/snaps/.nfs0000000001722f4000000002 is 0 bytes
after A board reboot.
So glusterd can't restore by this snap file on A node.
Is it right?
Yes, looks like that.
Post by songxin
Post by Atin Mukherjee
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check the content of /var/lib/glusterd/* in
board B (post reboot and before starting glusterd) matches with
/var/lib/glusterd/* from board A?
~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all
process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-25 11:40:25 UTC
Permalink
If I run "reboot" on the a node,there are not .snap files on A node after reboot.
Does the snap file only appear after unexpect reboot?
Why its size is 0 byte?
In this situation ,is a right method to solve this problem removing the snap file?

thanks
xin

Sent from my iPhone
Post by Atin Mukherjee
+ Rajesh , Avra
Post by songxin
Thanks for your reply.
Do I need check all files in /var/lib/glusterd/*?
Must all files be same in A node and B node?
Yes, they should be identical.
Post by songxin
I found that the size of
file /var/lib/glusterd/snaps/.nfs0000000001722f4000000002 is 0 bytes
after A board reboot.
So glusterd can't restore by this snap file on A node.
Is it right?
Yes, looks like that.
Post by songxin
Post by Atin Mukherjee
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check the content of /var/lib/glusterd/* in
board B (post reboot and before starting glusterd) matches with
/var/lib/glusterd/* from board A?
~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all
process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Avra Sengupta
2016-02-25 11:46:40 UTC
Permalink
Hi,

/var/lib/glusterd/snaps/ contains only 1 file, called missed_snaps_list. Apart from it, there are only directories created with
the snap names. Is .nfs0000000001722f4000000002, which you saw in /var/lib/glusterd, a file or a directory? If it's a file, then it
was not placed there as part of snapshotting any volume. If it's a directory, did you try creating a snapshot with such a name?

Regards,
Avra
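For context: .nfsXXXX names are normally created by an NFS client when a file that is still open gets deleted, which suggests /var/lib/glusterd here sits on an NFS mount. A minimal sketch for spotting and clearing such a stale, zero-byte leftover before starting glusterd (assuming, as above, that it is not part of any snapshot):
  ls -la /var/lib/glusterd/snaps/
  find /var/lib/glusterd/snaps -maxdepth 1 -name '.nfs*' -empty -delete
  systemctl start glusterd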
If I run "reboot" on the a node��there are not .snap files on A node after reboot.
Does the snap file only appear after unexpect reboot��
Why its size is 0 byte��
In this situation ��is a right method to solve this problem removing the snap file?
thanks
xin
Sent from my iPhone
Post by Atin Mukherjee
+ Rajesh , Avra
Post by songxin
Thanks for your reply.
Do I need check all files in /var/lib/glusterd/*?
Must all files be same in A node and B node?
Yes, they should be identical.
Post by songxin
I found that the size of
file /var/lib/glusterd/snaps/.nfs0000000001722f4000000002 is 0 bytes
after A board reboot.
So glusterd can't restore by this snap file on A node.
Is it right?
Yes, looks like that.
Post by songxin
Post by Atin Mukherjee
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check the content of /var/lib/glusterd/* in
board B (post reboot and before starting glusterd) matches with
/var/lib/glusterd/* from board A?
~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all
process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-29 02:30:25 UTC
Permalink
Hi,
Thank you for your reply.


The .nfs0000000001722f4000000002 is a file, not a directory.
And I found that it is there while gluster is running.


***@128:/# ls -al /var/lib/glusterd/snaps/
total 8
drwxr-xr-x  2 root root 4096 Feb 26  2016 .
drwxr-xr-x 14 root root 4096 Feb 26  2016 ..
-rw-------  1 root root    0 Feb 26  2016 .nfs0000000006c02b090000000c
-rw-------  1 root root    0 Feb 26  2016 missed_snaps_list


If I shut down my board abnormally, the file will be there when the board reboots. And some errors will happen, as below.
...
316] E [MSGID: 106043] [glusterd-store.c:3589:glusterd_store_retrieve_snaps] 0-management: Unable to restore snapshot: .nfs0000000001722f2c00000001
348] D [MSGID: 0] [glusterd-store.c:3607:glusterd_store_retrieve_snaps] 0-management: Returning with -1
389] D [MSGID: 0] [glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
469] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
501] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
...


Do you know why that is?


Thanks,
Xin
Post by Avra Sengupta
Hi,
/var/lib/glusterd/snaps/ contains only 1 file called missed_snaps_list. Apart from it, there are only directories created with
the snap names. Is .nfs0000000001722f4000000002, that you saw in /var/lib/glusterd a file or a directory. If it's a file, then it
was not placed there as part of snapshotting any volume. If it's a directory, then did you try creating a snapshot with such a name.
Regards,
Avra
If I run "reboot" on the a nodeᅵᅵthere are not .snap files on A node after reboot.
Does the snap file only appear after unexpect rebootᅵᅵ
Why its size is 0 byteᅵᅵ
In this situation ᅵᅵis a right method to solve this problem removing the snap file?
thanks
xin
Sent from my iPhone
Post by Atin Mukherjee
+ Rajesh , Avra
Post by songxin
Thanks for your reply.
Do I need check all files in /var/lib/glusterd/*?
Must all files be same in A node and B node?
Yes, they should be identical.
Post by songxin
I found that the size of
file /var/lib/glusterd/snaps/.nfs0000000001722f4000000002 is 0 bytes
after A board reboot.
So glusterd can't restore by this snap file on A node.
Is it right?
Yes, looks like that.
Post by songxin
Post by Atin Mukherjee
I believe you and Abhishek are from the same group and sharing the
common set up. Could you check the content of /var/lib/glusterd/* in
board B (post reboot and before starting glusterd) matches with
/var/lib/glusterd/* from board A?
~Atin
Post by songxin
Hi,
I have a problem as below when I start the gluster after reboot a board.
I use two boards do this test.
The version of glusterfs is 3.7.6.
A board ip:128.224.162.255
B board ip:128.224.95.140
reproduce steps:
1.systemctl start glusterd (A board)
2.systemctl start glusterd (B board)
3.gluster peer probe 128.224.95.140 (A board)
4.gluster volume create gv0 replica 2 128.224.95.140:/tmp/brick1/gv0
128.224.162.255:/data/brick/gv0 force (local board)
5.gluster volume start gv0 (A board)
6.press the reset button on the A board.It is a develop board so it has
a reset button that is similar to reset button on pc (A board)
7.run command "systemctl start glusterd" after A board reboot. And
command failed because the file /var/lib/glusterd/snaps/.nfsxxxxxxxxx
(local board) .
Log is as below.
[2015-12-07 07:55:38.260084] E [MSGID: 101032]
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/snaps/.nfs0000000001722f4000000002
[2015-12-07 07:55:38.260120] D [MSGID: 0]
[store.c:439:gf_store_handle_retrieve] 0-: Returning -1
[2015-12-07 07:55:38.260152] E [MSGID: 106200]
[glusterd-store.c:3332:glusterd_store_update_snap] 0-management: snap
handle is NULL
[2015-12-07 07:55:38.260180] E [MSGID: 106196]
Failed to update snapshot for .nfs0000000001722f40
[2015-12-07 07:55:38.260208] E [MSGID: 106043]
Unable to restore snapshot: .nfs0000000001722f400
[2015-12-07 07:55:38.260241] D [MSGID: 0]
Returning with -1
[2015-12-07 07:55:38.260268] D [MSGID: 0]
[glusterd-store.c:4339:glusterd_restore] 0-management: Returning -1
[2015-12-07 07:55:38.260325] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-07 07:55:38.260355] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-07 07:55:38.260374] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
8.rm /var/lib/glusterd/snaps/.nfsxxxxxxxxx (A board)
9..run command "systemctl start glusterd" and success.
10.at this point the peer status is Peer in Cluster (Connected) and all
process is online.
If a node abnormal reset, must I remove
the /var/lib/glusterd/snaps/.nfsxxxxxx before starting the glusterd?
I want to know if it is nomal.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-03-01 13:49:23 UTC
Permalink
Hi,


recondition:
A node:128.224.95.140
B node:128.224.162.255


brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0




reproduce steps:
1.gluster peer probe 128.224.162.255 (on A node)
2.gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0 128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4.mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create some files(a,b,c) in dir gluster (on A node)
6.shutdown the B node
7.change the files(a,b,c) in dir gluster (on A node)
8.reboot B node
9.start glusterd on B node but glusterfsd is offline (on B node)
10.gluster volume remove-brick gv0 replica 1 128.224.162.255:/data/brick/gv0 force (on A node)
11.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force (on A node)


Now the files are not the same between the two bricks


12."gluster volume heal gv0 info " show entry num is 0 (on A node)


Now what should I do if I want to sync files (a,b,c) on the two bricks?


I know the "heal full" can work , but I think the command take too long time.


So I run "tail -n 1 file" to all file on A node, but some files are sync but some files are not.


My question is below:
1.Why can't tail sync all the files?
2.Can the command "tail -n 1 filename" trigger self-heal, just like "ls -l filename"?


Thanks,
Xin
Anuradha Talur
2016-03-02 07:22:35 UTC
Permalink
----- Original Message -----
Sent: Tuesday, March 1, 2016 7:19:23 PM
Subject: [Gluster-users] about tail command
Hi,
A node:128.224.95.140
B node:128.224.162.255
brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0
1.gluster peer probe 128.224.162.255 (on A node)
2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4. mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create some files (a,b,c) in dir gluster (on A node)
6.shutdown the B node
7.change the files (a,b,c) in dir gluster (on A node)
8.reboot B node
9.start glusterd on B node but glusterfsd is offline (on B node)
10. gluster volume remove-brick gv0 replica 1 128.224.162.255:/data/brick/gv0
force (on A node)
11. gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0
force (on A node)
Now the files are not same between two brick
12." gluster volume heal gv0 info " show entry num is 0 (on A node)
Now What I should do if I want to sync file(a,b,c) on two brick?
Currently, once you add a brick to a cluster, files won't sync automatically.
A patch has been sent to handle this requirement; auto-heal will be available soon.

You could kill the newly added brick and perform the following operations from mount
for the sync to start :
1) create a directory <dirname>
2) setfattr -n "user.dirname" -v "value" <dirname>
3) delete the directory <dirname>

Once these steps are done, start the killed brick. self-heal-daemon will heal the files.
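
(A rough shell sketch of those steps, with made-up names: gv0 for the volume, /mnt/gluster for a client mount, and testdir for the marker directory; none of these come from the mail above.)

# on the node hosting the newly added brick: find its PID from volume status and kill just that brick
gluster volume status gv0
kill <PID-of-new-brick>
# from a client mount of the volume: create a directory, set a user xattr on it, then delete it
mkdir /mnt/gluster/testdir
setfattr -n user.dirname -v value /mnt/gluster/testdir
rm -rf /mnt/gluster/testdir
# restart the killed brick; self-heal-daemon should then pick up and sync the files
gluster volume start gv0 force
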

But, for the case you have mentioned, why are you removing brick and using add-brick again?
Is it because you don't want to change the brick-path?

You could use "replace-brick" command.
gluster v replace-brick <volname> <hostname:old-brick-path> <hostname:new-brick-path>
Note that source and destination should be different for this command to work.
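
(A hypothetical example with the full syntax; as a later mail in this thread points out, on glusterfs 3.7 the command has to end in "commit force". The new brick path is made up.)

# replace the old brick with a new, empty brick on the same host
gluster volume replace-brick gv0 128.224.162.255:/data/brick/gv0 \
    128.224.162.255:/data/brick/gv0_new commit force
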
I know the "heal full" can work , but I think the command take too long time.
So I run "tail -n 1 file" to all file on A node, but some files are sync but
some files are not.
1.Why the tail can't sync all files?
Did you run the tail command on mount point or from the backend (bricks)?
If you run from bricks, sync won't happen. Was client-side healing on?
To check whether they were on or off, run `gluster v get volname all | grep self-heal`; cluster.metadata-self-heal, cluster.data-self-heal and cluster.entry-self-heal should all be on.
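
(For example, taking gv0 as the volume name:)

# show the three client-side self-heal options for the volume
gluster volume get gv0 all | grep -E 'cluster\.(data|metadata|entry)-self-heal'
# if any of them is off, turn it back on, e.g.
gluster volume set gv0 cluster.data-self-heal on
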
2.Can the command "tail -n 1 filename" trigger selfheal, just like "ls -l
filename"?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
songxin
2016-03-02 10:39:01 UTC
Permalink
Thank you for your reply. I have two more questions, as below.


1. Is the command "gluster v replace-brick" async or sync? Is the replace complete when the command exits?
2. I run "tail -n 0" on the mount point. Does it trigger the heal?


Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, March 1, 2016 7:19:23 PM
Subject: [Gluster-users] about tail command
Hi,
A node:128.224.95.140
B node:128.224.162.255
brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0
1.gluster peer probe 128.224.162.255 (on A node)
2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4. mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create some files (a,b,c) in dir gluster (on A node)
6.shutdown the B node
7.change the files (a,b,c) in dir gluster (on A node)
8.reboot B node
9.start glusterd on B node but glusterfsd is offline (on B node)
10. gluster volume remove-brick gv0 replica 1 128.224.162.255:/data/brick/gv0
force (on A node)
11. gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0
force (on A node)
Now the files are not same between two brick
12." gluster volume heal gv0 info " show entry num is 0 (on A node)
Now What I should do if I want to sync file(a,b,c) on two brick?
Currently, once you add a brick to a cluster, files won't sync automatically.
Patch has been sent to handle this requirement. Auto-heal will be available soon.
You could kill the newly added brick and perform the following operations from mount
1) create a directory <dirname>
2) setfattr -n "user.dirname" -v "value" <dirname>
3) delete the directory <dirname>
Once these steps are done, start the killed brick. self-heal-daemon will heal the files.
But, for the case you have mentioned, why are you removing brick and using add-brick again?
Is it because you don't want to change the brick-path?
You could use "replace-brick" command.
gluster v replace-brick <volname> <hostname:old-brick-path> <hostname:new-brick-path>
Note that source and destination should be different for this command to work.
I know the "heal full" can work , but I think the command take too long time.
So I run "tail -n 1 file" to all file on A node, but some files are sync but
some files are not.
1.Why the tail can't sync all files?
Did you run the tail command on mount point or from the backend (bricks)?
If you run from bricks, sync won't happen. Was client-side healing on?
To check if they were on or off, run `gluster v get volname all | grep self-heal`, cluster.metadata-self-heal, cluster.data-self-heal, cluster.entry-self-heal should be on.
2.Can the command "tail -n 1 filename" trigger selfheal, just like "ls -l
filename"?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
Alan Millar
2016-03-02 16:15:42 UTC
Permalink
Post by songxin
1. the command "gluster v replace-brick " is async or sync? The replace is complete when the command quit ?
Async. The command will end immediately, and the replace will continue in the background.


Use "gluster volume replace-brick VOLUME-NAME OLD-BRICK NEW-BRICK status" to monitor the progress.
songxin
2016-03-03 01:57:32 UTC
Permalink
Thank you very much for your reply. It is very helpful to me.
And I have one more question about "heal full" in glusterfs 3.7.6.


the reproduce steps :
A board:128.224.95.140
B board:128.224.162.255


1.gluster peer probe 128.224.162.255 (on A board)
2.gluster volume create gv0 128.224.95.140:/data/brick/gv0 force (on A board)
3.gluster volume start gv0 (on A board)
4.mount -t glusterfs 128.224.95.140:/gv0 gluster (on A board)
5.create some files (a,b and c) in directory gluster (on A board)


At this point I want to replicate the files created in directory gluster, so I add an empty brick on B board to the volume.


6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force (on A board)
7.gluster volume heal gv0 info (on A board)


Brick 128.224.162.255:/data/brick/gv0
Number of entries: 0


Brick 128.224.95.140:/data/brick/gv0
Number of entries: 0


The volume will not replicate the files (a,b and c), because these files were created when the volume type was distribute.
So I just use "heal full" to replicate these files.


8.gluster volume heal gv0 full (on A board)


My question is following.


The command "heal full" is async and "heal info" show nothing need to heal.
How could I know when the "heal full" has completed to replicate these files£ša,b and c£©?How to monitor the progress?


Thanks,
Xin
Post by Alan Millar
Post by songxin
1. the command "gluster v replace-brick " is async or sync? The replace is complete when the command quit ?
Async. The command will end immediately, and the replace will continue in the background.
Use "gluster volume replace-brick VOLUME-NAME OLD-BRICK NEW-BRICK status" to monitor the progress.
Alan Millar
2016-03-03 06:58:41 UTC
Permalink
Post by songxin
The command "heal full" is async and "heal info" show nothing need to heal.
How could I know when the "heal full" has completed to replicate these files(a,b and c)?How to monitor the progress?
I'm not a gluster expert; I'm pretty new to this. Yes, I've had the same problem and it is frustrating.

The other command that goes with "heal info" is "heal statistics". It is rather hard to read the output, but somewhere in the output you should see a line that says "heal is in progress". You have to keep rechecking the "heal statistics" output; eventually that line will go away meaning the heal is done.

What I do is:

gluster vol heal VOLUME-NAME statistics | grep -v -e ' 0$' -e ' time ' -e 'INDEX'


which gives an easier-to-read output when you just want to see the current status.


I also look at "du -s" on the source brick and the healing brick to get a sense of overall progress.
Anuradha Talur
2016-03-03 07:01:41 UTC
Permalink
----- Original Message -----
Sent: Wednesday, March 2, 2016 4:09:01 PM
Subject: Re:Re: [Gluster-users] about tail command
Thank you for your reply.I have two more questions as below
1. the command "gluster v replace-brick " is async or sync? The replace is
complete when the command quit ?
It is a sync command, replacing the brick finishes as the command returns.

In one of the earlier mails I gave incomplete command for replace brick, sorry about that.
The only replace-brick operation allowed from glusterfs 3.7.9 onwards is
'gluster v replace-brick <volname> <hostname:src_brick> <hostname:dst_brick> commit force'.
2.I run "tail -n 0" on mount point.Does it trigger the heal?
Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, March 1, 2016 7:19:23 PM
Subject: [Gluster-users] about tail command
Hi,
A node:128.224.95.140
B node:128.224.162.255
brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0
1.gluster peer probe 128.224.162.255 (on A node)
2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4. mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create some files (a,b,c) in dir gluster (on A node)
6.shutdown the B node
7.change the files (a,b,c) in dir gluster (on A node)
8.reboot B node
9.start glusterd on B node but glusterfsd is offline (on B node)
10. gluster volume remove-brick gv0 replica 1
128.224.162.255:/data/brick/gv0
force (on A node)
11. gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0
force (on A node)
Now the files are not same between two brick
12." gluster volume heal gv0 info " show entry num is 0 (on A node)
Now What I should do if I want to sync file(a,b,c) on two brick?
Currently, once you add a brick to a cluster, files won't sync automatically.
Patch has been sent to handle this requirement. Auto-heal will be available soon.
You could kill the newly added brick and perform the following operations from mount
1) create a directory <dirname>
2) setfattr -n "user.dirname" -v "value" <dirname>
3) delete the directory <dirname>
Once these steps are done, start the killed brick. self-heal-daemon will heal the files.
But, for the case you have mentioned, why are you removing brick and using
add-brick again?
Is it because you don't want to change the brick-path?
You could use "replace-brick" command.
gluster v replace-brick <volname> <hostname:old-brick-path>
<hostname:new-brick-path>
Note that source and destination should be different for this command to work.
I know the "heal full" can work , but I think the command take too long time.
So I run "tail -n 1 file" to all file on A node, but some files are sync but
some files are not.
1.Why the tail can't sync all files?
Did you run the tail command on mount point or from the backend (bricks)?
If you run from bricks, sync won't happen. Was client-side healing on?
To check if they were on or off, run `gluster v get volname all | grep
self-heal`, cluster.metadata-self-heal, cluster.data-self-heal,
cluster.entry-self-heal should be on.
2.Can the command "tail -n 1 filename" trigger selfheal, just like "ls -l
filename"?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
--
Thanks,
Anuradha.
Anuradha Talur
2016-03-03 07:14:54 UTC
Permalink
----- Original Message -----
Sent: Thursday, March 3, 2016 12:31:41 PM
Subject: Re: [Gluster-users] about tail command
----- Original Message -----
Sent: Wednesday, March 2, 2016 4:09:01 PM
Subject: Re:Re: [Gluster-users] about tail command
Thank you for your reply.I have two more questions as below
1. the command "gluster v replace-brick " is async or sync? The replace is
complete when the command quit ?
It is a sync command, replacing the brick finishes as the command returns.
In one of the earlier mails I gave incomplete command for replace brick, sorry about that.
The only replace-brick operation allowed from glusterfs 3.7.9 onwards is
'gluster v replace-brick <volname> <hostname:src_brick> <hostname:dst_brick> commit force'.
Sorry for spamming, but there is a typo here, I meant glusterfs 3.7.0 onwards,
not 3.7.9.
2.I run "tail -n 0" on mount point.Does it trigger the heal?
Thanks,
Xin
Post by Anuradha Talur
----- Original Message -----
Sent: Tuesday, March 1, 2016 7:19:23 PM
Subject: [Gluster-users] about tail command
Hi,
A node:128.224.95.140
B node:128.224.162.255
brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0
1.gluster peer probe 128.224.162.255 (on A node)
2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4. mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create some files (a,b,c) in dir gluster (on A node)
6.shutdown the B node
7.change the files (a,b,c) in dir gluster (on A node)
8.reboot B node
9.start glusterd on B node but glusterfsd is offline (on B node)
10. gluster volume remove-brick gv0 replica 1
128.224.162.255:/data/brick/gv0
force (on A node)
11. gluster volume add-brick gv0 replica 2
128.224.162.255:/data/brick/gv0
force (on A node)
Now the files are not same between two brick
12." gluster volume heal gv0 info " show entry num is 0 (on A node)
Now What I should do if I want to sync file(a,b,c) on two brick?
Currently, once you add a brick to a cluster, files won't sync automatically.
Patch has been sent to handle this requirement. Auto-heal will be
available
soon.
You could kill the newly added brick and perform the following operations from mount
1) create a directory <dirname>
2) setfattr -n "user.dirname" -v "value" <dirname>
3) delete the directory <dirname>
Once these steps are done, start the killed brick. self-heal-daemon will
heal the files.
But, for the case you have mentioned, why are you removing brick and using
add-brick again?
Is it because you don't want to change the brick-path?
You could use "replace-brick" command.
gluster v replace-brick <volname> <hostname:old-brick-path>
<hostname:new-brick-path>
Note that source and destination should be different for this command to work.
I know the "heal full" can work , but I think the command take too long time.
So I run "tail -n 1 file" to all file on A node, but some files are sync but
some files are not.
1.Why the tail can't sync all files?
Did you run the tail command on mount point or from the backend (bricks)?
If you run from bricks, sync won't happen. Was client-side healing on?
To check if they were on or off, run `gluster v get volname all | grep
self-heal`, cluster.metadata-self-heal, cluster.data-self-heal,
cluster.entry-self-heal should be on.
2.Can the command "tail -n 1 filename" trigger selfheal, just like "ls -l
filename"?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
--
Thanks,
Anuradha.
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Anuradha.
Alan Millar
2016-03-03 16:10:49 UTC
Permalink
Post by songxin
Post by songxin
1. the command "gluster v replace-brick " is async or sync? The
replace is
Post by songxin
complete when the command quit ?
It is a sync command, replacing the brick finishes as the command returns.
Hmm, that has not been my experience with 3.7.6 and 3.7.8. Perhaps there is a question of semantics here or definition of what the command precisely does.

What I found is that the command returns fairly quickly, in a few seconds. In that amount of time, the old brick is removed from the volume configuration and the new brick is added to the volume configuration. As soon as the replace-brick command is done, the "volume info" will show the new configuration with the old brick gone and the new brick included. So in the sense of volume configuration, it is complete.

But the data is not moved or healed at this point; that is only starting. The heal process will then proceed separately after the "replace-brick" command.


I certainly think of the overall brick replacement process as including the full replication of data to the new brick, even if the "replace-brick" command does not do that. I imagine other people might think the same way also. A new empty brick isn't protecting your replicated data, so it is an incomplete replacement.


Older documentation certainly refers to "replace brick start". I couldn't find any 3.7 documentation that explained why that was gone, and why "commit force" was the only option available now. I just got errors at the command line trying to do "start". I think it would help if the new documentation was a little clearer about this, and how to look at heal info to find out when your brick replacement is fully finished.
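
(One way to do that, assuming a volume named gv0; the counts should drop to 0 on every brick once the replacement has fully healed:)

# number of files still waiting to be healed, per brick
gluster volume heal gv0 statistics heal-count
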

That's my opinion :-)


- Alan
songxin
2016-03-07 02:10:09 UTC
Permalink
Hi all,
I have a problem about how to recover a replicate volume.


precondition:
glusterfs version:3.7.6
brick of A board :128.224.95.140:/data/brick/gv0
brick of B board:128.224.162.255:/data/brick/gv0


reproduce:
1.gluster peer probe 128.224.162.255 (on A board)
2.gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0 128.224.162.255:/data/brick/gv0 force (on A board)
3.gluster volume start gv0 (on A board)
4.reboot the B board


After the B board reboots, I sometimes have the problems below.
1.the peer status is sometimes Rejected when I run "gluster peer status". (on A or B board)
2.the brick on B board is sometimes offline when I run "gluster volume status" (on A or B board)


I want to know what I should do to recover my replicate volume.


PS.
Now I do the following operations to recover my replicate volume, but sometimes I can't sync all the files in the replicate volume even if I run "heal full".
1.gluster volume remove-brick gv0 replica 1 128.224.162.255:/data/brick/gv0 force (on A board)
2. gluster peer detach 128.224.162.255 (on A board)
3.gluster peer probe 128.224.162.255 (on A board)
4.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force (on A board)






Please help me.


Thanks,
Xin
Atin Mukherjee
2016-03-07 03:49:37 UTC
Permalink
Post by songxin
Hi all,
I have a problem about how to recovery a replicate volume.
glusterfs version:3.7.6
brick of A board :128.224.95.140:/data/brick/gv0
brick of B board:128.224.162.255:/data/brick/gv0
1.gluster peer probe 128.224.162.255
(on A board)
2.gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
128.224.162.255:/data/brick/gv0 force
(on A board)
3.gluster volume start gv0
(on A board)
4.reboot the B board
After B board reboot,sometimes I have problems as below.
1.the peer status some times is rejected when I run "gluster peer
status".
This is where you get into the problem. I am really not sure what
happens when you reboot a board. In our earlier conversation w.r.t. a
similar problem you did mention that a board reboot doesn't wipe
/var/lib/glusterd, please double confirm!

Also please send cmd_history.log along with the glusterd log from both
nodes. Also, post reboot, are you also trying to detach/probe A? If so,
before detaching, were A & B in cluster connected state?
Post by songxin
(on A or B board)
2.The brick on B board sometimes is offline When I run "gluster volume
status"
(on A or B board)
I want to know how I should do to recovery my replicate volume.
PS.
Now I do following operation to recovery my replicate volume.But
sometimes I can't sync all the files in replicate volume even if I run
"heal full".
1.gluster volume remove-brick gv0 replica 1
128.224.162.255:/data/brick/gv0 force
(on A board)
2. gluster peer detach 128.224.162.255
(on A board)
3.gluster peer probe 128.224.162.255
(on A board)
4.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0
force
(on A board)
Please help me.
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-03-10 07:33:00 UTC
Permalink
Hi all,
I have a file with a gfid-mismatch problem, as below.


stat: cannot stat '/mnt/c//public_html/cello/ior_files/nameroot.ior': Input/output error
Remote:

getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee

Local:

getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd


There is a saying in link https://gluster.readthedocs.org/en/latest/Troubleshooting/split-brain/ as below.


This is done by observing the afr changelog extended attributes of the file on the bricks using the getfattr command; then identifying the type of split-brain (data split-brain, metadata split-brain, entry split-brain or split-brain due to gfid-mismatch); and finally determining which of the bricks contains the 'good copy' of the file.


So a gfid-mismatch is also a split-brain.
But I found that "gluster volume heal gv0 info split-brain" doesn't show the split-brain entry caused by the gfid-mismatch.


My question is following:
1.Which command can be used to show a split-brain due to gfid-mismatch?
2.How do I heal it? Is it the same as a data split-brain?




Thanks,
Xin
Krutika Dhananjay
2016-03-10 07:47:21 UTC
Permalink
A gfid mismatch should also be showing up in the form of a split-brain on
the parent directory of the entry in question.

1) In your case, does 'public_html/cello/ior_files' show itself up in the
output of `gluster volume heal <VOL> info split-brain`?
2) And what version of gluster are you using?
3) Could you share the output of `gluster volume info`?
4) nameroot.ior is a regular file, correct? Could you confirm that?

-Krutika
Post by songxin
Hi all,
I have a file has a problem of gfid-mismatch as below.
stat: cannot stat '/mnt/c//public_html/cello/ior_files/nameroot.ior': Input/output error
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd
There is a saying in link
https://gluster.readthedocs.org/en/latest/Troubleshooting/split-brain/ as
below.
This is done by observing the afr changelog extended attributes of the
file on the bricks using the getfattr command; then identifying the type of
split-brain (data split-brain, metadata split-brain, entry split-brain or *split-brain
due to gfid-mismatch*); and finally determining which of the bricks
contains the 'good copy' of the file.
So the gfid-mismatch is also a split-brain.
But I found that "gluster volume heal gv0 info split-brain" can't show
split-brain entry due to gfid-mismatch.
1.Which command can be used to show split-brain due to gfid-mismatch?
2.How to heal itIs it same as data split-brain
Thanks
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-03-10 08:29:36 UTC
Permalink
Hi,
Thank you for your reply. My answers are below.





1) In your case, does 'public_html/cello/ior_files' show itself up in the output of `gluster volume heal <VOL> info split-brain`?

Answer:
No. The heal info split-brain output is as below.
#gluster volume heal c_glusterfs info split-brain
Brick 10.32.1.144:/opt/lvmdir/c2/brick
Number of entries in split-brain: 0

Brick 10.32.0.48:/opt/lvmdir/c2/brick
Number of entries in split-brain: 0


2) And what version of gluster are you using?

Answer:
My glusterfs version is 3.7.6.


3) Could you share the output of `gluster volume info`?

Answer:
Volume Name: gv0
Type: Replicate
Volume ID: 04dad182-26f9-468e-b012-bf3c84f09910
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.1.144:/opt/lvmdir/c2/brick
Brick2: 10.32.0.48:/opt/lvmdir/c2/brick
Options Reconfigured:
storage.build-pgfid: on
nfs.disable: on
performance.readdir-ahead: on


4) nameroot.ior is a regular file, correct? Could you confirm that?
Answer:
Yes, it is a regular file.






On Thu, Mar 10, 2016 at 1:03 PM, songxin <***@126.com> wrote:

Hi all,
I have a file has a problem of gfid-mismatch as below.


stat: cannot stat '/mnt/c//public_html/cello/ior_files/nameroot.ior': Input/output error
Remote:

getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee

Local:

getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd


There is a saying in link https://gluster.readthedocs.org/en/latest/Troubleshooting/split-brain/ as below.


This is done by observing the afr changelog extended attributes of the file on the bricks using the getfattr command; then identifying the type of split-brain (data split-brain, metadata split-brain, entry split-brain or split-brain due to gfid-mismatch); and finally determining which of the bricks contains the 'good copy' of the file.


So the gfid-mismatch is also a split-brain.
But I found that "gluster volume heal gv0 info split-brain" can't show split-brain entry due to gfid-mismatch.


My question is following:
1.Which command can be used to show split-brain due to gfid-mismatch?
2.How do I heal it? Is it the same as a data split-brain?




Thanks,
Xin
songxin
2016-03-14 07:19:53 UTC
Permalink
Hi,
I have created a replicate volume and I want to run "gluster volume heal gv0 full".


I found that if I run "gluster volume heal gv0 full" on one board it always outputs an error like below.
Launching heal operation to perform full self heal on volume gv0 has been unsuccessful


But If I run "heal full " on the another board it alway sucessful.


I found the code of glusterfs as below.


if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
        gf_uuid_copy (candidate, brickinfo->uuid);

if ((*index) % hxl_children == 0) {
        if (!gf_uuid_compare (MY_UUID, candidate)) {
                _add_hxlator_to_dict (dict, volinfo,
                                      ((*index)-1)/hxl_children,
                                      (*hxlator_count));
                (*hxlator_count)++;
        }
        gf_uuid_clear (candidate);
}


My question is below:
Must I run "heal full" on the board whose uuid is the biggest?
If so, how cound I know which is the biggest board before I try to run "heal full" on every board?


Thanks,
Xin
Krutika Dhananjay
2016-03-15 03:43:56 UTC
Permalink
Yes. 'heal-full' should be executed on the node with the highest uuid.

Here's how I normally figure out which uuid is the highest:
Put all the nodes' uuids in a text file, one per line, sort them and get
the last uuid from the list.

To be more precise:
On any node, you can get the uuids of the peers through `gluster peer
status` output.
Gather them all and put them in a file.
Next, in order to get the uuid of the node where you executed peer status
itself, you use the command `gluster system:: uuid get`
Put this uuid as well into the text file.

Now execute
# cat <text-file-path> | sort

The last uuid printed in this list is the one that corresponds to the
highest uuid in the cluster.
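
(The same idea as a one-liner; the awk field positions are my assumption about the "Uuid:" lines in `gluster peer status` and the output of `gluster system:: uuid get`:)

# gather the peers' uuids plus the local uuid, sort them, and print the highest one
( gluster peer status | awk '/^Uuid:/ {print $2}'; \
  gluster system:: uuid get | awk '{print $NF}' ) | sort | tail -n 1
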

HTH,
Krutika
Post by songxin
Hi,
I have create a replicate volume and I want to run "gluster volume heal gv0 full".
I found that if I run "gluster volume heal gv0 full" on one board it
always output err like below.
Launching heal operation to perform full self heal on volume gv0
has been unsuccessful
But If I run "heal full " on the another board it alway sucessful.
I found the code of glusterfs as below.
if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
gf_uuid_copy (candidate, brickinfo->uuid);
if ((*index) % hxl_children == 0) {
if (!gf_uuid_compare (MY_UUID, candidate)) {
_add_hxlator_to_dict (dict, volinfo,
((*index)-1)/hxl_children,
(*hxlator_count));
(*hxlator_count)++;
}
gf_uuid_clear (candidate);
}
Must I run "heal full" on the board whose uuid is the biggest?
If so, how cound I know which is the biggest board before I try to run
"heal full" on every board?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Joe Julian
2016-03-15 13:52:13 UTC
Permalink
What? That's poor behavior. I'll open a bug on that. We have a network of management daemons that should be able to coordinate that on their own.
Post by Krutika Dhananjay
Yes. 'heal-full' should be executed on the node with the highest uuid.
Put all the nodes' uuids in a text file, one per line, sort them and get
the last uuid from the list.
On any node, you can get the uuids of the peers through `gluster peer
status` output.
Gather them all and put them in a file.
Next, in order to get the uuid of the node where you executed peer status
itself, you use the command `gluster system:: uuid get`
Put this uuid as well into the text file.
Now execute
# cat <text-file-path> | sort
The last uuid printed in this list is the one that corresponds to the
highest uuid in the cluster.
HTH,
Krutika
Post by songxin
Hi,
I have create a replicate volume and I want to run "gluster volume
heal
Post by songxin
gv0 full".
I found that if I run "gluster volume heal gv0 full" on one board it
always output err like below.
Launching heal operation to perform full self heal on volume
gv0
Post by songxin
has been unsuccessful
But If I run "heal full " on the another board it alway sucessful.
I found the code of glusterfs as below.
if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
gf_uuid_copy (candidate, brickinfo->uuid);
if ((*index) % hxl_children == 0) {
if (!gf_uuid_compare (MY_UUID, candidate)) {
_add_hxlator_to_dict (dict, volinfo,
((*index)-1)/hxl_children,
(*hxlator_count));
Post by songxin
(*hxlator_count)++;
}
gf_uuid_clear (candidate);
}
Must I run "heal full" on the board whose uuid is the biggest?
If so, how cound I know which is the biggest board before I try to
run
Post by songxin
"heal full" on every board?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
------------------------------------------------------------------------
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Krutika Dhananjay
2016-03-15 14:16:39 UTC
Permalink
Joe,

There is an open bug already:
https://bugzilla.redhat.com/show_bug.cgi?id=1112158

-Krutika
Post by Joe Julian
What? That's poor behavior. I'll open a bug on that. We have a network of
management daemons that should be able to coordinate that on their own.
Post by Krutika Dhananjay
Yes. 'heal-full' should be executed on the node with the highest uuid.
Put all the nodes' uuids in a text file, one per line, sort them and get
the last uuid from the list.
On any node, you can get the uuids of the peers through `gluster peer
status` output.
Gather them all and put them in a file.
Next, in order to get the uuid of the node where you executed peer status
itself, you use the command `gluster system:: uuid get`
Put this uuid as well into the text file.
Now execute
# cat <text-file-path> | sort
The last uuid printed in this list is the one that corresponds to the
highest uuid in the cluster.
HTH,
Krutika
Post by songxin
Hi,
I have create a replicate volume and I want to run "gluster volume heal gv0 full".
I found that if I run "gluster volume heal gv0 full" on one board it
always output err like below.
Launching heal operation to perform full self heal on volume
gv0 has been unsuccessful
But If I run "heal full " on the another board it alway sucessful.
I found the code of glusterfs as below.
if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
gf_uuid_copy (candidate, brickinfo->uuid);
if ((*index) % hxl_children == 0) {
if (!gf_uuid_compare (MY_UUID, candidate)) {
_add_hxlator_to_dict (dict, volinfo,
((*index)-1)/hxl_children,
(*hxlator_count));
(*hxlator_count)++;
}
gf_uuid_clear (candidate);
}
Must I run "heal full" on the board whose uuid is the biggest?
If so, how cound I know which is the biggest board before I try to run
"heal full" on every board?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
------------------------------
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
songxin
2016-03-18 06:34:54 UTC
Permalink
Hi,
Thank you for your help.
My glusterfs version is 3.7.6.
Must 'heal-full' be executed on the node with the highest uuid in glusterfs 3.7.6?


Has glusterfs 3.7.8 fixed this problem, or is there no patch for it?


Thanks,
Xin










At 2016-03-15 22:16:39, "Krutika Dhananjay" <***@redhat.com> wrote:

Joe,


There is an open bug already: https://bugzilla.redhat.com/show_bug.cgi?id=1112158


-Krutika



On Tue, Mar 15, 2016 at 7:22 PM, Joe Julian <***@julianfamily.org> wrote:

What? That's poor behavior. I'll open a bug on that. We have a network of management daemons that should be able to coordinate that on their own.



On March 14, 2016 8:43:56 PM PDT, Krutika Dhananjay <***@redhat.com> wrote:
Yes. 'heal-full' should be executed on the node with the highest uuid.


Here's how i normally figure out what uuid is the highest:

Put all the nodes' uuids in a text file, one per line, sort them and get the last uuid from the list.


To be more precise:

On any node, you can get the uuids of the peers through `gluster peer status` output.

Gather them all and put them in a file.

Next, in order to get the uuid of the node where you executed peer status itself, you use the command `gluster system:: uuid get`

Put this uuid as well into the text file.


Now execute

# cat <text-file-path> | sort


The last uuid printed in this list is the one that corresponds to the highest uuid in the cluster.


HTH,

Krutika





On Mon, Mar 14, 2016 at 12:49 PM, songxin <***@126.com> wrote:

Hi,
I have create a replicate volume and I want to run "gluster volume heal gv0 full".


I found that if I run "gluster volume heal gv0 full" on one board it always output err like below.
Launching heal operation to perform full self heal on volume gv0 has been unsuccessful


But If I run "heal full " on the another board it alway sucessful.


I found the code of glusterfs as below.


if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
gf_uuid_copy (candidate, brickinfo->uuid);


if ((*index) % hxl_children == 0) {
if (!gf_uuid_compare (MY_UUID, candidate)) {
_add_hxlator_to_dict (dict, volinfo,
((*index)-1)/hxl_children,
(*hxlator_count));
(*hxlator_count)++;
}
gf_uuid_clear (candidate);
}


My question is below:
Must I run "heal full" on the board whose uuid is the biggest?
If so, how cound I know which is the biggest board before I try to run "heal full" on every board?


Thanks,
Xin














_______________________________________________
Gluster-users mailing list
Gluster-***@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users





Gluster-users mailing list
Gluster-***@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
songxin
2016-03-03 03:28:07 UTC
Permalink
Hi,


recondition:
glusterfs version is 3.7.6
A node:128.224.95.140
B node:128.224.162.255

brick on A node:/data/brick/gv0
brick on B node:/data/brick/gv0


reproduce steps:
1.gluster peer probe 128.224.162.255 (on A node)
2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0 128.224.162.255:/data/brick/gv0 force (on A node)
3.gluster volume start gv0 (on A node)
4. mount -t glusterfs 128.224.95.140:/gv0 gluster (on A node)
5.create file 11 in dir gluster (on A node)

6.getfattr -m. -d -e hex /data/brick/gv0/11 (on A node)


# file: data/brick/gv0/11
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x00000000000000025696d78700029573

trusted.gfid=0xe696148665c343f7ace19184f0b5e7fa
6.getfattr -m. -d -e hex /data/brick/gv0/11 (on B node)
# file: data/brick/gv0/11
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256653d270006d953
trusted.gfid=0xe696148665c343f7ace19184f0b5e7fa


My question is the following.


Why does the file have only the trusted.afr.dirty extended attribute, and no per-brick changelog attributes, in a replicate volume?


I know the correct "getfattr" output should look like the example below.


Example:
[***@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt
#file: brick-a/file.txt
security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.afr.vol-client-3=0x000000000200000000000000
trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b



replica pair, i.e.brick-b:
trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for itself (brick-a)
trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a

Likewise, all files in brick-b will have:
trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for brick-a as seen by brick-b
trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b)




Above info is getting from link https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md.




Thanks,

Xin
Ravishankar N
2016-02-24 04:11:09 UTC
Permalink
Post by songxin
Hi all,
I have a question about replicate volume as below.
precondition
1.A node ip: 128.224.162.163
2.B node ip:128.224.162.255
3.A node brick:/data/brick/gv0
4.B node brick:/data/brick/gv0
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force
//run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2
128.224.162.255:/data/brick/gv0 force //run on A node
7.create some file(d,e,f) in directory gluster //run on A node
8.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on B node
9.ls gluster //run on B node
My question is as below.
After step 6, the volume type is changed from distribute to replicate.
The file (a,b,c) is created when the volume type is distribute.
The file (d,e,f) is created when the volume type is replicate.
After step 6, will the volume replicate the files (a,b,c) on the two
bricks? Or will it just replicate the files (d,e,f) on the two bricks?
If I run "gluster volume heal gv0 full", will the volume replicate
the files (a,b,c) on the two bricks?
All files before adding the new brick must be replicated into the new
brick after heal full. Anuradha's patches to automatically heal files
as a part of add-brick command (without the need to manually run full
heal) will be in 3.8. Meanwhile, these steps should help in triggering
heals:
https://www.mail-archive.com/gluster-***@gluster.org/msg23293.html
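
(Independent of the linked steps, a commonly used way to nudge healing along is to force a lookup on every file from a client mount, assuming the client-side self-heal options are enabled; /mnt/gluster is just a placeholder mount point, not a path from this thread.)

# stat every entry on the mount; each lookup gives AFR a chance to heal that file
find /mnt/gluster -noleaf -print0 | xargs -0 stat > /dev/null
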
Post by songxin
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
songxin
2016-02-24 04:33:47 UTC
Permalink
Hi,
Thank you for answering my question. And I have another question to ask.
If there are already some files (c, d, e) in the B node brick before step 6, as below, and file c is different from the file c created at the A mount point.


1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. gluster volume heal gv0 info split-brain //run on A node
8.gluster volume heal gv0 info fulll //run on A node
After step 7, should some split-brain entries be present?
After step 8, will all files (a,b,c,d,e) be replicated in volume gv0?
If so, how will file c be replicated when file c is different between the A brick and the B brick?






On 2016-02-24 12:11:09, "Ravishankar N" <***@redhat.com> wrote:

On 02/24/2016 07:16 AM, songxin wrote:

Hi all,
I have a question about replicate volume as below.


precondition:
1.A node ip: 128.224.162.163
2.B node ip:128.224.162.255
3.A node brick:/data/brick/gv0
4.B node brick:/data/brick/gv0


reproduce step:
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7.create some file(d,e,f) in directory gluster //run on A node
8.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on B node
9.ls gluster //run on B node


My question is as below.


After step 6, the volume type is change from distribute to replicate.
The file (a,b,c) is created when the volume type is distribute.
The file (d,e,f) is created when the volume type is replicate.


After step 6, does the volume will replicate the file (a,b,c) in two brick?Or it just replicate the file(d,e,f) in two brick?
If I run "gluster volume heal gv0 full", does the volume will replicate the file (a,b,c) in two brick?


All files before adding the new brick must be replicated into the new brick after heal full. Anuradha's patches to automatically heal files as a part of add-brick command (without the need to manually run full heal) will be in 3.8. Meanwhile, these steps should help in triggering heals:

https://www.mail-archive.com/gluster-***@gluster.org/msg23293.html


Thanks,
Xin
Ravishankar N
2016-02-24 04:42:39 UTC
Permalink
Hello,
Post by songxin
Hi,
Thank you for answering my question. And I have another question to ask.
If there has been some file(c, d, e) in the B node brick before step 6
as below.And the file c is diffetent with file c created in A mount poin.
Any new brick that you add to a volume has to be empty. It must not
contain data or be a brick that was a part of another volume etc.
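
(If an already-used path has to be reused as a brick, it must first be made to look brand new; a commonly quoted sketch, not from this mail, destructive, and only for a brick whose data you do not need. B's brick path is used purely as an example.)

# wipe the old contents plus gluster's internal metadata, then clear the brick's identity xattrs
rm -rf /data/brick/gv0/* /data/brick/gv0/.glusterfs
setfattr -x trusted.glusterfs.volume-id /data/brick/gv0
setfattr -x trusted.gfid /data/brick/gv0
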
Hope that helps,
Ravi
Post by songxin
Post by songxin
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force
//run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2
128.224.162.255:/data/brick/gv0 force //run on A node
7.gluster volume heal gv0 info split-brain//run on A node
8.gluster volume heal gv0 info fulll //run on A node
After step 7, should some split-brain entries be present?
After step 8, should all files (a,b,c,d,e) will be replicate in volume gv0?
If so, how the file c be replicated when the file c is different
between A brick and B brick?
songxin
2016-02-24 04:51:40 UTC
Permalink
Before step 6, there are some files (a,b,c), created at step 5, in the A brick while the volume type is distribute.
Should the files (a,b,c) in the A brick be moved to another directory before step 6?
And moved back, after step 6, when the volume type is changed to replicate?


Thanks,
Xin






On 2016-02-24 12:42:39, "Ravishankar N" <***@redhat.com> wrote:

Hello,

On 02/24/2016 10:03 AM, songxin wrote:

Hi,
Thank you for answering my question. And I have another question to ask.
If there has been some file(c, d, e) in the B node brick before step 6 as below.And the file c is diffetent with file c created in A mount poin.


Any new brick that you add to a volume has to be empty. It must not contain data or be a brick that was a part of another volume etc.
Hope that helps,
Ravi



1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. gluster volume heal gv0 info split-brain //run on A node
8.gluster volume heal gv0 info fulll //run on A node
After step 7, should some split-brain entries be present?
After step 8, should all files (a,b,c,d,e) will be replicate in volume gv0?
If so, how the file c be replicated when the file c is different between A brick and B brick?
Ravishankar N
2016-02-24 04:55:40 UTC
Permalink
Post by songxin
Before step 6, there are some files(a,b,c), that are created at step 5
, in the A brick when volume type is distribute.
Should the files(a,b,c) in A brick be move to another directory before step 6?
And move back, after step 6, when the volume type is changed to replicate?
Not required. Only the newly added brick must be empty.
Post by songxin
Thanks,
Xin
Hello,
Post by songxin
Hi,
Thank you for answering my question. And I have another question to ask.
If there has been some file(c, d, e) in the B node brick before
step 6 as below.And the file c is diffetent with file c created
in A mount poin.
Any new brick that you add to a volume has to be empty. It must
not contain data or be a brick that was a part of another volume etc.
Hope that helps,
Ravi
Post by songxin
Post by songxin
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0
128.224.162.163:/data/brick/gv0 force
//run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2
128.224.162.255:/data/brick/gv0 force //run on A node
7.gluster volume heal gv0 info split-brain//run on A node
8.gluster volume heal gv0 info fulll //run on A node
After step 7, should some split-brain entries be present?
After step 8, should all files (a,b,c,d,e) will be replicate in volume gv0?
If so, how the file c be replicated when the file c is different
between A brick and B brick?
songxin
2016-02-24 05:24:36 UTC
Permalink
1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. gluster volume heal gv0 info split-brain //run on A node
8.gluster volume heal gv0 info fulll


At step 7, should some split-brain entries be present?




On 2016-02-24 12:55:40, "Ravishankar N" <***@redhat.com> wrote:

On 02/24/2016 10:21 AM, songxin wrote:





Before step 6, there are some files(a,b,c), that are created at step 5 , in the A brick when volume type is distribute.
Should the files(a,b,c) in A brick be move to another directory before step 6?
And move back, after step 6, when the volume type is changed to replicate?


Not required. Only the newly added brick must be empty.


Thanks,
Xin





On 2016-02-24 12:42:39, "Ravishankar N" <***@redhat.com> wrote:

Hello,

On 02/24/2016 10:03 AM, songxin wrote:

Hi,
Thank you for answering my question. And I have another question to ask.
If there has been some file(c, d, e) in the B node brick before step 6 as below.And the file c is diffetent with file c created in A mount poin.


Any new brick that you add to a volume has to be empty. It must not contain data or be a brick that was a part of another volume etc.
Hope that helps,
Ravi



1.gluster peer probe 128.224.162.255 //run on A node
2.gluster volume create gv0 128.224.162.163:/data/brick/gv0 force //run on A node
3.gluster volume start gv0 //run on A node
4.mount -t glusterfs 128.224.162.163:/gv0 gluster //run on A node
5.create some file(a,b,c) in directory gluster //run on A node
6.gluster volume add-brick gv0 replica 2 128.224.162.255:/data/brick/gv0 force //run on A node
7. gluster volume heal gv0 info split-brain //run on A node
8.gluster volume heal gv0 info fulll //run on A node
After step 7, should some split-brain entries be present?
After step 8, should all files (a,b,c,d,e) will be replicate in volume gv0?
If so, how the file c be replicated when the file c is different between A brick and B brick?
Joe Julian
2016-02-19 23:32:22 UTC
Permalink
Hi,
I create a replicate volume with 2 bricks. And I frequently reboot my two nodes and frequently
run “peer detach” “peer detach” “add-brick” "remove-brick".
[snip]

Why? You don't need to disassemble and reassemble your cluster every
time you reboot a server. Why not just reboot?

Do make sure self-heal has completed first, though.
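
(A quick way to check that, assuming the volume from this thread is called gv0: reboot a node only once every brick reports zero pending entries.)

# pending heal entries per brick; all counts should be 0 before taking a node down
gluster volume heal gv0 info | grep 'Number of entries'
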