Discussion:
Need help making a decision choosing MS DFS or Gluster+SAMBA+CTDB
David
2015-08-09 15:56:50 UTC
Hi,

I need some help choosing between the two.
I have no experience with MS DFS or with Windows Server as a file server.

Some developers are pushing in the DFS direction, mostly because the
applications that will use and access the storage are Microsoft-based and
use CIFS.

I know that most serious storage, NAS and SAN vendors build on Linux or
Unix for performance and flexibility, and I'm afraid that DFS just won't
carry the expected load.

Does anyone have experience with it?
Can someone list the pros and cons of each to help us make the call?

Many thanks,
David
Mathieu Chateau
2015-08-09 18:05:18 UTC
Hello,
By DFS, you mean DFS-R.
DFS can also be used only as a namespace (DFS-N). This lets you publish
shares that hide the real server name, so the target can be moved
somewhere else as needed.

As I do quite a lot of DFS-R, here are the differences when using DFS-R
instead of Gluster:

- Replication occurs between servers. A client connects to only one of the
  servers (which one can be based on AD topology) and is not aware that
  it's DFS-R.
- Replication transmits only the changed blocks of a file, not the whole
  file.
- Replication is tracked using an internal Jet database.
- There is a reporting tool to see differences etc. between servers.
- This is active/active. If a client can connect to a server, it will
  work there.
- File locks are not replicated. If the same file is changed on 2 servers
  at the same time, replication logs that in the event log and puts the
  file that lost in a lost&found folder (the more recent one wins).
- No issues browsing the file/folder tree. Everything acts as if it
  were just a file server.
- You can use NTFS permissions for fine-grained access (they go further
  than Unix-style permissions, from my point of view).
- Quotas work.
- You can block files based on extension.
- Newer versions can deduplicate content (file server standard).
- Writes are not synchronous. Once a file is written, it's replicated in
  the background.
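
To make the "more recent one wins" rule from the list above concrete, here is a minimal Python sketch of last-writer-wins reconciliation between two replicas. It is only an illustration of the behaviour described in the post, not DFS-R's actual implementation; the names `FileVersion`, `resolve_conflict` and `reconcile`, and the tie-break on server name, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileVersion:
    server: str     # replica that holds this version (hypothetical name)
    mtime: float    # last-modified time, seconds since the epoch
    content: bytes


def resolve_conflict(a: FileVersion, b: FileVersion) -> Tuple[FileVersion, FileVersion]:
    """Return (winner, loser); the more recently modified version wins."""
    # Assumption: ties are broken on server name so both replicas decide alike.
    winner, loser = sorted((a, b), key=lambda v: (v.mtime, v.server), reverse=True)
    return winner, loser


def reconcile(a: FileVersion, b: FileVersion,
              conflict_folder: List[FileVersion],
              event_log: List[str]) -> FileVersion:
    """Keep one version, park the other in the conflict folder, and log it."""
    if a.content == b.content:
        return a  # both sides agree, nothing to reconcile
    winner, loser = resolve_conflict(a, b)
    conflict_folder.append(loser)
    event_log.append(
        f"conflict: kept copy from {winner.server}, set aside copy from {loser.server}"
    )
    return winner
```

The point of the sketch is that nothing stops both writes from succeeding locally; the conflict only shows up later, during replication, and one user's edit ends up set aside rather than merged.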


The main difference is that you can't stripe a share's content over multiple
servers as if they were just one (the distributed feature of Gluster).
Things are evolving with Windows Server 2016, but it is not yet RTM.
You can also use shared storage in a more cluster-like way, with or without
DFS replication (to survive a server going down).

We nearly always use both DFS-R and DFS-N, so we can migrate a share to a
different server without changes on the client side.

NAS & SAN vendors don't have a choice. NetApp can't use Windows, otherwise
they couldn't customize it deeply enough, and they would have to pay
licenses to MS. I have always found that using Linux for CIFS falls far
short of the features you have on the Windows side, and it generates issues
(like robocopy diffs not working without the /FFT flag).
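
As background on the robocopy remark: /FFT makes robocopy assume FAT file times, which have two-second precision. A likely reason it is needed here (an assumption, not something stated in this thread) is that the non-Windows server stores timestamps at a different granularity, so an exact comparison sees every file as changed. A minimal Python sketch of a tolerant comparison, with hypothetical function names:

```python
def same_mtime(src_mtime: float, dst_mtime: float, tolerance: float = 2.0) -> bool:
    """Treat two modification times as equal if they differ by at most `tolerance` seconds."""
    return abs(src_mtime - dst_mtime) <= tolerance


def needs_copy(src_mtime: float, src_size: int,
               dst_mtime: float, dst_size: int,
               tolerance: float = 2.0) -> bool:
    """Decide whether a file should be re-copied, based on size and modification time."""
    if src_size != dst_size:
        return True
    return not same_mtime(src_mtime, dst_mtime, tolerance)
```

With `tolerance=0` this behaves like an exact comparison, which is where files that were never modified keep being flagged as different.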

Backing up NetApp or EMC CIFS just doesn't work if you do it through the
share; you have to use NDMP, which is proprietary and generates other
backup issues (NDMP license, the need to write directly to tape itself...).

What will your clients be? Windows boxes? Linux? Both? If both, will they
use the same shares?






Regards,
Mathieu CHATEAU
http://www.lotp.fr
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
David
2015-08-09 19:18:17 UTC
Hi,

Thank you very much for the detailed answer.

Most of the clients are Windows-based, but Linux clients will come in the
future.
I know that Windows does a bad job with NFS, so that is one concern I have,
but I am also worried about performance and stability.
I used to work with clustered NFS environments, and also with GPFS and CTDB
exporting both NFS and CIFS, serving hundreds of users and render farms with
no issues.

Are you aware of any performance challenges or limitations of DFS-R, I mean
real-world ones, compared to Linux? I am aware of the official MS DFS docs.
I wonder if there is a comparison on the same HW of a Gluster replica (on
top of XFS/Ext4) against a two-node DFS-R setup (checking concurrent CIFS
sessions, IOPS, network utilization during sync, etc.).

Thanks again,
David
Mathieu Chateau
2015-08-09 20:24:49 UTC
I do have DFS-R in production, in some cases replacing NetApp filers.
But none of them has a workload similar to my current GFS.

In active/active, the most common issue is a file being changed on both
sides (there is no global lock).
Will users access the same content from Linux & Windows?

Regards,
Mathieu CHATEAU
http://www.lotp.fr
David
2015-08-10 05:11:03 UTC
No, but files can be accessed by different clients through different nodes.

OK, so from what you are saying, there are no stability or other issues
with DFS as long as we keep an eye on the workload and each file is
accessed from one node, right?
Mathieu Chateau
2015-08-10 05:23:32 UTC
Hello,

Yes, it's much like a standard Windows file server. The only extra is
monitoring DFS-R replication through Nagios or similar, to check the
backlog/latency in replication.
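
As a rough illustration of what such a check could look like, here is a minimal Python sketch that maps a DFS-R backlog file count onto the standard Nagios plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the thresholds are made up for the example, and how the backlog count is obtained (for instance by parsing the output of the `dfsrdiag backlog` command) is left out on purpose.

```python
from typing import Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backlog(backlog_files: Optional[int],
                  warn: int = 100,
                  crit: int = 1000) -> Tuple[int, str]:
    """Map a backlog file count to a (status code, status line) pair.

    The thresholds are placeholder values; pick ones that match the
    change rate of the replicated folder.
    """
    if backlog_files is None or backlog_files < 0:
        return UNKNOWN, "DFSR UNKNOWN - backlog could not be read"
    # Performance data after the pipe follows the label=value;warn;crit form.
    perfdata = f"backlog={backlog_files};{warn};{crit}"
    if backlog_files >= crit:
        return CRITICAL, f"DFSR CRITICAL - {backlog_files} files in backlog | {perfdata}"
    if backlog_files >= warn:
        return WARNING, f"DFSR WARNING - {backlog_files} files in backlog | {perfdata}"
    return OK, f"DFSR OK - {backlog_files} files in backlog | {perfdata}"
```

A wrapper script would print the status line and exit with the status code, which is what Nagios reads.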

For performance, it comes down to what you would do with any standard
Windows file server anyway:

- Check the RAID controller settings.
- Format NTFS with 64K clusters, not the 4K default.
- Defragment from the beginning with the built-in Windows tool (scheduled).
  O&O can be a good investment for that if a volume goes much beyond 2M
  files.
- Set up antivirus to do real-time checks only on file change, not on
  access. Exclude the DFS database from antivirus scans.
- Tune the network card (send and receive buffers).
- ...

Start with 2012 R2, to get SMB v3 and all the latest features.

Regards,
Mathieu CHATEAU
http://www.lotp.fr
Ira Cooper
2015-08-10 05:26:01 UTC
Mathieu Chateau
2015-08-10 05:42:02 UTC
Hello,

What do you mean by "true" clustering?
We can do a Windows Failover Cluster (1 virtual IP, 1 virtual name), but
this means using shared storage such as a SAN.

Beyond that, it depends on your network topology. If you have multiple
geographical sites / datacenters, then DFS-R behaves a lot better than
Gluster in replicated mode. Users won't notice any latency, at the price
that replication is async.

Regards,
Mathieu CHATEAU
http://www.lotp.fr
Post by Ira Cooper
If you want to go active/active, I'd recommend Samba + CTDB + Gluster.
You want true clustering, and a system that can handle the locking etc.
I'd layer normal DFS to do "namespace" control, and to help with
handling failover, or just use round robin DNS.
Thanks,
-Ira
Dan Mons
2015-08-10 07:07:52 UTC
If you're looking at a Gluster+Samba setup of any description for
people extensively using Microsoft Office tools (either Windows or Mac
clients), I *strongly* suggest exhaustive testing of Microsoft Word
and Excel.

I've yet to find a way to make these work 100% on Gluster. Strange
client-side locking behaviour with these tools often makes documents
completely unusable when hosted off Gluster. We host our large
production files (VFX industry) off Gluster, however have a separate
Windows Server VM purely for administration to host their legacy
Microsoft Office documents (we've since migrated largely to Google
Apps + Google Drive for that stuff, but the legacy requirement remains
for a handful of users).

-Dan

----------------
Dan Mons - R&D Sysadmin
Cutting Edge
http://cuttingedge.com.au
Daniel Müller
2015-08-10 07:34:58 UTC
David
2015-08-10 08:04:30 UTC
Thanks everyone.

So from reading all your comments, I understand that if I need an active /
active synchronized setup for higher workloads, Gluster is for me.
Other than that, DFS-R is a good option for data replication, at the
expense of latency before replicated data reaches the secondary node, and
with only one server active per CIFS share.

Does DFS-R work well with a high rate of changes?
I found in other users' reports that DFS-R caused server hangs and such;
I hope it was fixed in Windows Server 2012.

David
Post by Daniel Müller
You can choose to work with vfs objects = glusterfs:
    glusterfs:volume = yourvolume
    glusterfs:volfile_server = your.server
For me it turned out to be too buggy.
Instead I just used path = /path/to/your/mountedgluster together with:
    posix locking = no
    kernel share modes = no
[edv]
    comment = edv s4master verzeichnis auf gluster node1
    vfs objects = recycle
    ## vfs objects = recycle, glusterfs
    recycle:repository = /%P/Papierkorb
    ## glusterfs:volume = sambacluster
    ## glusterfs:volfile_server = XXX.XXXX.XXXX
    recycle:exclude = *.tmp,*.temp,*.log,*.ldb,*.TMP,?~$*,~$*,Thumbs.db
    recycle:keeptree = yes
    recycle:exclude_dir = .Papierkorb,Papierkorb,tmp,temp,profile,.profile
    recycle:touch_mtime = yes
    recycle:versions = yes
    recycle:minsize = 1
    msdfs root = yes
    path = /mnt/glusterfs/ads/wingroup/edv
    read only = no
    posix locking = no
    kernel share modes = no
    access based share enum = yes
    hide unreadable = yes
    hide unwriteable files = yes
    veto files = Thumbs.db
    delete veto files = yes
Greetings
Daniel
EDV Daniel Müller
Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen
Tel.: 07071/206-463, Fax: 07071/206-499
Internet: www.tropenklinik.de
Ben Turner
2015-08-10 17:02:24 UTC
If smallfile performance is a concern, I HIGHLY recommend you steer clear of
Gluster + SMB + CTDB. Large-file sequential and random IO is not great but
OK, but smallfile and metadata operations (especially from Windows clients)
are poor. To put it in perspective, I can create 3500 64k files / second on
my glusterFS mount, while over SMB I can only do 308 (same HW / config).
This is something I am working on improving, but there is quite a bit to be
done on the Gluster side before smallfile workloads make sense
performance-wise:

- on creates, the extra xattrs that SMB requires (ACLs, etc) cause extra round trips, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#CREATE-AND-WRITE

- lack of good, coherent client-side caching (cache invalidation enables longer caching of metadata)
- incomplete metadata reads (READDIRPLUS) cause per-file round trips for directory scans, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#READDIRPLUS_used_to_prefetch_xattrs

- case-insensitive file lookup semantics, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#case-insensitive_volume_support

- high latency of file creates and even reads at the brick level, due to excessive system calls, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/stat-xattr-cache
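
For anyone who wants to reproduce a files-per-second comparison like the one above on their own hardware, here is a minimal Python sketch that times the creation of many 64k files in a directory. It is not the tool used for the numbers quoted in this post; the function name, file naming and defaults are assumptions, and the sketch does not fsync, so it measures what the client sees rather than durable writes.

```python
import os
import tempfile
import time


def create_rate(directory: str, count: int = 1000, size: int = 64 * 1024) -> float:
    """Create `count` files of `size` bytes in `directory`; return files per second."""
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"smallfile_{i:06d}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast runs with tiny counts.
    return count / max(elapsed, 1e-9)


def demo() -> float:
    """Run the benchmark against a throwaway local directory."""
    with tempfile.TemporaryDirectory() as scratch:
        return create_rate(scratch, count=100)
```

To compare the two access paths, run `create_rate` once against a directory on the native Gluster mount and once against the same volume mounted over SMB, using an empty directory each time.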

-b
Does DFS-R works well on high rate of changes?
Found from other users use cases that DFS-R caused server hangs and such,
hope it was fixed in Win2K12 server.
David
You can choose to work with vfs objects= glusterfs
Glusterfs:volume=yourvolume
Glusterfs:volfile.server=Your.server
Form e it turned out to be too buggy.
I just used instead the path=/path/toyour/mountedgluster
posix locking =NO
kernel share modes = No
[edv]
comment=edv s4master verzeichnis auf gluster node1
vfs objects= recycle
##vfs objects= recycle, glusterfs
recycle:repository= /%P/Papierkorb
##glusterfs:volume= sambacluster
##glusterfs:volfile_server = XXX.XXXX.XXXX
recycle:exclude = *.tmp,*.temp,*.log,*.ldb,*.TMP,?~$*,~$*,Thumbs.db
recycle:keeptree = Yes
recycle:exclude_dir = .Papierkorb,Papierkorb,tmp,temp,profile,.profile
recycle:touch_mtime = yes
recycle:versions = Yes
recycle:minsize = 1
msdfs root=yes
path=/mnt/glusterfs/ads/wingroup/edv
read only=no
posix locking =NO
kernel share modes = No
access based share enum=yes
hide unreadable=yes
hide unwriteable files=yes
veto files = Thumbs.db
delete veto files = yes
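
A share like the one above can be sanity-checked with a few lines of Python before it is rolled out to several nodes. This is only a sketch under assumptions: the helper name `check_share` and the `EXPECTED` table are made up for illustration, the check covers just the two locking-related options set in this mail, and Python's `configparser` does not implement every smb.conf parsing rule, so `testparm` remains the authoritative check.

```python
import configparser

# The two options this mail sets for a share exported from a mounted
# Gluster path. Any of these spellings is accepted here as "off".
OFF_VALUES = {"no", "false", "0"}
EXPECTED = ("posix locking", "kernel share modes")


def check_share(conf_text, share):
    """Return {option: actual_value} for expected options that are
    missing (None) or not switched off in the given share section."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # keep keys like "recycle:repository" intact
        interpolation=None,  # smb.conf uses % macros such as %P
        strict=False,        # tolerate repeated options
    )
    parser.read_string(conf_text)
    section = parser[share]
    problems = {}
    for option in EXPECTED:
        actual = section.get(option)
        if actual is None or actual.strip().lower() not in OFF_VALUES:
            problems[option] = actual
    return problems


# Trimmed-down version of the [edv] share from this mail.
SAMPLE = """\
[edv]
path = /mnt/glusterfs/ads/wingroup/edv
read only = no
posix locking = NO
kernel share modes = No
"""

if __name__ == "__main__":
    print(check_share(SAMPLE, "edv"))  # prints {}
```

An empty dictionary means both options are present and switched off; anything else lists what to look at.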
Greetings
Daniel
EDV Daniel Müller
Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen
Tel.: 07071/206-463, Fax: 07071/206-499
Internet: www.tropenklinik.de
-----Original Message-----
Sent: Monday, 10 August 2015 09:08
To: Mathieu Chateau
Cc: gluster-users; David
Subject: Re: [Gluster-users] Need help making a decision choosing MS DFS or
Gluster+SAMBA+CTDB
If you're looking at a Gluster+Samba setup of any description for people
extensively using Microsoft Office tools (either Windows or Mac clients), I
*strongly* suggest exhaustive testing of Microsoft Word and Excel.
I've yet to find a way to make these work 100% on Gluster. Strange
client-side locking behaviour with these tools often makes documents
completely unusable when hosted off Gluster. We host our large
production files (VFX industry) off Gluster, but keep a separate Windows
Server VM purely for administration staff to host their legacy Microsoft
Office documents (we've since migrated largely to Google Apps + Google Drive
for that stuff, but the legacy requirement remains for a handful of users).
-Dan
----------------
Dan Mons - R&D Sysadmin
Cutting Edge
http://cuttingedge.com.au
Post by Mathieu Chateau
Hello,
what do you mean by "true" clustering?
We can do a Windows Failover Cluster (1 virtual IP, 1 virtual name),
but this means using shared storage such as a SAN.
Then it depends on your network topology. If you have multiple
geographical sites / datacenters, then DFS-R behaves a lot better than
Gluster in replicated mode. Users won't notice any latency, at the
price that replication is async.
Cordialement,
Mathieu CHATEAU
http://www.lotp.fr
Post by Mathieu Chateau
I do have DFS-R in production, which sometimes replaced NetApp ones.
But none with a workload similar to my current GFS.
In active/active, the most common issue is a file changed on both
sides (no global lock). Will users access the same content from Linux &
Windows?
If you want to go active/active, I'd recommend Samba + CTDB + Gluster.
You want true clustering, and a system that can handle the locking etc.
I'd layer normal DFS on top to do "namespace" control and to help with
handling failover, or just use round-robin DNS.
Thanks,
-Ira
_______________________________________________
Gluster-users mailing list
http://www.gluster.org/mailman/listinfo/gluster-users
Ira Cooper
2015-08-10 07:29:53 UTC
Permalink
Post by Mathieu Chateau
Hello,
what do you mean by "true" clustering?
We can do a Windows Failover Cluster (1 virtual IP, 1 virtual name), but
this means using shared storage such as a SAN.
Then it depends on your network topology. If you have multiple geographical
sites / datacenters, then DFS-R behaves a lot better than Gluster in
replicated mode. Users won't notice any latency,
at the price that replication is async.
I assumed a Gluster context. In order to cluster on gluster, you'll
need CTDB to keep the locking between the nodes consistent so you don't
run into the issues you mentioned with DFS-R from your mail. (I don't
use DFS-R, and I haven't.)

If you have multi-site replication... you probably want it async, unless
you have really good links or low throughput requirements :).

Thanks,

-Ira