Gluster: Resolving a Split Brain in a Replicated Setup

Initially this took about 7 hours to diagnose and fix. With what I have learned about the inner workings of gluster, and the tools I am providing as open source, this should cut resolution time down to about 5 minutes.

First you must meet the following conditions:

  1. You are running gluster >= 3.0 and <= 3.2 (it may also work on 2.x, which I have not tested; it will not work with future versions if gluster changes its use of xattrs)
  2. You are running a replicated volume (again, I have not tested distributed volumes; in theory remove, re-add and rebalance will fix these)
  3. You have a “good” copy of your data (this is essential: the procedure assumes you have at least 1 brick with a good copy of the file system)

Restrain and restore the “bad” brick

  1. Shut down all services that are using the mounted filesystem (e.g. httpd / nginx / *ftpd)
  2. Unmount all the file systems on the node (glusterfs / nfs / etc …)
  3. Grab a copy of stripxattr.py and make sure you READ the README for installation requirements and usage
  4. Run stripxattr.py against the backing filesystem on the “bad” node ONLY, NOT AGAINST A GLUSTER MOUNT
  5. From the “good” node, now rsync the data: rsync -gioprtv --progress /path/to/filesystem root@:/path/to
  6. From the “good” node, trigger an “auto heal”; this will re-populate the xattr data (this must be done on a glusterfs mount, not nfs/cifs/etc…)
  7. Download listxattr.py; once the self heal has completed, see the README file for a “quick and dirty” consistency check
  8. All being well you have now resolved a split-brain and can return your node to service
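
To make steps 4 and 6 a little more concrete, here is a minimal Python 3 sketch of the idea. It is not the stripxattr.py tool itself, just an illustration under some assumptions: that the replication metadata lives in xattrs starting with trusted.afr., trusted.gfid or trusted.glusterfs. (check what your own bricks actually carry before trusting this list), and that a self heal is triggered by stat-ing every entry on a glusterfs mount. The function names are my own and hypothetical.

```python
import os

# Assumed prefixes: Gluster 3.x keeps its replication bookkeeping in the
# trusted.* namespace; adjust to whatever your bricks actually show.
GLUSTER_PREFIXES = ("trusted.afr.", "trusted.gfid", "trusted.glusterfs.")


def is_gluster_xattr(name):
    """Return True if the xattr name looks like Gluster metadata."""
    return name.startswith(GLUSTER_PREFIXES)


def iter_paths(root):
    """Yield root and every directory and file beneath it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def strip_gluster_xattrs(brick_root, list_fn=None, remove_fn=None):
    """Remove Gluster xattrs under a brick's backing directory.

    Run against the backing filesystem of the "bad" brick only,
    never against a Gluster mount. Returns the number of xattrs removed.
    list_fn / remove_fn exist only so the logic can be dry-run or tested.
    """
    list_fn = list_fn or os.listxattr
    remove_fn = remove_fn or os.removexattr
    removed = 0
    for path in iter_paths(brick_root):
        for name in list_fn(path, follow_symlinks=False):
            if is_gluster_xattr(name):
                remove_fn(path, name, follow_symlinks=False)
                removed += 1
    return removed


def trigger_self_heal(mount_root):
    """stat every entry on a glusterfs mount so replicate looks at it.

    Returns the number of entries visited.
    """
    visited = 0
    for path in iter_paths(mount_root):
        os.lstat(path)
        visited += 1
    return visited
```

Reading and removing trusted.* attributes needs root, and os.listxattr / os.removexattr are only available on Linux. Passing a remove_fn that just prints its arguments is a cheap way to do a dry run before touching a brick.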

Current known gluster issues

  1. NFS is much faster (48x in tests) for small files, e.g. php webapps, but it does not support distributed locking, meaning all nodes can write to the same file at the same time; this is what caused our original split brain

So what is the resolution in this case?

Selective use: use glusterfs for filesystems that need distributed locking. In large production deploys php files often do not change much; in this case NFS is perfect.

If you are still writing php sessions to a file system then STOP IT and use a database! (Better yet, use memcache.)
