- To: miloska <miloska@xxxxxxxxx>
- Subject: Re: [SLUG] new cluster install
- From: Jake Anderson <yahoo@xxxxxxxxxxxxxxx>
- Date: Fri, 03 Jun 2011 16:49:02 +1000
- Cc: slug@xxxxxxxxxxx
On 06/03/2011 04:15 PM, miloska wrote:
That was my initial reaction, but drbd is there to provide redundancy
and shared storage over the network, at least in theory ;->
I suppose I'm wondering how much I can depend on that in real life. I've
made it through a number of disk failures and corruptions with mdadm, so
I'm happy with that, but drbd is new to me.
How should I set up my disks?
I would not run a server without some disk redundancy (RAID 1, 5 or 6).
I'm fairly confident with networking so the complexity isn't too much of
an issue to me.
Each server has 2 gbit ports on it. The options are:
team the NICs, then run a VLAN for drbd and another for regular comms,
or a dedicated NIC for drbd and one for general comms.
How many switches do you have? If you've got 2 it may be worth going with
the teaming - to survive a switch failure - otherwise I'd keep it
simple, i.e. no teaming and no VLANs.
I suppose I'm wondering about the performance benefits vs issues.
The teamed setup with one switch gives me some fault tolerance (bad
NIC, port or cable). If you go to 2 switches you need to worry about things
like STP and the like, and it all becomes too much ;->
I think I've decided to do the team because it's fairly easy and will
allow me to expand if needed.
The performance is going to be a wash.
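A minimal sketch of how the fault tolerance of the team could be checked while pulling cables, assuming the team is built with the Linux bonding driver and that its status file (typically /proc/net/bonding/bond0) uses the usual "Slave Interface" / "MII Status" fields. The sample text, interface names and function names below are hypothetical, not taken from this setup.

```python
# Hypothetical, trimmed sample of a bonding status file; real output has more fields.
BOND_SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
"""


def slave_status(text):
    """Return {slave_name: mii_status} for each slave listed in the status text."""
    status = {}
    current = None  # name of the slave section we are currently inside
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" line appears before any slave
            # section, so it is skipped while current is still None.
            status[current] = value
    return status


def failed_slaves(text):
    """Return the sorted names of slaves whose link is not reported as up."""
    return sorted(name for name, state in slave_status(text).items() if state != "up")


if __name__ == "__main__":
    print(slave_status(BOND_SAMPLE))
    print(failed_slaves(BOND_SAMPLE))
```

On a live node the same functions could be fed the contents of the real status file instead of the sample string.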
Any other suggestions or gotchas to look out for?
Test as much as you can. Remove cables, power, disk, cpu :) whatever
you can and learn what happens.
You can remove your CPUs ;-P
I'll settle for pulling plugs ;->
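For the plug-pulling tests, a rough sketch of how the drbd state could be watched, assuming a DRBD 8.x style /proc/drbd where each device line carries cs: (connection state), ro: (roles) and ds: (disk states). The sample text and the helper names are hypothetical, and the format should be checked against the installed drbd version.

```python
import re

# Hypothetical sample in the DRBD 8.x /proc/drbd layout: device 0 is healthy,
# device 1 has lost its peer (e.g. after the replication cable was pulled).
DRBD_SAMPLE = """\
version: 8.3.7 (api:88/proto:86-91)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# Matches only the per-device lines, e.g. " 0: cs:Connected ro:... ds:..."
DEVICE_LINE = re.compile(r"^\s*(\d+): cs:(\S+) ro:(\S+) ds:(\S+)")


def parse_drbd(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, ro, ds = match.groups()
            devices[int(minor)] = {"cs": cs, "ro": ro, "ds": ds}
    return devices


def degraded(devices):
    """Return sorted minors that are not connected with both disks up to date."""
    return sorted(
        minor
        for minor, dev in devices.items()
        if dev["cs"] != "Connected" or dev["ds"] != "UpToDate/UpToDate"
    )


if __name__ == "__main__":
    print(degraded(parse_drbd(DRBD_SAMPLE)))
```

Note that a device that is still resyncing after the cable goes back in would also show up as degraded here, which is probably what you want during testing.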
My personal advice is not to overcomplicate things: prefer a working but
(half)manual failover solution over a fully automated but not working
one :) I think to do some really fancy things you should have shared
storage - as you don't, don't push it too hard. You will not achieve
101% uptime with this setup, deal with it.
That's a given, hence the HA(ish).
Basically if something dies I'd like to get the office back up in a few
minutes rather than a few hours.
I was originally going to use libvirt and manually move/restart
instances, because I already use libvirt, just not with shared storage.
But then one of the nice people in #slug suggested ganeti to me, and it
seems to be easy enough to use. It handles setting drbd up for me, as well
as monitoring / starting instances in the event of a node failure. It does
live migration etc. as well.
If I add a 3rd node I can actually remove the first node and replace it
with the 3rd, all whilst keeping everything live, or expand the cluster
to 3 nodes and maintain a RAID1-style drbd setup for each VM.
It really does seem nifty.
Use puppet or cfengine (or other system automation tool) from the
beginning to make sure you can rebuild the whole system within minutes
(this helps for testing as well).
That's a good idea, I'll look into them for this.
I'll need it ;->