Software RAID with Linux

Recently, I noticed my once massive 500 GB hard drive had been reduced to nothing more than 3-4 gigs of free space. I decided I wanted more. I also decided I didn't like the idea of one hard drive dying and taking every byte of data I've collected with it. That meant redundancy. Now, there are a lot of solutions out there for making storage redundant, but I had a price range of only $200-$300 total. That left only one solution: Linux software RAID.

For those who don't know what RAID is, there is plenty of information about it on Wikipedia. Seriously, if you don't know what RAID is, you won't be interested in this post. Those who do know may be wondering why I went software instead of hardware. Hardware RAID is a dedicated device (a PCI card) that implements RAID on board, does all the XORing, etc., and presents the array to the system as a single drive. Theoretically, this is faster than software and more reliable. And it's true... at $1000 a card. Really, anything less and you're almost certainly doing software RAID anyway, with the disadvantages of hardware. Namely, if the RAID controller dies and you don't have the exact same model number and firmware revision, you're screwed. It turns out that every single card out there stores the RAID metadata in a slightly different way, and if you can't find the same card, your data is gone forever. Linux software RAID doesn't have this problem. Just pop in a bootable CD and you have full, reliable access to your data. Move your hard drives to a new system/controller? No problem! It all works fine.

The other standard argument for hardware over software is performance. Again, it's true... at $1000 a card. It's an XOR operation. Nothing more, ever. Ok, if your main CPU is a Pentium II underclocked to 5 MHz, then yeah, software might slow things down. At the end of this post I'll include some info on the overhead I've experienced with it.

So, on to the actual process of creating a RAID array in Linux. It turns out that, unlike ndiswrapper or sendmail configuration *shudder*, it's incredibly easy. Really, just make sure you have your hard drives in (and partitioned), then run one simple command like the following and it's done.

sudo mdadm --create /dev/md0 --chunk=16 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Now, before we go any further, let's talk about partitioning. Strictly speaking, it's not necessary. Everything will work just as well on a bare drive as on one that's been partitioned. The main reason you'd want to, and the reason I recommend you do, is that not all drives are created equal. When you create a RAID array, by default, the smallest drive sets the size used from every other drive in the array. So, if you have a 200 gig drive and two 500s, the net result will be an array of three 200s. The real danger is that drives fail. Often. I can't even count the number of dead hard drives I've personally seen working for LCSD2. And if you made an array of three 500 gig drives and one dies, the replacement has to be at least as big or it won't work. If the replacement is even 1 byte short, you cannot add it to the array. Now you've just wasted money on a drive you can't use, and you run the risk of another drive failing and taking all your data before you can obtain another (in a RAID 5). So, I recommend you partition each drive so that the partition on each is the same size but at least a couple of megs less than the total drive capacity. Sure, you lose 20 megs, but you gain peace of mind :P.
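
For example, on a 500 gig drive you might make a single partition that stops a little short of the end. With parted it would look something like this (the numbers here are only an illustration; adjust them for your drives):

sudo parted /dev/sdb mklabel msdos
sudo parted /dev/sdb mkpart primary 1MB 499980MB

Repeat for each drive (sdc, sdd, sde in my case) and you're set.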

Ok, back to creating the array. Let's break that command down a little.

mdadm

This is the command. This is the one (and only one) command that you will be using to manage your array. It's the one that creates the array, assembles the array, starts the array, stops the array, grows the array, replaces/removes drives in the array, and finally, destroys the array (note: be careful with that last part).

--create /dev/md0

This specifies the device that will be your array. This is the one that you will format and mount and add files to. We call it md0 (multi-disk device number 0), mostly out of tradition. For example, I could've said:

--create /dev/jessesreallybigfreakingraidarraywithlotsofstoragespace

Yeah, I recommend not doing that and just going with md0, md1, etc.

--chunk=16

Ok, chunk size. Basically, in a RAID 5, data is striped across all the disks in the array. The chunk size, as I understand it, is the amount of data written to each disk before moving on to the next; it's specified in kilobytes and should be a power of 2. Theoretically (and yeah, empirically), this has an effect on the speed and performance of the array. I would advise against obsessing here and just using 16. I did a lot of research and found only contradictory information, e.g. "anything over 128 is wasteful!" or "anything under 128 is a waste!" I'll include links in the references section at the bottom if you really want to obsess. In my testing, 16 was basically as good as anything else. Also remember, you chose RAID 5 for reliability, not necessarily speed.
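
If you ever want to see what chunk size an existing array ended up with, mdadm will tell you; a quick way is to grep the detail output:

sudo mdadm -D /dev/md0 | grep -i chunk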

--level=5

This is where you specify what kind of RAID you want. Me, I wanted RAID 5. You can specify any RAID level you want here. I won't spend time describing each level; for that, Wikipedia is your friend. I will mention that chunk size has a different effect on RAID 0, so I would recommend reading the man pages.

--raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Pretty self-explanatory. Here you specify how many total devices are in this array and then list them. Notice how all of them specify a partition?

Now your array is created. I can't speak for the other levels but, if you chose RAID 5, the array is created with one drive "missing" and it then "rebuilds" itself. It does this for performance reasons (and yes, it really does). The array will be a little slower till it's done rebuilding, but it's fine. Go ahead and use it all you want (or be paranoid and wait, whatever you have time for). It took 3-4 hours for my array to completely rebuild. Next step, formatting :).

mkfs.ext3 /dev/md0

There, done. It's formatted. Next.

Ok, fine. There are a few little things you can do here to make things a little more efficient. Also, you don't have to use ext3 as the filesystem if you don't want to. With ext3, you can specify things like stride length and width, which try to line the filesystem up with the chunk size, etc. so that fewer read/write operations are needed. As with chunk size, I didn't bother. In my (limited) testing, I found it didn't make enough of a difference to warrant doing anything. If you want to try, I have links below.
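
If you do want to play with it, the relevant mkfs.ext3 knob is -E stride, where the stride is just the chunk size divided by the filesystem block size. With the 16k chunk from earlier and 4k blocks, that works out to something like:

mkfs.ext3 -b 4096 -E stride=4 /dev/md0

Newer versions of e2fsprogs also accept a stripe-width value (the stride times the number of data disks, so 12 for our 4-disk RAID 5), but as I said, I didn't find any of it worth the bother.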

Ok, now make a mount point somewhere, for example:

sudo mkdir /mnt/raid

And add the device to your fstab so that it's mounted automatically at boot.

/dev/md0    /mnt/raid     ext3    defaults,errors=remount-ro,noatime,noexec,nodev    0    2
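
To make sure the entry actually works without rebooting, you can mount it by its mount point right away:

sudo mount /mnt/raid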

One thing to note here. In Ubuntu, the mere installation of mdadm causes a new initramfs to be created, etc., and basically the system automatically scans for and assembles arrays on boot. If your distro doesn't do that, you need to figure out how to add the following to your startup scripts.

mdadm --assemble --scan
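
It can also help to record the array in mdadm's config file so the scan knows exactly what to look for. The usual trick is to append the output of a detail scan (on Ubuntu the file is /etc/mdadm/mdadm.conf; the path may differ on other distros):

sudo -s
mdadm --detail --scan >> /etc/mdadm/mdadm.conf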

Also, you'll probably have problems auto-mounting because your array may not be available yet when the fstab is processed. Let me know if you have problems and I'll try to help.

  • Tips

The whole point of RAID 5 is redundancy. The idea is that the loss of a single drive means nothing because you can still get to your data and you can replace the drive. However, if you are one drive short on boot, the array won't be automatically started (at least on Ubuntu; I'm not sure this is standard behavior). To force an array to start sans a drive, do the following:

sudo mdadm -R /dev/md0
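
If, on the other hand, the dead drive is still attached and just marked faulty, you can kick it out of the array's records before you swap the hardware. Assuming /dev/sde1 is the failed partition, something like this should do it:

sudo mdadm /dev/md0 --fail /dev/sde1 --remove /dev/sde1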

Now you can back up your data (highly recommended if possible), shut down, pop the bad drive out, put the shiny new drive in, and add it to the array.

sudo mdadm --add /dev/md0 /dev/sde1

Reconstruction begins automatically. You can follow the progress by checking /proc/mdstat periodically.

cat /proc/mdstat
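
Or, if you'd rather not keep re-running that, watch will refresh it for you every couple of seconds:

watch -n 2 cat /proc/mdstat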

This will take a long time, especially if you just let it go. By default, the kernel limits the speed of the reconstruction to leave room for normal operation. If you have a mostly idle array or are just impatient, you can increase the speed limit. The limit is stored in the "proc" virtual filesystem.

cat /proc/sys/dev/raid/speed_limit_min

By default, the minimum speed limit is 1000 KB/s. That's very slow, not even a megabyte a second. On a large array, it could take days at that speed. To increase the speed, just overwrite the file (as root; plain sudo won't work here because of the redirect).

sudo -s
echo 15000 > /proc/sys/dev/raid/speed_limit_min

This increases the minimum speed limit to around 15 MB/s. Much better. Keep in mind that just because you set it that high, it might not reach that speed. The array is reading and writing constantly at the same time to rebuild that drive and can only go as fast as the drives (and bus) can handle. That said, this is safe. There should be no danger in increasing the speed, and a simple reboot will return it to the default. Also, it should be mentioned that a clean reboot (probably not a power failure) is fine and reconstruction will resume when the system comes back up.
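
There is also a matching maximum limit right next to it. The default maximum is already quite high, so raising it is rarely necessary, but if the rebuild still seems capped you can bump it too (from that same root shell):

cat /proc/sys/dev/raid/speed_limit_max
echo 100000 > /proc/sys/dev/raid/speed_limit_max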

Now it's been a while and we've run out of space on our array. We don't want to start over with bigger drives, but we have space for one more. All we have to do is tell the array it has another drive and should "grow" to encompass it. No problem.

sudo mdadm --add /dev/md0 /dev/sdf1
sudo mdadm --grow /dev/md0 --raid-devices=5

This will take a while, and you may want to use the speed limit trick above to speed things up. Now the array is bigger, but there's a problem: you still don't see the extra space. The reason is that the filesystem is still the size of the old array. You need to resize the filesystem to accommodate the extra space. This is also easy.

sudo e2fsck -f /dev/md0
sudo resize2fs -p /dev/md0

Resize2fs complains if you don't do a filesystem check first, so that's the reason for the e2fsck. This will only take a minute or two, and then you will have the full space available to you. Technically it's possible to do this online (while the filesystem is mounted), but I recommend unmounting it first. These commands also only work on ext2/ext3 filesystems, so you'll need to find the appropriate tool for your filesystem of choice.
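
Once it's mounted again, a quick df will confirm the filesystem grew:

df -h /mnt/raid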

If you ever want to check the status of your array, you can just look at the mdstat file.

cat /proc/mdstat

This will show you the arrays it knows about and their status. For more information, you can use mdadm itself:

mdadm -D /dev/md0

You can also use mdadm to monitor the array.

mdadm --monitor

This command starts mdadm and forks off a daemon that keeps an eye on the arrays. If an array changes its state, say when a drive dies, mdadm will send off an email about it. The configuration file is /etc/mdadm/mdadm.conf. Ubuntu has this running by default once mdadm is installed and will send an email to root when something happens.
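
If you want that mail to go somewhere other than root, the relevant line in /etc/mdadm/mdadm.conf is MAILADDR (with your own address, obviously):

MAILADDR you@example.com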

I've had a really good experience with my array. I've tested it thoroughly and fully trust it. It's rock solid. I'd like to know about your experiences; leave a message or a link in the comments. Also, if you need any help, I'd be happy to try.

  • References

to be added.
