Sec, blogmal!
Wed, 30 Jun 2010

RAID5 recovery interlude

Last time we discussed the block order of linux-mdraid.

How do you find out which block order your RAID has?

The simplest way requires a working RAID to test against (Ray created a small 50 MB test RAID for this purpose). First, grab the first few blocks from each raw disk:

for disk in sda1 sdb1 sdc1 sdd1; do
  for nr in 0 1 2 3 4; do
    dd if=/dev/$disk skip=$nr count=1 bs=16k of=B.$disk.$nr
  done
done
dd if=/dev/md0 bs=16k count=20 of=RAID

Note that this assumes your stripe size is 16k. If you know it is different, change it; if not, you will find out later and have to retry with an adjusted value.

Now try to match up the first block with the RAID contents like this:

cat B.sda1.0 | cmp - RAID

If the block matches, you will get:

cmp: EOF on stdin

If the block is the wrong one:

stdin RAID differ: char 1, line 1

If none of your first blocks (the files ending in .0) match, either your block size is too big (try again with half the previous size) or your RAID prefixes the disks with some internal bookkeeping data (in that case, try starting with later blocks).

Now try to match the following blocks by adding them one by one to the cat command line, like this:

cat B.sda1.0 B.sdb1.0 | cmp - RAID

That way you can easily recover the block allocation order of your RAID.

For example our linux-mdraid starts like this:

cat B.sda1.0 B.sdb1.0 B.sdc1.0 B.sdd1.1 B.sda1.1 ...

After that, it's only a two-line patch to raidextract to fix this – hope you know C ;-)

That concludes our intermission for today. Tomorrow we will see why all this work wasn't even necessary.

– Sec

posted at: 21:06 | Category: /tidbits | permanent link to this entry | 0 comments (trackback)
RAID5 recovery (Part II)

We left our heroes yesterday with a broken RAID 5 due to read errors on multiple disks. (Read part I: here)

A great starting point is raidextract by Peter Benie, which attempts to re-assemble a RAID5. His web page on this tool also serves as a great overview of the inner workings of standard RAID 5.

The first problem we stumbled upon is that it assumes a certain pattern for the parity blocks. (All examples from here on assume 4 disks, since that is what we had; of course, everything applies to any number of disks.)

Adapting the example from his page:

D1:  P  3  6  9  P 15 18 21 ...
D2:  0  P  7 10 12  P 19 22 ...
D3:  1  4  P 11 13 16  P 23 ...
D4:  2  5  8  P 14 17 20  P ...

Our Linux-mdraid unfortunately didn't conform to this expectation. Not only does it start with parity on the last disk (which raidextract would support with --rotate), but it also moves the parity block 'backward' instead of 'forward'.

The correct allocation order looks like this:

D1:  0  4  8  P 12 16 20  P ...
D2:  1  5  P  9 13 17  P 21 ...
D3:  2  P  6 10 14  P 18 22 ...
D4:  P  3  7 11  P 15 19 23 ...

A quick&dirty hack to raidextract to implement this order:

--- raidextract.c	2008-07-26 11:33:53.000000000 +0200
+++ raidextract-new.c	2010-06-28 13:49:54.000000000 +0200
@@ -316,8 +316,10 @@
 	int paritydisk=(stripe / (disks-1) + rotate) % disks;
 	int len=stripesize-offset;
 	int bytes;
+	int ndisk;
 	char *ptr;
+	ndisk=(disk-paritydisk+3)%disks;
 	if (!noparity && paritydisk <= disk) disk++;
 	if (len>raidlen) len=raidlen;
 	if (winoffset+len > datasize) len=datasize-winoffset;
@@ -337,7 +339,7 @@
-	ptr=window[disk][windowalt]+winoffset;
+	ptr=window[ndisk][windowalt]+winoffset;
 	while (len)
 		bytes=write(STDOUT_FILENO, ptr, len);

A run on an error-free test-RAID confirmed this: it extracted the data correctly. Yay!

continued in part III, coming soon

– Sec

posted at: 02:26 | Category: /tidbits | permanent link to this entry | 0 comments (trackback)

Mon, 28 Jun 2010

RAID5 recovery (Part I)

The Munich CCC fileserver uses software RAID 5 across its disks, as do many other servers. We all (should) know that RAIDs are no substitute for backups, a lesson reinforced by a recent problem we had. While RAID level 5 can recover gracefully from a single failed disk, it generally can't cope with multiple disks failing at the same time.

One of the problems with large hard disks is that there may be as-yet-undetected errors on them, simply because you haven't attempted to read that part of the disk for quite some time. When you then start rebuilding a RAID5, these errors quickly pop up, because the rebuild process needs to read all the data. This is the main reason why you should regularly run complete surface scans on your RAID arrays.

Almost all RAID implementations tend to mark a whole disk as failed as soon as it produces a single error. This becomes a problem the moment you hit a second error on the degraded RAID you are attempting to rebuild.

Fortunately, there is still hope. If the errors on your failing disks occur at non-overlapping points of the array, you can recover a complete copy of your data by assembling just the right pieces. Unfortunately, no hardware or software RAID solution appears to be able to do that out of the box, so we're left to try it manually.

more on this saga in part II, coming soon…

– Sec

posted at: 19:23 | Category: /tidbits | permanent link to this entry | 0 comments (trackback)

Mon, 21 Jun 2010

Filmfest München 2010

Whoops, it's already at the door – this year's FilmFest.

This year it ended up being 24 films again, even though «Tod in Istanbul» was unfortunately already sold out. This time, 3 midnight movies are part of the lineup, which will probably be a bit exhausting this year %-)

If any of you are going to one of the same films, feel free to get in touch ;-)

full list...
posted at: 11:58 | Category: /misc | permanent link to this entry | 1 comment (trackback)
