A few questions regarding RAID5/RAID6 recovery

Hi all,

Since this is my first post here, let me first thank all the developers
for their great tool. It really is a wonderful piece of software. ;)

I have heard a lot of horror stories about the situation where a member
of a RAID5/6 array gets kicked out due to I/O errors, and then, after the
replacement and during the reconstruction, another drive fails and the
array becomes unusable. (For RAID6, add one more failing drive to the
story and the problem is the same, so let's just talk about RAID5 now.)
I want to prepare myself for this kind of unlucky event and build up a
strategy that I can follow once it happens. (I hope it never does, but...)

Let's assume we have a 4-drive RAID5 that has been degraded, the failed
drive has been replaced, then the rebuild process failed, and now we have
an array with 2 good disks, one failed disk and one which is only
partially synchronized (the new one). We also still have the disk that
failed originally, now outside the array. If I assume that both failed
disks have some bad sectors but are otherwise operational (they can be
read with dd, for example), then, except in the unlikely event that both
disks have failed at the very same sector (chunk?), the data is
theoretically still there and could be retrieved. So my question is: can
we retrieve it by using mdadm and some "tricks"? I am thinking of
something like this:

1. Assemble (or --create --assume-clean) the array in degraded mode using
the 2 good drives and whichever of the 2 failed drives has its bad sectors
further into the disk than the other failed drive.
2. Add the new drive, let the array start rebuilding, and wait for the
process to go beyond the point where the other failed drive has its bad
sectors.
3. Stop/pause/??? the rebuild process and, if possible, note the exact
sector (chunk) where the rebuild was paused.
4. Assemble (or --create --assume-clean) the array again, but this time
using the other failed drive.
5. Add the new drive again and continue the rebuild from the point where
the last rebuild was paused. Since we are past the point where this failed
disk has its bad sectors, the rebuild should finish fine.
6. Finally, remove the failed disk and replace it with another new drive.
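Before trying this plan, it is worth checking that a usable switch point P actually exists: the drive used in the first pass (sectors 0 to P) must have no bad sectors below P, and the drive used in the second pass (P to the end) must have none at or above P. The sketch below computes the smallest chunk-aligned P in plain shell. `find_switch_point` is a hypothetical helper, not part of mdadm, and it assumes the bad-sector numbers (collected for instance with badblocks or from the kernel log) have already been converted to 512-byte, member-relative sectors, i.e. with the partition start and md data offset subtracted, and that both disks use the same data offset.

```sh
#!/bin/sh
# find_switch_point CHUNK_SECTORS "BAD_ON_FIRST" "BAD_ON_SECOND"
# Hypothetical helper. Sector numbers are assumed member-relative
# (partition start and md data offset already subtracted).
#   FIRST  = failed drive used for the first pass,  sectors [0, P)
#   SECOND = failed drive used for the second pass, sectors [P, end)
# Prints the smallest chunk-aligned P that keeps every bad sector out of
# the pass that reads that drive, or fails if no such P exists.
find_switch_point() {
    local chunk=$1 min_first= max_second=-1 s p
    for s in $2; do                      # lowest bad sector on FIRST
        if [ -z "$min_first" ] || [ "$s" -lt "$min_first" ]; then
            min_first=$s
        fi
    done
    for s in $3; do                      # highest bad sector on SECOND
        if [ "$s" -gt "$max_second" ]; then
            max_second=$s
        fi
    done
    # Round (last bad sector on SECOND + 1) up to a chunk boundary.
    p=$(( (max_second + 1 + chunk - 1) / chunk * chunk ))
    if [ -n "$min_first" ] && [ "$p" -gt "$min_first" ]; then
        echo "no switch point: bad regions overlap" >&2
        return 1
    fi
    echo "$p"
}
```

For example, with 512 KiB chunks (1024 sectors), bad sectors at 500000 and 600000 on the first drive and at 1000 and 2047 on the second, `find_switch_point 1024 "500000 600000" "1000 2047"` prints 2048.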

Can this be done using mdadm somehow?
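For step 3, md's sysfs interface looks like the closest fit, assuming the kernel's md documentation for sync_max applies to recovery as well as to check/repair: writing a chunk-aligned sector number to /sys/block/DEV/md/sync_max should make the operation pause once it gets there, and /sys/block/DEV/md/sync_completed reports progress as "DONE / TOTAL" (or "none" when nothing is running). The sketch wraps both. SYSFS_ROOT is a hypothetical override that exists only so the functions can be exercised against a fake directory, and md0 is a placeholder. The kernel may refuse to lower sync_max while a sync is already running, so setting it before adding the new drive is probably safest. Resuming at the same point after re-creating the array (step 5) is a separate problem: the per-device recovery_start attribute (under /sys/block/DEV/md/dev-*/) is documented for skipping already-recovered blocks, but whether that survives a --create is not settled here.

```sh
#!/bin/sh
# Hypothetical helpers around md's sysfs files; md0 etc. are placeholders.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print how many sectors the running resync/recovery has completed.
# Fails when sync_completed says "none" (idle) or anything non-numeric.
sync_done() {
    local line
    line=$(cat "$SYSFS_ROOT/block/$1/md/sync_completed") || return 1
    case $line in
        *" / "*) echo "${line%% /*}" ;;   # keep DONE from "DONE / TOTAL"
        *) return 1 ;;
    esac
}

# Ask md to pause the sync once it reaches SECTOR (chunk-aligned).
# Writing "max" instead of a number lets it run to the end again.
pause_at() {
    echo "$2" > "$SYSFS_ROOT/block/$1/md/sync_max"
}
```

A possible sequence: `pause_at md0 2048`, then `mdadm /dev/md0 --add /dev/sdX1`, then poll `sync_done md0` until progress stops at (or near) the chosen sector.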

My next point is not really a question but rather a wish. In my view, the
situation described above is by far the biggest weakness not just of Linux
software RAID but of all the hardware RAID solutions I know of (I don't
know many, though). And this is still the case nowadays, with ever larger
disks. So I'm wondering whether there is any RAID or RAID-like solution
that, along with redundancy, provides some automatic stripe (chunk)
reallocation feature, similar to what modern hard disks do with their
"reallocated sectors". Something like this: the RAID driver reserves some
chunks/stripes for reallocation, and once an I/O error happens on any of
the active chunks, instead of kicking the disk out, it marks that
stripe/chunk as bad, moves the data to one of the reserved ones, and
carries on (with a warning, of course). Only if writing to the reserved
chunk fails would it be necessary to kick the member out immediately.
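The wished-for behaviour can be modelled as a small remap table. The toy sketch below is purely illustrative and is not something md provides: the reserved chunk numbers are made up, the table lives only in shell variables (a real driver would have to store it persistently on the member), and the case where a write to the reserved chunk itself fails is not modelled. It only shows the two operations involved: remapping a chunk on error, and translating a chunk number on every later access.

```sh
#!/bin/sh
# Toy model of reserved-chunk remapping. NOT an md feature; the reserved
# chunk numbers below are hypothetical.
REMAP=""                        # space-separated "logical:reserved" pairs
RESERVED_FREE="1000 1001 1002"  # reserved chunks still unused

# On an I/O error in chunk $1: move it to a reserved chunk instead of
# failing the member. Fails (i.e. "kick the member") once the reserve
# is exhausted.
remap_chunk() {
    local spare=${RESERVED_FREE%% *}
    if [ -z "$spare" ]; then
        echo "reserve exhausted: failing member" >&2
        return 1
    fi
    RESERVED_FREE=${RESERVED_FREE#"$spare"}   # pop the first free chunk
    RESERVED_FREE=${RESERVED_FREE# }
    REMAP="$REMAP $1:$spare"
    echo "warning: chunk $1 remapped to reserved chunk $spare" >&2
}

# Translate a logical chunk number into the chunk actually used on disk.
resolve_chunk() {
    local pair
    for pair in $REMAP; do
        if [ "${pair%%:*}" = "$1" ]; then
            echo "${pair#*:}"
            return 0
        fi
    done
    echo "$1"                   # not remapped: use it as-is
}
```

After `remap_chunk 17`, `resolve_chunk 17` prints 1000 while `resolve_chunk 18` still prints 18.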

The other thing I wonder about is why the RAID solutions I know of use the
"first remove the failed disk, then add the new one" strategy instead of
"add the new disk, try to recover, then remove the failed one". They use
the former even when a spare drive is available because, as far as I
know, they won't use the failed disk for the rebuild. Why? With the
latter strategy, recovering from situations like the one above would be
a joy.
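For reference, this "add first, remove later" strategy is what later md and mdadm releases call hot replace (kernel 3.3 and mdadm 3.3 onwards, to the best of my knowledge): the new disk is first added as a spare, then `mdadm --replace OLD --with NEW` rebuilds onto it while the old disk stays an active member until the copy completes. The sketch below only prints the commands when DRY_RUN is set; the `run` and `hot_replace` functions and all device names are hypothetical placeholders.

```sh
#!/bin/sh
# Hot-replace sketch: /dev/md0, /dev/sdX1 etc. are placeholders.
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

hot_replace() {
    local md=$1 old=$2 new=$3
    run mdadm "$md" --add "$new" || return 1          # new disk joins as a spare
    run mdadm "$md" --replace "$old" --with "$new"    # rebuild onto it; $old stays active
    # When the copy finishes, md marks $old faulty; it can then be removed
    # with: mdadm "$md" --remove "$old"
}
```

`DRY_RUN=1; hot_replace /dev/md0 /dev/sdb1 /dev/sde1` prints the `--add` and `--replace ... --with` commands without touching any array.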

Thanks for your response.

Best regards,
Peter



--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Peter [ Mon, 25 April 2011 19:47 ] [ ID #2058715 ]

Re: A few questions regarding RAID5/RAID6 recovery

2011/4/25 Kővári Péter <peter [at] kovari.priv.hu>:
> Hi all,
>
> Since this is my first post here, let me first thank all the developers
> for their great tool. It really is a wonderful piece of software. ;)
>
> I have heard a lot of horror stories about the situation where a member
> of a RAID5/6 array gets kicked out due to I/O errors, and then, after the
> replacement and during the reconstruction, another drive fails and the
> array becomes unusable. (For RAID6, add one more failing drive to the
> story and the problem is the same, so let's just talk about RAID5 now.)
> I want to prepare myself for this kind of unlucky event and build up a
> strategy that I can follow once it happens. (I hope it never does, but...)

From what I understand, if you run weekly RAID scrubs you will reduce
the chance of this happening. CentOS / Red Hat already have this
scheduled. If not, you can add a cron job that triggers a check or
repair. Make sure you replace DEV with the device name.

echo check > /sys/block/DEV/md/sync_action
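If the cron job should cover every array rather than a single DEV, a small loop over /sys/block/md*/md/sync_action can do it. This is only a sketch: it starts a check on arrays whose sync_action currently reads idle and leaves alone anything already resyncing or recovering. SYSFS_ROOT is a hypothetical override used only for testing; normally it is just /sys.

```sh
#!/bin/sh
# Start a "check" scrub on every md array whose sync_action is idle.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

scrub_all() {
    local action_file name
    for action_file in "$SYSFS_ROOT"/block/md*/md/sync_action; do
        [ -w "$action_file" ] || continue   # no arrays found, or not root
        if [ "$(cat "$action_file")" = idle ]; then
            echo check > "$action_file"
            name=${action_file%/md/sync_action}
            echo "started check on ${name##*/}"
        fi
    done
}
```

A cron script calling `scrub_all` could be scheduled weekly from /etc/cron.d with a line such as `0 3 * * 0 root /usr/local/sbin/md-scrub-all` (the script path is hypothetical). Once a check finishes, /sys/block/DEV/md/mismatch_cnt shows how many sectors were found inconsistent.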

I have had 3 x 1TB drives in RAID 5 for the past 2.5 years. I have not
had a drive kicked out or an error found. If an error is found, since
it is caught early, I should have a good chance of replacing the
failed drive without running into another error.

Ryan
Ryan Wagoner [ Mon, 25 April 2011 21:51 ] [ ID #2058716 ]

Re: A few questions regarding RAID5/RAID6 recovery

On 25/04/2011 19:47, Kővári Péter wrote:
> [...]

You are not alone in these concerns. A couple of months ago there was a
long thread here about a roadmap for md raid. The first two entries are
a "bad block log" to allow reading of good blocks from a failing disk,
and "hot replace" to sync a replacement disk before removing the failing
one. Being on a roadmap doesn't mean that these features will make it
to md raid in the near future, but it does mean that there are already
rough plans to solve these problems.

<http://neil.brown.name/blog/20110216044002>



David Brown [ Tue, 26 April 2011 09:21 ] [ ID #2058758 ]

RE: A few questions regarding RAID5/RAID6 recovery

-----Original Message-----
From: linux-raid-owner [at] vger.kernel.org [mailto:linux-raid-owner [at] vger.kernel.org] On Behalf Of David Brown
Sent: Tuesday, April 26, 2011 9:22 AM
To: linux-raid [at] vger.kernel.org
Subject: Re: A few questions regarding RAID5/RAID6 recovery

> You are not alone in these concerns. A couple of months ago there was a
> long thread here about a roadmap for md raid. The first two entries are
> a "bad block log" to allow reading of good blocks from a failing disk,
> and "hot replace" to sync a replacement disk before removing the failing
> one. Being on a roadmap doesn't mean that these features will make it
> to md raid in the near future - but it does mean that there are already
> rough plans to solve these problems.

> <http://neil.brown.name/blog/20110216044002>

Thank you David, this explains a lot. I hope we'll see this implemented some day.

Can you comment on my first question too, please? Basically, I'm just curious to know whether there is a way to stop and restart the rebuild process (and change/re-create the array in between).

By the way, I read somewhere that "--stop" followed by "--create --assume-clean" on an array only works with v0.90 superblocks, because v1.x overwrites existing data during create. That doesn't make sense to me, but I'm not sure, so is it true? If not, is it enough to use the same superblock version for "--create" to make this work without data loss?
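Whatever the answer on 0.90 versus 1.x, a --create --assume-clean only lines up with the old data if the metadata version, RAID level, chunk size, layout, device order and (for 1.x) data offset all match the original array, and the default data offset has changed between mdadm releases. Saving `mdadm --examine` output for every member before experimenting makes those values easy to compare. The helpers below are hypothetical and assume the usual "Key : Value" layout of that output; 0.90 superblocks report somewhat different fields (there is no Data Offset line, for example).

```sh
#!/bin/sh
# examine_field FILE FIELD: print the value of one "Key : Value" line from
# saved `mdadm --examine` output, e.g. examine_field sda1.txt "Data Offset".
examine_field() {
    sed -n "s/^ *$2 : //p" "$1" | head -n 1
}

# Print the settings that must match before re-creating an array.
summarize_member() {
    local field
    for field in "Version" "Raid Level" "Chunk Size" "Layout" \
                 "Data Offset" "Device Role"; do
        printf '%s: %s\n' "$field" "$(examine_field "$1" "$field")"
    done
}
```

For example, run `mdadm --examine /dev/sda1 > sda1.txt` for each member, then `summarize_member sda1.txt` and compare the results side by side.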

Thanks,
Peter


Peter Kovari [ Tue, 26 April 2011 18:13 ] [ ID #2058759 ]