raid and sleeping bad sectors

raid and sleeping bad sectors

am 29.06.2004 12:48:15 von Dieter Stueken

Question:

Under which conditions a disk of a raid-5 system gets off line?
Does it happen on ANY error, even if some read error happened?
Will double-fault read errors on different disks destroy my
data?

long story:

I manage about 1TB of data on IDE disk and learned
a lot about different kinds of disk failures.
=46ortunately I suffered no data loss so far, as I completely
mirror all data each night (kind of manual raid-1 :-)
I think about using raid-5 now.

My observation was: a sudden total loss of a whole disk
was very unlikely. If you monitor the disk carefully using
its internal SMART capabilities, you are able to copy the
data and replace the disk long time before it finally dies.

see: http://smartmontools.sourceforge.net/

What happens frequently are spontaneous bad sectors, which
can not be read any more (i.e. CRC errors). Most people
think bad sectors are handled automatically by the firmware
of your HD. Unfortunately this is not the whole truth.
Instead of, a bad sector is indicated as bad, until it gets
explicitly rewritten by some new data. At this point, the
HD-firmware may decide to store the new data using a spare
sector instead. The bad news are: sectors turn to become
bad/unreadable quite spontaneously, even if they could be
read successfully short time before!

You may ask, why this is a problem for a raid-5 system?
It is especially designed to handle such problems!
What makes me worry is, that those errors occur spontaneously
and without any notice possibly on several disks simultaneously.
You may detect such a problems only by a complete scan of
all sectors of your disk. The critical question is: what
happens, if the first bad sector on some disk get read.
Does this event kick off that disk from the system?
You may think its a good idea, to kick off the disk as
soon as possible. I think, this may be bad, as it dramatically
decreases the reliability of your remaining system, especially
if you have some other sleeping bad sector on any other disk, too.
At least when you try to rebuild your system, you run into
trouble.

There are several possible solutions. (May be raid systems already
works this way, but I have no experience so far, and I could not
find too much about this in the FAQ or mailing-list)

1) I think a disk should be kept online as long as possible.
This means, that a simple read error should not deactivate the disk
as long the disk can be successfully written to and thus is still in
sync. As long, as "simple" read errors (even on different disks) occur,
my data is still reliable, as it is very unlikely, that two disk fail
with the SAME logical sector number. But it IS likely, that two disk
carry some sleeping bad sectors simultaneously.

2) If I decide to replace a disk, it should be possible to add a new
disk to the system before degrading it. After I successfully build the
new disk, I may switch off the bad one. This way I'm save against multi
disk read errors all time.

example: array of the disks (A B C), want to replace B:

123456789 <- sector number
A aaaaaaaXa <- data on disk a, X =3D unreadable
B bbXbbbbbb <- disk b, will be replaced
C ccccXcccc

B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)

3) If a disks happened to produce a bad sector, you may try to rewrite =
it
again, if you still have the data. Using Raid 2 or 5 this is possible, =
as
long as you don't have a double fault on exactly the same sector on any
other disks. For a raid-1/5 system this means it might cure itself!
I did such surgery manually already, and it works quite good.

Conclusion:

After a disk shows up with bad sectors, you should indeed think of repl=
acing
it as soon as possible, but it should not affect data integrity that mu=
ch.
Instead it should be kept alive as long as possible until any necessary=
recovery
took place.

Dieter.

--=20
Dieter Stüken, con terra GmbH, Münster
stueken@conterra.de
http://www.conterra.de/
(0)251-7474-501
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 29.06.2004 17:59:06 von bugzilla

You are correct. 1 bad sector on a read, the disk is kicked out!
I agree with you, it (md) should not do that! Your #3 is something I h=
ave
mentioned here a few times. I don't recall getting any comments!
I get 1 read error about every 3 months or so. I have 14 disks in a RA=
ID5
array. Every time I have been able to re-use the same disk. But it is=
a
pain in the @$$! And I worry about a second bad sector!!!

I do a read test of all of my disks each night! Hoping to catch an err=
or
before md does. Since a bad sector could go un-noticed for months! As=
far
as I know, md does not test the disks any! As far as I know, md does n=
ot
verify parity. As far as I know, md does not verify RAID1 data matches=
on
all disks.

You are also correct. 2 bad sectors on 2 different disks and "That's i=
t
man! Game over man! Game over!". You may want to consider RAID6. It w=
ill
allow 2 bad sectors, but not 3!! I have considered this myself. I hav=
e 14
disks with a spare. I should just go with a 15 disk RAID6.

I disagree with your conclusion: It is normal for a disk to grow bad
sectors. 1 or 2 bad sectors is not an issue. Maybe 10 or 100 is an is=
sue.
I don't know what the limit should be. I have maybe 5-10 bad sectors o=
n my
14 disks. In about 2 years I have not had a hard failure. I just corr=
ect
the bad sector by hand and re-use the disk. Maybe I should track which
disks have had bad sectors to determine if there is a pattern, but I do=
n't.
I think md should to this. I have said so here in the past.

Hardware RAID systems support bad sectors. Not sure they all do, but s=
ome
or most do. EMC counts them and when some limit is reached the disk is
copied to a spare. The "bad" disk is kept on-line. After all, it is s=
till
working. I automatic service call is placed to have the "bad" disk
replaced. I have been told HP's XP-256 does not have a bad sector limi=
t.
They just wait until a disk fails! Because of this EMC replaces disks =
more
often. Some see this a EMC has more failures. I don't see it this way=

They protect my data better. Getting off topic I think!....

Guy

-----Original Message-----
=46rom: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Dieter Stueken
Sent: Tuesday, June 29, 2004 6:48 AM
To: linux-raid@vger.kernel.org
Subject: raid and sleeping bad sectors

Question:

Under which conditions a disk of a raid-5 system gets off line?
Does it happen on ANY error, even if some read error happened?
Will double-fault read errors on different disks destroy my
data?

long story:

I manage about 1TB of data on IDE disk and learned
a lot about different kinds of disk failures.
=46ortunately I suffered no data loss so far, as I completely
mirror all data each night (kind of manual raid-1 :-)
I think about using raid-5 now.

My observation was: a sudden total loss of a whole disk
was very unlikely. If you monitor the disk carefully using
its internal SMART capabilities, you are able to copy the
data and replace the disk long time before it finally dies.

see: http://smartmontools.sourceforge.net/

What happens frequently are spontaneous bad sectors, which
can not be read any more (i.e. CRC errors). Most people
think bad sectors are handled automatically by the firmware
of your HD. Unfortunately this is not the whole truth.
Instead of, a bad sector is indicated as bad, until it gets
explicitly rewritten by some new data. At this point, the
HD-firmware may decide to store the new data using a spare
sector instead. The bad news are: sectors turn to become
bad/unreadable quite spontaneously, even if they could be
read successfully short time before!

You may ask, why this is a problem for a raid-5 system?
It is especially designed to handle such problems!
What makes me worry is, that those errors occur spontaneously
and without any notice possibly on several disks simultaneously.
You may detect such a problems only by a complete scan of
all sectors of your disk. The critical question is: what
happens, if the first bad sector on some disk get read.
Does this event kick off that disk from the system?
You may think its a good idea, to kick off the disk as
soon as possible. I think, this may be bad, as it dramatically
decreases the reliability of your remaining system, especially
if you have some other sleeping bad sector on any other disk, too.
At least when you try to rebuild your system, you run into
trouble.

There are several possible solutions. (May be raid systems already
works this way, but I have no experience so far, and I could not
find too much about this in the FAQ or mailing-list)

1) I think a disk should be kept online as long as possible.
This means, that a simple read error should not deactivate the disk
as long the disk can be successfully written to and thus is still in
sync. As long, as "simple" read errors (even on different disks) occur,
my data is still reliable, as it is very unlikely, that two disk fail
with the SAME logical sector number. But it IS likely, that two disk
carry some sleeping bad sectors simultaneously.

2) If I decide to replace a disk, it should be possible to add a new
disk to the system before degrading it. After I successfully build the
new disk, I may switch off the bad one. This way I'm save against multi
disk read errors all time.

example: array of the disks (A B C), want to replace B:

123456789 <- sector number
A aaaaaaaXa <- data on disk a, X =3D unreadable
B bbXbbbbbb <- disk b, will be replaced
C ccccXcccc

B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)

3) If a disks happened to produce a bad sector, you may try to rewrite =
it
again, if you still have the data. Using Raid 2 or 5 this is possible, =
as
long as you don't have a double fault on exactly the same sector on any
other disks. For a raid-1/5 system this means it might cure itself!
I did such surgery manually already, and it works quite good.

Conclusion:

After a disk shows up with bad sectors, you should indeed think of repl=
acing
it as soon as possible, but it should not affect data integrity that mu=
ch.
Instead it should be kept alive as long as possible until any necessary
recovery
took place.

Dieter.

--=20
Dieter Stüken, con terra GmbH, Münster
stueken@conterra.de
http://www.conterra.de/
(0)251-7474-501
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 29.06.2004 18:30:08 von John Lange

Ok, my two cents; Why would a disk ever be automatically removed from an
array?

If a disk has errors the array should write a log message and then write
around the bad sectors. If you start to see a lot of error messages in
your logs then the sysadmin should make the judgment to replace the
drive.

Even if the drive completely fails it should not be automatically
removed from the array. md should continue to attempt to use it and keep
logging errors (perhaps a useful statistic for md to track is a running
percentage of disk accesses that resulted in an error?)

If a sysadmin (or more likely a monitoring script) sees an excessive
number of errors then it can take appropriate action. For example, if
there is a spare, then remove the bad drive and active the spare. If
there isn't a spare then keep the drive in the array but send out alerts
for a sysadmin to intervene.

I don't think automatically removing a drive does anything to protect
data, in fact it jeopardizes it since if two disks have errors its game
over. In essence it creates a very "fragile" array.

I certainly am not an expert but that is my view as a sysadmin.

Regards,

John Lange

On Tue, 2004-06-29 at 10:59, Guy wrote:
> You are correct. 1 bad sector on a read, the disk is kicked out!
> I agree with you, it (md) should not do that! Your #3 is something I have
> mentioned here a few times. I don't recall getting any comments!
> I get 1 read error about every 3 months or so. I have 14 disks in a RAID5
> array. Every time I have been able to re-use the same disk. But it is a
> pain in the @$$! And I worry about a second bad sector!!!
>
> I do a read test of all of my disks each night! Hoping to catch an error
> before md does. Since a bad sector could go un-noticed for months! As far
> as I know, md does not test the disks any! As far as I know, md does not
> verify parity. As far as I know, md does not verify RAID1 data matches on
> all disks.
>
> You are also correct. 2 bad sectors on 2 different disks and "That's it
> man! Game over man! Game over!". You may want to consider RAID6. It will
> allow 2 bad sectors, but not 3!! I have considered this myself. I have 14
> disks with a spare. I should just go with a 15 disk RAID6.
>
> I disagree with your conclusion: It is normal for a disk to grow bad
> sectors. 1 or 2 bad sectors is not an issue. Maybe 10 or 100 is an issue.
> I don't know what the limit should be. I have maybe 5-10 bad sectors on my
> 14 disks. In about 2 years I have not had a hard failure. I just correct
> the bad sector by hand and re-use the disk. Maybe I should track which
> disks have had bad sectors to determine if there is a pattern, but I don't.
> I think md should to this. I have said so here in the past.
>
> Hardware RAID systems support bad sectors. Not sure they all do, but some
> or most do. EMC counts them and when some limit is reached the disk is
> copied to a spare. The "bad" disk is kept on-line. After all, it is still
> working. I automatic service call is placed to have the "bad" disk
> replaced. I have been told HP's XP-256 does not have a bad sector limit.
> They just wait until a disk fails! Because of this EMC replaces disks more
> often. Some see this a EMC has more failures. I don't see it this way.
> They protect my data better. Getting off topic I think!....
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Dieter Stueken
> Sent: Tuesday, June 29, 2004 6:48 AM
> To: linux-raid@vger.kernel.org
> Subject: raid and sleeping bad sectors
>
> Question:
>
> Under which conditions a disk of a raid-5 system gets off line?
> Does it happen on ANY error, even if some read error happened?
> Will double-fault read errors on different disks destroy my
> data?
>
> long story:
>
> I manage about 1TB of data on IDE disk and learned
> a lot about different kinds of disk failures.
> Fortunately I suffered no data loss so far, as I completely
> mirror all data each night (kind of manual raid-1 :-)
> I think about using raid-5 now.
>
> My observation was: a sudden total loss of a whole disk
> was very unlikely. If you monitor the disk carefully using
> its internal SMART capabilities, you are able to copy the
> data and replace the disk long time before it finally dies.
>
> see: http://smartmontools.sourceforge.net/
>
> What happens frequently are spontaneous bad sectors, which
> can not be read any more (i.e. CRC errors). Most people
> think bad sectors are handled automatically by the firmware
> of your HD. Unfortunately this is not the whole truth.
> Instead of, a bad sector is indicated as bad, until it gets
> explicitly rewritten by some new data. At this point, the
> HD-firmware may decide to store the new data using a spare
> sector instead. The bad news are: sectors turn to become
> bad/unreadable quite spontaneously, even if they could be
> read successfully short time before!
>
> You may ask, why this is a problem for a raid-5 system?
> It is especially designed to handle such problems!
> What makes me worry is, that those errors occur spontaneously
> and without any notice possibly on several disks simultaneously.
> You may detect such a problems only by a complete scan of
> all sectors of your disk. The critical question is: what
> happens, if the first bad sector on some disk get read.
> Does this event kick off that disk from the system?
> You may think its a good idea, to kick off the disk as
> soon as possible. I think, this may be bad, as it dramatically
> decreases the reliability of your remaining system, especially
> if you have some other sleeping bad sector on any other disk, too.
> At least when you try to rebuild your system, you run into
> trouble.
>
> There are several possible solutions. (May be raid systems already
> works this way, but I have no experience so far, and I could not
> find too much about this in the FAQ or mailing-list)
>
> 1) I think a disk should be kept online as long as possible.
> This means, that a simple read error should not deactivate the disk
> as long the disk can be successfully written to and thus is still in
> sync. As long, as "simple" read errors (even on different disks) occur,
> my data is still reliable, as it is very unlikely, that two disk fail
> with the SAME logical sector number. But it IS likely, that two disk
> carry some sleeping bad sectors simultaneously.
>
> 2) If I decide to replace a disk, it should be possible to add a new
> disk to the system before degrading it. After I successfully build the
> new disk, I may switch off the bad one. This way I'm save against multi
> disk read errors all time.
>
> example: array of the disks (A B C), want to replace B:
>
> 123456789 <- sector number
> A aaaaaaaXa <- data on disk a, X = unreadable
> B bbXbbbbbb <- disk b, will be replaced
> C ccccXcccc
>
> B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)
>
> 3) If a disks happened to produce a bad sector, you may try to rewrite it
> again, if you still have the data. Using Raid 2 or 5 this is possible, as
> long as you don't have a double fault on exactly the same sector on any
> other disks. For a raid-1/5 system this means it might cure itself!
> I did such surgery manually already, and it works quite good.
>
> Conclusion:
>
> After a disk shows up with bad sectors, you should indeed think of replacing
> it as soon as possible, but it should not affect data integrity that much.
> Instead it should be kept alive as long as possible until any necessary
> recovery
> took place.
>
> Dieter.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 29.06.2004 19:37:54 von dean gaudet

On Tue, 29 Jun 2004, Dieter Stueken wrote:

> Question:
>
> Under which conditions a disk of a raid-5 system gets off line?
> Does it happen on ANY error, even if some read error happened?

yup -- md supports only "all good" or "all bad" as status for each disk --
no "partially good" status.

> Will double-fault read errors on different disks destroy my
> data?

a double fault will take your array offline but it's typically not a
catastrophic dataloss event. you will have to do manual recovery --
you'll need to do the XOR to recreate at least one of the read errors, and
use mdadm to re-assemble the array.


> 1) I think a disk should be kept online as long as possible.
> This means, that a simple read error should not deactivate the disk
....
> 2) If I decide to replace a disk, it should be possible to add a new
> disk to the system before degrading it.
....
> 3) If a disks happened to produce a bad sector, you may try to rewrite it

i've been maintaining a wishlist, and those 3 are already on it:


-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 29.06.2004 20:03:56 von bugzilla

I did not intend to indicate that I hate software raid and like hardwar=
e
raid. I like and hate both. :) I use both! I have been involved wit=
h a
double failure on an HP FC-60. A disk failed hard, during recovery ano=
ther
disk had a bad sector. They (HP) were not able to save any of the data=
!
The FC-60s were lemons anyway!

Guy

-----Original Message-----
=46rom: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Guy
Sent: Tuesday, June 29, 2004 11:59 AM
To: 'Dieter Stueken'; linux-raid@vger.kernel.org
Subject: RE: raid and sleeping bad sectors

You are correct. 1 bad sector on a read, the disk is kicked out!
I agree with you, it (md) should not do that! Your #3 is something I h=
ave
mentioned here a few times. I don't recall getting any comments!
I get 1 read error about every 3 months or so. I have 14 disks in a RA=
ID5
array. Every time I have been able to re-use the same disk. But it is=
a
pain in the @$$! And I worry about a second bad sector!!!

I do a read test of all of my disks each night! Hoping to catch an err=
or
before md does. Since a bad sector could go un-noticed for months! As=
far
as I know, md does not test the disks any! As far as I know, md does n=
ot
verify parity. As far as I know, md does not verify RAID1 data matches=
on
all disks.

You are also correct. 2 bad sectors on 2 different disks and "That's i=
t
man! Game over man! Game over!". You may want to consider RAID6. It w=
ill
allow 2 bad sectors, but not 3!! I have considered this myself. I hav=
e 14
disks with a spare. I should just go with a 15 disk RAID6.

I disagree with your conclusion: It is normal for a disk to grow bad
sectors. 1 or 2 bad sectors is not an issue. Maybe 10 or 100 is an is=
sue.
I don't know what the limit should be. I have maybe 5-10 bad sectors o=
n my
14 disks. In about 2 years I have not had a hard failure. I just corr=
ect
the bad sector by hand and re-use the disk. Maybe I should track which
disks have had bad sectors to determine if there is a pattern, but I do=
n't.
I think md should to this. I have said so here in the past.

Hardware RAID systems support bad sectors. Not sure they all do, but s=
ome
or most do. EMC counts them and when some limit is reached the disk is
copied to a spare. The "bad" disk is kept on-line. After all, it is s=
till
working. I automatic service call is placed to have the "bad" disk
replaced. I have been told HP's XP-256 does not have a bad sector limi=
t.
They just wait until a disk fails! Because of this EMC replaces disks =
more
often. Some see this a EMC has more failures. I don't see it this way=

They protect my data better. Getting off topic I think!....

Guy

-----Original Message-----
=46rom: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Dieter Stueken
Sent: Tuesday, June 29, 2004 6:48 AM
To: linux-raid@vger.kernel.org
Subject: raid and sleeping bad sectors

Question:

Under which conditions a disk of a raid-5 system gets off line?
Does it happen on ANY error, even if some read error happened?
Will double-fault read errors on different disks destroy my
data?

long story:

I manage about 1TB of data on IDE disk and learned
a lot about different kinds of disk failures.
=46ortunately I suffered no data loss so far, as I completely
mirror all data each night (kind of manual raid-1 :-)
I think about using raid-5 now.

My observation was: a sudden total loss of a whole disk
was very unlikely. If you monitor the disk carefully using
its internal SMART capabilities, you are able to copy the
data and replace the disk long time before it finally dies.

see: http://smartmontools.sourceforge.net/

What happens frequently are spontaneous bad sectors, which
can not be read any more (i.e. CRC errors). Most people
think bad sectors are handled automatically by the firmware
of your HD. Unfortunately this is not the whole truth.
Instead of, a bad sector is indicated as bad, until it gets
explicitly rewritten by some new data. At this point, the
HD-firmware may decide to store the new data using a spare
sector instead. The bad news are: sectors turn to become
bad/unreadable quite spontaneously, even if they could be
read successfully short time before!

You may ask, why this is a problem for a raid-5 system?
It is especially designed to handle such problems!
What makes me worry is, that those errors occur spontaneously
and without any notice possibly on several disks simultaneously.
You may detect such a problems only by a complete scan of
all sectors of your disk. The critical question is: what
happens, if the first bad sector on some disk get read.
Does this event kick off that disk from the system?
You may think its a good idea, to kick off the disk as
soon as possible. I think, this may be bad, as it dramatically
decreases the reliability of your remaining system, especially
if you have some other sleeping bad sector on any other disk, too.
At least when you try to rebuild your system, you run into
trouble.

There are several possible solutions. (May be raid systems already
works this way, but I have no experience so far, and I could not
find too much about this in the FAQ or mailing-list)

1) I think a disk should be kept online as long as possible.
This means, that a simple read error should not deactivate the disk
as long the disk can be successfully written to and thus is still in
sync. As long, as "simple" read errors (even on different disks) occur,
my data is still reliable, as it is very unlikely, that two disk fail
with the SAME logical sector number. But it IS likely, that two disk
carry some sleeping bad sectors simultaneously.

2) If I decide to replace a disk, it should be possible to add a new
disk to the system before degrading it. After I successfully build the
new disk, I may switch off the bad one. This way I'm save against multi
disk read errors all time.

example: array of the disks (A B C), want to replace B:

123456789 <- sector number
A aaaaaaaXa <- data on disk a, X =3D unreadable
B bbXbbbbbb <- disk b, will be replaced
C ccccXcccc

B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)

3) If a disks happened to produce a bad sector, you may try to rewrite =
it
again, if you still have the data. Using Raid 2 or 5 this is possible, =
as
long as you don't have a double fault on exactly the same sector on any
other disks. For a raid-1/5 system this means it might cure itself!
I did such surgery manually already, and it works quite good.

Conclusion:

After a disk shows up with bad sectors, you should indeed think of repl=
acing
it as soon as possible, but it should not affect data integrity that mu=
ch.
Instead it should be kept alive as long as possible until any necessary
recovery
took place.

Dieter.

--=20
Dieter Stüken, con terra GmbH, Münster
stueken@conterra.de
http://www.conterra.de/
(0)251-7474-501
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 29.06.2004 20:43:06 von Mike Tran

(Please note that I don't mean to advertise EVMS here :) just want to
mention that the functionality is available)

EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
stack.

BBR enhances the reliability of a disk by remapping bad sectors. When
BBR is added to a disk, it reserves replacement sectors. BBR detects
failed WRITEs and remaps the I/O to the reserved sectors. BBR does not
remap failed READs. However, if anyone feels that it's necessary,
he/she can modify BBR code. My guess is about 10 lines of code. EVMS is
open source.

With EVMS, you can create MD arrays on top of BBR block devices. This
way RAID1/RAID5 will not see any I/O error on the "bad disk" as long as
BBR reserved sectors are available.

Obviously, the disadvantages of BBR are:

- Slow down I/O
- Disk space allocation for replacement sectors and mapping table


Regards,
Mike T.

On Tue, 2004-06-29 at 11:30, John Lange wrote:
> Ok, my two cents; Why would a disk ever be automatically removed from an
> array?
>
> If a disk has errors the array should write a log message and then write
> around the bad sectors. If you start to see a lot of error messages in
> your logs then the sysadmin should make the judgment to replace the
> drive.
>
> Even if the drive completely fails it should not be automatically
> removed from the array. md should continue to attempt to use it and keep
> logging errors (perhaps a useful statistic for md to track is a running
> percentage of disk accesses that resulted in an error?)
>
> If a sysadmin (or more likely a monitoring script) sees an excessive
> number of errors then it can take appropriate action. For example, if
> there is a spare, then remove the bad drive and active the spare. If
> there isn't a spare then keep the drive in the array but send out alerts
> for a sysadmin to intervene.
>
> I don't think automatically removing a drive does anything to protect
> data, in fact it jeopardizes it since if two disks have errors its game
> over. In essence it creates a very "fragile" array.
>
> I certainly am not an expert but that is my view as a sysadmin.
>
> Regards,
>
> John Lange
>
> On Tue, 2004-06-29 at 10:59, Guy wrote:
> > You are correct. 1 bad sector on a read, the disk is kicked out!
> > I agree with you, it (md) should not do that! Your #3 is something I have
> > mentioned here a few times. I don't recall getting any comments!
> > I get 1 read error about every 3 months or so. I have 14 disks in a RAID5
> > array. Every time I have been able to re-use the same disk. But it is a
> > pain in the @$$! And I worry about a second bad sector!!!
> >
> > I do a read test of all of my disks each night! Hoping to catch an error
> > before md does. Since a bad sector could go un-noticed for months! As far
> > as I know, md does not test the disks any! As far as I know, md does not
> > verify parity. As far as I know, md does not verify RAID1 data matches on
> > all disks.
> >
> > You are also correct. 2 bad sectors on 2 different disks and "That's it
> > man! Game over man! Game over!". You may want to consider RAID6. It will
> > allow 2 bad sectors, but not 3!! I have considered this myself. I have 14
> > disks with a spare. I should just go with a 15 disk RAID6.
> >
> > I disagree with your conclusion: It is normal for a disk to grow bad
> > sectors. 1 or 2 bad sectors is not an issue. Maybe 10 or 100 is an issue.
> > I don't know what the limit should be. I have maybe 5-10 bad sectors on my
> > 14 disks. In about 2 years I have not had a hard failure. I just correct
> > the bad sector by hand and re-use the disk. Maybe I should track which
> > disks have had bad sectors to determine if there is a pattern, but I don't.
> > I think md should to this. I have said so here in the past.
> >
> > Hardware RAID systems support bad sectors. Not sure they all do, but some
> > or most do. EMC counts them and when some limit is reached the disk is
> > copied to a spare. The "bad" disk is kept on-line. After all, it is still
> > working. I automatic service call is placed to have the "bad" disk
> > replaced. I have been told HP's XP-256 does not have a bad sector limit.
> > They just wait until a disk fails! Because of this EMC replaces disks more
> > often. Some see this a EMC has more failures. I don't see it this way.
> > They protect my data better. Getting off topic I think!....
> >
> > Guy
> >
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org
> > [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Dieter Stueken
> > Sent: Tuesday, June 29, 2004 6:48 AM
> > To: linux-raid@vger.kernel.org
> > Subject: raid and sleeping bad sectors
> >
> > Question:
> >
> > Under which conditions a disk of a raid-5 system gets off line?
> > Does it happen on ANY error, even if some read error happened?
> > Will double-fault read errors on different disks destroy my
> > data?
> >
> > long story:
> >
> > I manage about 1TB of data on IDE disk and learned
> > a lot about different kinds of disk failures.
> > Fortunately I suffered no data loss so far, as I completely
> > mirror all data each night (kind of manual raid-1 :-)
> > I think about using raid-5 now.
> >
> > My observation was: a sudden total loss of a whole disk
> > was very unlikely. If you monitor the disk carefully using
> > its internal SMART capabilities, you are able to copy the
> > data and replace the disk long time before it finally dies.
> >
> > see: http://smartmontools.sourceforge.net/
> >
> > What happens frequently are spontaneous bad sectors, which
> > can not be read any more (i.e. CRC errors). Most people
> > think bad sectors are handled automatically by the firmware
> > of your HD. Unfortunately this is not the whole truth.
> > Instead of, a bad sector is indicated as bad, until it gets
> > explicitly rewritten by some new data. At this point, the
> > HD-firmware may decide to store the new data using a spare
> > sector instead. The bad news are: sectors turn to become
> > bad/unreadable quite spontaneously, even if they could be
> > read successfully short time before!
> >
> > You may ask, why this is a problem for a raid-5 system?
> > It is especially designed to handle such problems!
> > What makes me worry is, that those errors occur spontaneously
> > and without any notice possibly on several disks simultaneously.
> > You may detect such a problems only by a complete scan of
> > all sectors of your disk. The critical question is: what
> > happens, if the first bad sector on some disk get read.
> > Does this event kick off that disk from the system?
> > You may think its a good idea, to kick off the disk as
> > soon as possible. I think, this may be bad, as it dramatically
> > decreases the reliability of your remaining system, especially
> > if you have some other sleeping bad sector on any other disk, too.
> > At least when you try to rebuild your system, you run into
> > trouble.
> >
> > There are several possible solutions. (May be raid systems already
> > works this way, but I have no experience so far, and I could not
> > find too much about this in the FAQ or mailing-list)
> >
> > 1) I think a disk should be kept online as long as possible.
> > This means, that a simple read error should not deactivate the disk
> > as long the disk can be successfully written to and thus is still in
> > sync. As long, as "simple" read errors (even on different disks) occur,
> > my data is still reliable, as it is very unlikely, that two disk fail
> > with the SAME logical sector number. But it IS likely, that two disk
> > carry some sleeping bad sectors simultaneously.
> >
> > 2) If I decide to replace a disk, it should be possible to add a new
> > disk to the system before degrading it. After I successfully build the
> > new disk, I may switch off the bad one. This way I'm save against multi
> > disk read errors all time.
> >
> > example: array of the disks (A B C), want to replace B:
> >
> > 123456789 <- sector number
> > A aaaaaaaXa <- data on disk a, X = unreadable
> > B bbXbbbbbb <- disk b, will be replaced
> > C ccccXcccc
> >
> > B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)
> >
> > 3) If a disks happened to produce a bad sector, you may try to rewrite it
> > again, if you still have the data. Using Raid 2 or 5 this is possible, as
> > long as you don't have a double fault on exactly the same sector on any
> > other disks. For a raid-1/5 system this means it might cure itself!
> > I did such surgery manually already, and it works quite good.
> >
> > Conclusion:
> >
> > After a disk shows up with bad sectors, you should indeed think of replacing
> > it as soon as possible, but it should not affect data integrity that much.
> > Instead it should be kept alive as long as possible until any necessary
> > recovery
> > took place.
> >
> > Dieter.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 29.06.2004 22:56:58 von Dieter Stueken

Mike Tran wrote:
> (Please note that I don't mean to advertise EVMS here :) just want to
> mention that the functionality is available)
>
> EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
> sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
> stack.

Before proposing any solutions, i think it is very important to
distinguish carefully between different kinds of errors:

a) read errors: some alert bell should ring (syslog/mail..)
but the system should not careless disable any disk to avoid
making the problem even worse.

b) write errors: if some blocks are written partly, but can not
be written to all disks, it may help, to write the data
(may be temporary) somewhere else.

when we got a read error, due to an unreadable sector, we may
first try to rewrite it. In most cases, bad sector replacement
of the HD-firmware takes action and the problem is solved so far.

Only after this failed, we should turn over to plan b)

case b) may also help, if some disk gets temporary unavailable
(i.E. cabling problem). After manual intervention, that brings
the disk back on line again, the redirected data may even be
copied back.

Dieter
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 29.06.2004 23:51:32 von bugzilla

But!!!
Most disks, if not all, already re-map bad blocks on write. So, how will
EVMS help? The disk can't do anything about a read error! Only a redundant
system like md or a hardware raid can fix a read error.

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mike Tran
Sent: Tuesday, June 29, 2004 2:43 PM
To: linux-raid
Subject: RE: raid and sleeping bad sectors

(Please note that I don't mean to advertise EVMS here :) just want to
mention that the functionality is available)

EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
stack.

BBR enhances the reliability of a disk by remapping bad sectors. When
BBR is added to a disk, it reserves replacement sectors. BBR detects
failed WRITEs and remaps the I/O to the reserved sectors. BBR does not
remap failed READs. However, if anyone feels that it's necessary,
he/she can modify BBR code. My guess is about 10 lines of code. EVMS is
open source.

With EVMS, you can create MD arrays on top of BBR block devices. This
way RAID1/RAID5 will not see any I/O error on the "bad disk" as long as
BBR reserved sectors are available.

Obviously, the disadvantages of BBR are:

- Slow down I/O
- Disk space allocation for replacement sectors and mapping table


Regards,
Mike T.

On Tue, 2004-06-29 at 11:30, John Lange wrote:
> Ok, my two cents; Why would a disk ever be automatically removed from an
> array?
>
> If a disk has errors the array should write a log message and then write
> around the bad sectors. If you start to see a lot of error messages in
> your logs then the sysadmin should make the judgment to replace the
> drive.
>
> Even if the drive completely fails it should not be automatically
> removed from the array. md should continue to attempt to use it and keep
> logging errors (perhaps a useful statistic for md to track is a running
> percentage of disk accesses that resulted in an error?)
>
> If a sysadmin (or more likely a monitoring script) sees an excessive
> number of errors then it can take appropriate action. For example, if
> there is a spare, then remove the bad drive and active the spare. If
> there isn't a spare then keep the drive in the array but send out alerts
> for a sysadmin to intervene.
>
> I don't think automatically removing a drive does anything to protect
> data, in fact it jeopardizes it since if two disks have errors its game
> over. In essence it creates a very "fragile" array.
>
> I certainly am not an expert but that is my view as a sysadmin.
>
> Regards,
>
> John Lange
>
> On Tue, 2004-06-29 at 10:59, Guy wrote:
> > You are correct. 1 bad sector on a read, the disk is kicked out!
> > I agree with you, it (md) should not do that! Your #3 is something I
have
> > mentioned here a few times. I don't recall getting any comments!
> > I get 1 read error about every 3 months or so. I have 14 disks in a
RAID5
> > array. Every time I have been able to re-use the same disk. But it is
a
> > pain in the @$$! And I worry about a second bad sector!!!
> >
> > I do a read test of all of my disks each night! Hoping to catch an
error
> > before md does. Since a bad sector could go un-noticed for months! As
far
> > as I know, md does not test the disks any! As far as I know, md does
not
> > verify parity. As far as I know, md does not verify RAID1 data matches
on
> > all disks.
> >
> > You are also correct. 2 bad sectors on 2 different disks and "That's it
> > man! Game over man! Game over!". You may want to consider RAID6. It
will
> > allow 2 bad sectors, but not 3!! I have considered this myself. I have
14
> > disks with a spare. I should just go with a 15 disk RAID6.
> >
> > I disagree with your conclusion: It is normal for a disk to grow bad
> > sectors. 1 or 2 bad sectors is not an issue. Maybe 10 or 100 is an
issue.
> > I don't know what the limit should be. I have maybe 5-10 bad sectors on
my
> > 14 disks. In about 2 years I have not had a hard failure. I just
correct
> > the bad sector by hand and re-use the disk. Maybe I should track which
> > disks have had bad sectors to determine if there is a pattern, but I
don't.
> > I think md should to this. I have said so here in the past.
> >
> > Hardware RAID systems support bad sectors. Not sure they all do, but
some
> > or most do. EMC counts them and when some limit is reached the disk is
> > copied to a spare. The "bad" disk is kept on-line. After all, it is
still
> > working. I automatic service call is placed to have the "bad" disk
> > replaced. I have been told HP's XP-256 does not have a bad sector
limit.
> > They just wait until a disk fails! Because of this EMC replaces disks
more
> > often. Some see this a EMC has more failures. I don't see it this way.
> > They protect my data better. Getting off topic I think!....
> >
> > Guy
> >
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org
> > [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Dieter Stueken
> > Sent: Tuesday, June 29, 2004 6:48 AM
> > To: linux-raid@vger.kernel.org
> > Subject: raid and sleeping bad sectors
> >
> > Question:
> >
> > Under which conditions a disk of a raid-5 system gets off line?
> > Does it happen on ANY error, even if some read error happened?
> > Will double-fault read errors on different disks destroy my
> > data?
> >
> > long story:
> >
> > I manage about 1TB of data on IDE disk and learned
> > a lot about different kinds of disk failures.
> > Fortunately I suffered no data loss so far, as I completely
> > mirror all data each night (kind of manual raid-1 :-)
> > I think about using raid-5 now.
> >
> > My observation was: a sudden total loss of a whole disk
> > was very unlikely. If you monitor the disk carefully using
> > its internal SMART capabilities, you are able to copy the
> > data and replace the disk long time before it finally dies.
> >
> > see: http://smartmontools.sourceforge.net/
> >
> > What happens frequently are spontaneous bad sectors, which
> > can not be read any more (i.e. CRC errors). Most people
> > think bad sectors are handled automatically by the firmware
> > of your HD. Unfortunately this is not the whole truth.
> > Instead of, a bad sector is indicated as bad, until it gets
> > explicitly rewritten by some new data. At this point, the
> > HD-firmware may decide to store the new data using a spare
> > sector instead. The bad news are: sectors turn to become
> > bad/unreadable quite spontaneously, even if they could be
> > read successfully short time before!
> >
> > You may ask, why this is a problem for a raid-5 system?
> > It is especially designed to handle such problems!
> > What makes me worry is, that those errors occur spontaneously
> > and without any notice possibly on several disks simultaneously.
> > You may detect such a problems only by a complete scan of
> > all sectors of your disk. The critical question is: what
> > happens, if the first bad sector on some disk get read.
> > Does this event kick off that disk from the system?
> > You may think its a good idea, to kick off the disk as
> > soon as possible. I think, this may be bad, as it dramatically
> > decreases the reliability of your remaining system, especially
> > if you have some other sleeping bad sector on any other disk, too.
> > At least when you try to rebuild your system, you run into
> > trouble.
> >
> > There are several possible solutions. (May be raid systems already
> > works this way, but I have no experience so far, and I could not
> > find too much about this in the FAQ or mailing-list)
> >
> > 1) I think a disk should be kept online as long as possible.
> > This means, that a simple read error should not deactivate the disk
> > as long the disk can be successfully written to and thus is still in
> > sync. As long, as "simple" read errors (even on different disks) occur,
> > my data is still reliable, as it is very unlikely, that two disk fail
> > with the SAME logical sector number. But it IS likely, that two disk
> > carry some sleeping bad sectors simultaneously.
> >
> > 2) If I decide to replace a disk, it should be possible to add a new
> > disk to the system before degrading it. After I successfully build the
> > new disk, I may switch off the bad one. This way I'm save against multi
> > disk read errors all time.
> >
> > example: array of the disks (A B C), want to replace B:
> >
> > 123456789 <- sector number
> > A aaaaaaaXa <- data on disk a, X = unreadable
> > B bbXbbbbbb <- disk b, will be replaced
> > C ccccXcccc
> >
> > B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)
> >
> > 3) If a disks happened to produce a bad sector, you may try to rewrite
it
> > again, if you still have the data. Using Raid 2 or 5 this is possible,
as
> > long as you don't have a double fault on exactly the same sector on any
> > other disks. For a raid-1/5 system this means it might cure itself!
> > I did such surgery manually already, and it works quite good.
> >
> > Conclusion:
> >
> > After a disk shows up with bad sectors, you should indeed think of
replacing
> > it as soon as possible, but it should not affect data integrity that
much.
> > Instead it should be kept alive as long as possible until any necessary
> > recovery
> > took place.
> >
> > Dieter.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 30.06.2004 00:20:16 von Mike Tran

On Tue, 2004-06-29 at 16:51, Guy wrote:
> But!!!
> Most disks, if not all, already re-map bad blocks on write. So, how will
> EVMS help? The disk can't do anything about a read error! Only a redundant
> system like md or a hardware raid can fix a read error.
>
> Guy

Think of BBR as "addition" to hardware bad block relocation :)

Mike T.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 30.06.2004 01:45:26 von Mike Tran

On Tue, 2004-06-29 at 15:56, Dieter Stueken wrote:
> Mike Tran wrote:
> > (Please note that I don't mean to advertise EVMS here :) just want to
> > mention that the functionality is available)
> >
> > EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
> > sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
> > stack.
>
> Before proposing any solutions, i think it is very important to
> distinguish carefully between different kinds of errors:
>
> a) read errors: some alert bell should ring (syslog/mail..)
> but the system should not careless disable any disk to avoid
> making the problem even worse.
>
> b) write errors: if some blocks are written partly, but can not
> be written to all disks, it may help, to write the data
> (may be temporary) somewhere else.
>
> when we got a read error, due to an unreadable sector, we may
> first try to rewrite it. In most cases, bad sector replacement
> of the HD-firmware takes action and the problem is solved so far.
>

For raid1 mirroring, I think the code for "rewrite" does not look too
bad. For raid5/raid5, it's going be harder. I'm not saying that it's
not doable :)

In fact, there is a cnt_corrected_read field in the MD ver 1
superblock. So, I hope this feature is coming soon.


> Only after this failed, we should turn over to plan b)
>
> case b) may also help, if some disk gets temporary unavailable
> (i.E. cabling problem). After manual intervention, that brings
> the disk back on line again, the redirected data may even be
> copied back.
>

Plan b) needs that "somewhere else." This can also be achieved with the
MD ver 1 superblock. Where we can reserve some sectors by correcly
setting the usable data_offset and data_size.

Now, we need user-space tool to create MD arrays with ver 1 superblock.
In addition, of course, we will also need to enhance MD kernel code.


Cheers,
Mike T.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 30.06.2004 04:19:32 von bugzilla

I don't think plan b needs to be handled as stated. If a cable is loose,
the amount of data that needs to be written somewhere else could be vast.
At least as big as 1 disk! Maybe just re-try the write. If the failure is
not a bad block, then let it die! Unless you want to allow the user to
define the amount of spare space. Create an array, but leave x% of the
space unused for temp data relocation. So, what do you do when the x% is
full? To me it seems too risky to attempt to track the re-located data.
After all you must be able to re-boot without loosing this data. Otherwise,
don't even attempt it. The "normal" systems administrator (operator) is
going to try a re-boot as the first step in correcting a problem!!! I am
not referring to the systems administrator that installed the system! I am
referring to the people that "operate" the system. In some cases they may
be the same person, luck you. In my environment we tend to deliver systems
to customers, they "operate" the systems.

If the hard drive can't re-locate the bad block, then, accept that you have
had a failure. But, maybe still attempt reads, the drive may come back to
life some day. But then you must track which blocks are good and which are
not. The not good blocks (stale) must be re-built, the good blocks (synced)
can still be read. This info also must be retained after a re-boot. Again,
too risky to me!

That brings me back to:
If the hard drive can't re-locate the bad block, then, accept that you have
had a failure.

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mike Tran
Sent: Tuesday, June 29, 2004 7:45 PM
To: linux-raid
Subject: Re: raid and sleeping bad sectors

On Tue, 2004-06-29 at 15:56, Dieter Stueken wrote:
> Mike Tran wrote:
> > (Please note that I don't mean to advertise EVMS here :) just want to
> > mention that the functionality is available)
> >
> > EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
> > sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
> > stack.
>
> Before proposing any solutions, i think it is very important to
> distinguish carefully between different kinds of errors:
>
> a) read errors: some alert bell should ring (syslog/mail..)
> but the system should not careless disable any disk to avoid
> making the problem even worse.
>
> b) write errors: if some blocks are written partly, but can not
> be written to all disks, it may help, to write the data
> (may be temporary) somewhere else.
>
> when we got a read error, due to an unreadable sector, we may
> first try to rewrite it. In most cases, bad sector replacement
> of the HD-firmware takes action and the problem is solved so far.
>

For raid1 mirroring, I think the code for "rewrite" does not look too
bad. For raid5/raid5, it's going be harder. I'm not saying that it's
not doable :)

In fact, there is a cnt_corrected_read field in the MD ver 1
superblock. So, I hope this feature is coming soon.


> Only after this failed, we should turn over to plan b)
>
> case b) may also help, if some disk gets temporary unavailable
> (i.E. cabling problem). After manual intervention, that brings
> the disk back on line again, the redirected data may even be
> copied back.
>

Plan b) needs that "somewhere else." This can also be achieved with the
MD ver 1 superblock. Where we can reserve some sectors by correcly
setting the usable data_offset and data_size.

Now, we need user-space tool to create MD arrays with ver 1 superblock.
In addition, of course, we will also need to enhance MD kernel code.


Cheers,
Mike T.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 30.06.2004 08:12:52 von Holger Kiehl

Hello

But what if the block device is not a disk? Remember md can consist of
any block device even mixture of them.

Holger
--

On Tue, 29 Jun 2004, Dieter Stueken wrote:

> Question:
>
> Under which conditions a disk of a raid-5 system gets off line?
> Does it happen on ANY error, even if some read error happened?
> Will double-fault read errors on different disks destroy my
> data?
>
> long story:
>
> I manage about 1TB of data on IDE disk and learned
> a lot about different kinds of disk failures.
> Fortunately I suffered no data loss so far, as I completely
> mirror all data each night (kind of manual raid-1 :-)
> I think about using raid-5 now.
>
> My observation was: a sudden total loss of a whole disk
> was very unlikely. If you monitor the disk carefully using
> its internal SMART capabilities, you are able to copy the
> data and replace the disk long time before it finally dies.
>
> see: http://smartmontools.sourceforge.net/
>
> What happens frequently are spontaneous bad sectors, which
> can not be read any more (i.e. CRC errors). Most people
> think bad sectors are handled automatically by the firmware
> of your HD. Unfortunately this is not the whole truth.
> Instead of, a bad sector is indicated as bad, until it gets
> explicitly rewritten by some new data. At this point, the
> HD-firmware may decide to store the new data using a spare
> sector instead. The bad news are: sectors turn to become
> bad/unreadable quite spontaneously, even if they could be
> read successfully short time before!
>
> You may ask, why this is a problem for a raid-5 system?
> It is especially designed to handle such problems!
> What makes me worry is, that those errors occur spontaneously
> and without any notice possibly on several disks simultaneously.
> You may detect such a problems only by a complete scan of
> all sectors of your disk. The critical question is: what
> happens, if the first bad sector on some disk get read.
> Does this event kick off that disk from the system?
> You may think its a good idea, to kick off the disk as
> soon as possible. I think, this may be bad, as it dramatically
> decreases the reliability of your remaining system, especially
> if you have some other sleeping bad sector on any other disk, too.
> At least when you try to rebuild your system, you run into
> trouble.
>
> There are several possible solutions. (May be raid systems already
> works this way, but I have no experience so far, and I could not
> find too much about this in the FAQ or mailing-list)
>
> 1) I think a disk should be kept online as long as possible.
> This means, that a simple read error should not deactivate the disk
> as long the disk can be successfully written to and thus is still in
> sync. As long, as "simple" read errors (even on different disks) occur,
> my data is still reliable, as it is very unlikely, that two disk fail
> with the SAME logical sector number. But it IS likely, that two disk
> carry some sleeping bad sectors simultaneously.
>
> 2) If I decide to replace a disk, it should be possible to add a new
> disk to the system before degrading it. After I successfully build the
> new disk, I may switch off the bad one. This way I'm save against multi
> disk read errors all time.
>
> example: array of the disks (A B C), want to replace B:
>
> 123456789 <- sector number
> A aaaaaaaXa <- data on disk a, X = unreadable
> B bbXbbbbbb <- disk b, will be replaced
> C ccccXcccc
>
> B' bbbbbbbbb <- new spare disk for b build from current (A,B,C)
>
> 3) If a disks happened to produce a bad sector, you may try to rewrite it
> again, if you still have the data. Using Raid 2 or 5 this is possible, as
> long as you don't have a double fault on exactly the same sector on any
> other disks. For a raid-1/5 system this means it might cure itself!
> I did such surgery manually already, and it works quite good.
>
> Conclusion:
>
> After a disk shows up with bad sectors, you should indeed think of replacing
> it as soon as possible, but it should not affect data integrity that much.
> Instead it should be kept alive as long as possible until any necessary
> recovery
> took place.
>
> Dieter.
>
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 30.06.2004 10:44:44 von Dieter Stueken

Guy wrote:
> I don't think plan b needs to be handled as stated. If a cable is loose,
> the amount of data that needs to be written somewhere else could be vast.
> At least as big as 1 disk!

agreed; any such "active" bad block replacement will only postpone the
problem. It will never be able to solve the problem absolutely reliable.
I also agree, that it makes the system more complex and may need some
manual intervention, too.

> That brings me back to:
> If the hard drive can't re-locate the bad block, then, accept that you have
> had a failure.

completely true. Kick off the bad disk and rebuild a new one sound
easy and save. But it is not! Before/during rebuild, your raid system runs
unprotected in a very fragile state, even more fragile than a single
disk system without raid! Thus the additional spare blocks may help
to bridge over this phase more safely. Thus this is not an alternative
of replacing the bad disk; you will do that, definitely!
Instead it is just an addition.

If you are hanging on a rope on a cliff, and notice, that the rope
shows a defect, what will you do? Cut it off, before it breaks
completely! and while falling, you may inspect your backpack to
find a spare rope to use.....

Dieter.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 30.06.2004 23:40:55 von Mike Tran

Hello Guy,

I'm glad you did not oppose plan a) :)

Before ruling out some kind of bad block relocation, I still think there
are some scenarios which may be worth to consider.

In your environment, for example, assume you shipped a system configured
with 2-way 400GB mirror. Over time, both disks have had bad blocks and
the firmware can longer relocate bad blocks. The database application
writes 100MB table. Which one of the following 2 service calls you
would like receive?

1. "The database is corrupt. The 400GB raid1 volume is not operational."
-or-
2. "The email sent by MD monitor utility said "The raid1 array is
running in degraded mode and 50% of the reserved sectors have been
used. Please take appropriate actions." What should I do?"

Even the original author of Software RAID how-to made mistakes :) and
suggested that MD should have built-in bad block relocation, please
read http://linas.org/linux/peeves.html

Having bad block relocation can also be a big plus during reconstruction
of a degraded MD array (i.e. thanks to the fact that bad sectors on one
of the remaining disks had been remapped, the reconstruction completes
successfully!)

As for implementation of bad block relocation, you're right. Persistent
metadata (mapping table) is required. I see the risk you mentioned is
about the same as having other MD metadata (superblock) and
reconstruction of degraded arrays. Also, the disk contains the reserved
sectors could be a "small spare."

Just curious... How do you know the I/O failure is/isn't a bad block?
>From my knowledge, the only error is -EIO.


Regards,
Mike T.

On Tue, 2004-06-29 at 21:19, Guy wrote:
> I don't think plan b needs to be handled as stated. If a cable is loose,
> the amount of data that needs to be written somewhere else could be vast.
> At least as big as 1 disk! Maybe just re-try the write. If the failure is
> not a bad block, then let it die! Unless you want to allow the user to
> define the amount of spare space. Create an array, but leave x% of the
> space unused for temp data relocation. So, what do you do when the x% is
> full? To me it seems too risky to attempt to track the re-located data.
> After all you must be able to re-boot without loosing this data. Otherwise,
> don't even attempt it. The "normal" systems administrator (operator) is
> going to try a re-boot as the first step in correcting a problem!!! I am
> not referring to the systems administrator that installed the system! I am
> referring to the people that "operate" the system. In some cases they may
> be the same person, luck you. In my environment we tend to deliver systems
> to customers, they "operate" the systems.
>
> If the hard drive can't re-locate the bad block, then, accept that you have
> had a failure. But, maybe still attempt reads, the drive may come back to
> life some day. But then you must track which blocks are good and which are
> not. The not good blocks (stale) must be re-built, the good blocks (synced)
> can still be read. This info also must be retained after a re-boot. Again,
> too risky to me!
>
> That brings me back to:
> If the hard drive can't re-locate the bad block, then, accept that you have
> had a failure.
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mike Tran
> Sent: Tuesday, June 29, 2004 7:45 PM
> To: linux-raid
> Subject: Re: raid and sleeping bad sectors
>
> On Tue, 2004-06-29 at 15:56, Dieter Stueken wrote:
> > Mike Tran wrote:
> > > (Please note that I don't mean to advertise EVMS here :) just want to
> > > mention that the functionality is available)
> > >
> > > EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
> > > sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
> > > stack.
> >
> > Before proposing any solutions, i think it is very important to
> > distinguish carefully between different kinds of errors:
> >
> > a) read errors: some alert bell should ring (syslog/mail..)
> > but the system should not careless disable any disk to avoid
> > making the problem even worse.
> >
> > b) write errors: if some blocks are written partly, but can not
> > be written to all disks, it may help, to write the data
> > (may be temporary) somewhere else.
> >
> > when we got a read error, due to an unreadable sector, we may
> > first try to rewrite it. In most cases, bad sector replacement
> > of the HD-firmware takes action and the problem is solved so far.
> >
>
> For raid1 mirroring, I think the code for "rewrite" does not look too
> bad. For raid5/raid5, it's going be harder. I'm not saying that it's
> not doable :)
>
> In fact, there is a cnt_corrected_read field in the MD ver 1
> superblock. So, I hope this feature is coming soon.
>
>
> > Only after this failed, we should turn over to plan b)
> >
> > case b) may also help, if some disk gets temporary unavailable
> > (i.E. cabling problem). After manual intervention, that brings
> > the disk back on line again, the redirected data may even be
> > copied back.
> >
>
> Plan b) needs that "somewhere else." This can also be achieved with the
> MD ver 1 superblock. Where we can reserve some sectors by correcly
> setting the usable data_offset and data_size.
>
> Now, we need user-space tool to create MD arrays with ver 1 superblock.
> In addition, of course, we will also need to enhance MD kernel code.
>
>
> Cheers,
> Mike T.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 01.07.2004 00:44:16 von bugzilla

I have been wanting plan "a" for a long time now. It's not a new idea, but
it is a good one. I don't know if it was my idea first or not. Probably
not. :(

Your plan "b" is new to me. I still don't like it. :(
Maybe because there are other things I want more.
I want plan "a". I want the system to correct the bad block by re-writing
it! I want the system to count the number of times blocks have been
re-located by the drive. I want the system to send an alert when a limit
has been reached. This limit should be before the disk runs out of spare
blocks. I want the system to periodically verify all parity data and
mirrors. I want the system to periodically do a surface scan (would be a
side effect of verify parity).

I want to convert my RAID5 to RAID6. Humm, 2.4 kernel Doh!

By counting bad blocks, we should not reach the limit required by your plan
"b". But other problems could be saved by plan "b".

Not sure I said this... RAID6 with option "a" and bad block re-writes would
be able to survive a failed disk and 1 or more bad blocks on the other
disks. As long as each bad block is not on the same offset of a chunk.
This would make for a rock solid stable system. Add redundant power
supplies and UPSs. Nothing is 100%, but it would be much closer to 100%
than what we have now!

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mike Tran
Sent: Wednesday, June 30, 2004 5:41 PM
To: 'linux-raid'
Subject: RE: raid and sleeping bad sectors

Hello Guy,

I'm glad you did not oppose plan a) :)

Before ruling out some kind of bad block relocation, I still think there
are some scenarios which may be worth to consider.

In your environment, for example, assume you shipped a system configured
with 2-way 400GB mirror. Over time, both disks have had bad blocks and
the firmware can longer relocate bad blocks. The database application
writes 100MB table. Which one of the following 2 service calls you
would like receive?

1. "The database is corrupt. The 400GB raid1 volume is not operational."
-or-
2. "The email sent by MD monitor utility said "The raid1 array is
running in degraded mode and 50% of the reserved sectors have been
used. Please take appropriate actions." What should I do?"

Even the original author of Software RAID how-to made mistakes :) and
suggested that MD should have built-in bad block relocation, please
read http://linas.org/linux/peeves.html

Having bad block relocation can also be a big plus during reconstruction
of a degraded MD array (i.e. thanks to the fact that bad sectors on one
of the remaining disks had been remapped, the reconstruction completes
successfully!)

As for implementation of bad block relocation, you're right. Persistent
metadata (mapping table) is required. I see the risk you mentioned is
about the same as having other MD metadata (superblock) and
reconstruction of degraded arrays. Also, the disk contains the reserved
sectors could be a "small spare."

Just curious... How do you know the I/O failure is/isn't a bad block?
>From my knowledge, the only error is -EIO.


Regards,
Mike T.

On Tue, 2004-06-29 at 21:19, Guy wrote:
> I don't think plan b needs to be handled as stated. If a cable is loose,
> the amount of data that needs to be written somewhere else could be vast.
> At least as big as 1 disk! Maybe just re-try the write. If the failure
is
> not a bad block, then let it die! Unless you want to allow the user to
> define the amount of spare space. Create an array, but leave x% of the
> space unused for temp data relocation. So, what do you do when the x% is
> full? To me it seems too risky to attempt to track the re-located data.
> After all you must be able to re-boot without loosing this data.
Otherwise,
> don't even attempt it. The "normal" systems administrator (operator) is
> going to try a re-boot as the first step in correcting a problem!!! I am
> not referring to the systems administrator that installed the system! I
am
> referring to the people that "operate" the system. In some cases they may
> be the same person, luck you. In my environment we tend to deliver
systems
> to customers, they "operate" the systems.
>
> If the hard drive can't re-locate the bad block, then, accept that you
have
> had a failure. But, maybe still attempt reads, the drive may come back to
> life some day. But then you must track which blocks are good and which
are
> not. The not good blocks (stale) must be re-built, the good blocks
(synced)
> can still be read. This info also must be retained after a re-boot.
Again,
> too risky to me!
>
> That brings me back to:
> If the hard drive can't re-locate the bad block, then, accept that you
have
> had a failure.
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mike Tran
> Sent: Tuesday, June 29, 2004 7:45 PM
> To: linux-raid
> Subject: Re: raid and sleeping bad sectors
>
> On Tue, 2004-06-29 at 15:56, Dieter Stueken wrote:
> > Mike Tran wrote:
> > > (Please note that I don't mean to advertise EVMS here :) just want to
> > > mention that the functionality is available)
> > >
> > > EVMS, (http://evms.sourceforge.net) provides a solution to this "bad
> > > sectors" issue by having Bad Block Relocation (BBR) layer on the I/O
> > > stack.
> >
> > Before proposing any solutions, i think it is very important to
> > distinguish carefully between different kinds of errors:
> >
> > a) read errors: some alert bell should ring (syslog/mail..)
> > but the system should not careless disable any disk to avoid
> > making the problem even worse.
> >
> > b) write errors: if some blocks are written partly, but can not
> > be written to all disks, it may help, to write the data
> > (may be temporary) somewhere else.
> >
> > when we got a read error, due to an unreadable sector, we may
> > first try to rewrite it. In most cases, bad sector replacement
> > of the HD-firmware takes action and the problem is solved so far.
> >
>
> For raid1 mirroring, I think the code for "rewrite" does not look too
> bad. For raid5/raid5, it's going be harder. I'm not saying that it's
> not doable :)
>
> In fact, there is a cnt_corrected_read field in the MD ver 1
> superblock. So, I hope this feature is coming soon.
>
>
> > Only after this failed, we should turn over to plan b)
> >
> > case b) may also help, if some disk gets temporary unavailable
> > (i.E. cabling problem). After manual intervention, that brings
> > the disk back on line again, the redirected data may even be
> > copied back.
> >
>
> Plan b) needs that "somewhere else." This can also be achieved with the
> MD ver 1 superblock. Where we can reserve some sectors by correcly
> setting the usable data_offset and data_size.
>
> Now, we need user-space tool to create MD arrays with ver 1 superblock.
> In addition, of course, we will also need to enhance MD kernel code.
>
>
> Cheers,
> Mike T.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 01.07.2004 01:27:19 von pegasus

On Wed, 30 Jun 2004 18:44:16 -0400
"Guy" wrote:

> I want plan "a". I want the system to correct the bad block by re-wr=
iting
> it! I want the system to count the number of times blocks have been
> re-located by the drive. I want the system to send an alert when a l=
imit
> has been reached. This limit should be before the disk runs out of s=
pare
> blocks. I want the system to periodically verify all parity data and
> mirrors. I want the system to periodically do a surface scan (would =
be a
> side effect of verify parity).

And where do you propose the system would store all the info about
badblocks?

I have an old hw raid controller for my alpha box maintains a badblock =
table
in its nvram. I guess it's a common feature in hw raid cards, since i h=
ad a
whole box of disks with firmwares that reported each internal badblock
relocation as scsi hardware error. Needless to say, linux sw raid freak=
ed
out on each such event. Things were very interesting untill we got firm=
ware
upgrade for those disks ...=20
Also, at least 3ware cards do a 'nightly maintenance' of disks which i =
guess
is something like dd if=3D/dev/hdX of=3D/dev/null ... What is holding y=
ou back
to do this with a simple shell script and a cron entry?
Now for cheching the parity in the raid5/6 setups, some kind of tool wo=
uld
be needed ... maybe some extension to mdadm? For the really paranoid pe=
ople
out there ... :)


--=20

Jure Pe=E8ar
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: raid and sleeping bad sectors

am 01.07.2004 03:52:50 von bugzilla

"And where do you propose the system would store all the info about
badblocks?"

Simple, this is an 8 or 16 bit value per device. I am sure we could fi=
nd 16
bits! If the device is replaced we don't need the info anymore, so sto=
re it
on the device! In the superblock maybe? Once the disk fails it would =
be
nice for md to log the current value, just so we know.

About the disk test. I do a disk test each night. That's my point!!! =
I
don't think I should do the test. If the test fails I need to correct =
it.
Let md test things, and correct them, and send an alert if it can't cor=
rect
it, or if a threshold is exceeded!

Paranoid? You been using computers long? I guess not. In time you wi=
ll
learn! :) If any block in the stripe gets hosed (parity or not) when =
you
replace a disk, during the re-build the constructed data will be wrong,=
even
if it was correct on the failed disk. The corruption now affects 2 dis=
ks.
Yes, I want to verify the parity. Can be just a utility that gives a
report. With RAID5 you can't determine which disk is corrupt! Only th=
at
the parity does not match the data. If the corruption was in the parit=
y,
re-writing the parity would correct it. If the corruption is in the da=
ta,
re-writing the parity will prevent spreading the corruption to another =
disk
during the next re-build. With RAID6 I think you could determine which=
disk
is corrupt and correct it!

Neil? Any thoughts? You have been silent on this subject.

Guy
------------------------------------------------------------ -----------=
--
Spock - "If I drop a wrench on a planet with a positive gravity field, =
I
need not see it fall, nor hear it hit the ground, to know that it has i=
n
fact fallen."

Guy - "Logic dictates that there are numerous events that could prevent=
the
wrench from hitting the ground. I need to verify the impact!"



-----Original Message-----
=46rom: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Jure Pe=E8ar
Sent: Wednesday, June 30, 2004 7:27 PM
To: linux-raid@vger.kernel.org
Subject: Re: raid and sleeping bad sectors

On Wed, 30 Jun 2004 18:44:16 -0400
"Guy" wrote:

> I want plan "a". I want the system to correct the bad block by re-wr=
iting
> it! I want the system to count the number of times blocks have been
> re-located by the drive. I want the system to send an alert when a l=
imit
> has been reached. This limit should be before the disk runs out of s=
pare
> blocks. I want the system to periodically verify all parity data and
> mirrors. I want the system to periodically do a surface scan (would =
be a
> side effect of verify parity).

And where do you propose the system would store all the info about
badblocks?

I have an old hw raid controller for my alpha box maintains a badblock =
table
in its nvram. I guess it's a common feature in hw raid cards, since i h=
ad a
whole box of disks with firmwares that reported each internal badblock
relocation as scsi hardware error. Needless to say, linux sw raid freak=
ed
out on each such event. Things were very interesting untill we got firm=
ware
upgrade for those disks ...=20
Also, at least 3ware cards do a 'nightly maintenance' of disks which i =
guess
is something like dd if=3D/dev/hdX of=3D/dev/null ... What is holding y=
ou back
to do this with a simple shell script and a cron entry?
Now for cheching the parity in the raid5/6 setups, some kind of tool wo=
uld
be needed ... maybe some extension to mdadm? For the really paranoid pe=
ople
out there ... :)


--=20

Jure Pear
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: raid and sleeping bad sectors

am 01.07.2004 04:42:22 von Michael Hardy

The last I heard Neil write on the subject (granted I've only been on=20
the list a couple weeks) was that it would just require an alteration i=
n=20
the code to do the auto-write-bad-block-on-read-error.

To me, that read as "submit a patch, if its good, I'll take it"=20
(apologies to Neil if I'm wrong - I'm definitely not qualified to speak=
=20
for him)

I haven't seen anyone disagree with this strategy of trying to force th=
e=20
hardware to remap when you have valid redundant data - and I've asked=20
several people I know off line.

The "plan b" of software-remapping write errors is probably more=20
contentious, but the "plan a" of read error remaps does seem easy and=20
non-controversial.

Its just that no one has stepped up with code.

I don't have the time to do it myself but I too run smartd and it does=20
long tests for me and occasionally reports errors before md finds them,=
=20
so I'd like the solution too. If anyone does code it up and submit an=20
accepted patch, I'd definitely ship a case of beer (or equivalent) thei=
r=20
direction...

-Mike

Guy wrote:
> "And where do you propose the system would store all the info about
> badblocks?"
>=20
> Simple, this is an 8 or 16 bit value per device. I am sure we could =
find 16
> bits! If the device is replaced we don't need the info anymore, so s=
tore it
> on the device! In the superblock maybe? Once the disk fails it woul=
d be
> nice for md to log the current value, just so we know.
>=20
> About the disk test. I do a disk test each night. That's my point!!=
! I
> don't think I should do the test. If the test fails I need to correc=
t it.
> Let md test things, and correct them, and send an alert if it can't c=
orrect
> it, or if a threshold is exceeded!
>=20
> Paranoid? You been using computers long? I guess not. In time you =
will
> learn! :) If any block in the stripe gets hosed (parity or not) whe=
n you
> replace a disk, during the re-build the constructed data will be wron=
g, even
> if it was correct on the failed disk. The corruption now affects 2 d=
isks.
> Yes, I want to verify the parity. Can be just a utility that gives a
> report. With RAID5 you can't determine which disk is corrupt! Only =
that
> the parity does not match the data. If the corruption was in the par=
ity,
> re-writing the parity would correct it. If the corruption is in the =
data,
> re-writing the parity will prevent spreading the corruption to anothe=
r disk
> during the next re-build. With RAID6 I think you could determine whi=
ch disk
> is corrupt and correct it!
>=20
> Neil? Any thoughts? You have been silent on this subject.
>=20
> Guy
> ------------------------------------------------------------ ---------=
----
> Spock - "If I drop a wrench on a planet with a positive gravity field=
, I
> need not see it fall, nor hear it hit the ground, to know that it has=
in
> fact fallen."
>=20
> Guy - "Logic dictates that there are numerous events that could preve=
nt the
> wrench from hitting the ground. I need to verify the impact!"
>=20
>=20
>=20
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Jure Pe=E8ar
> Sent: Wednesday, June 30, 2004 7:27 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: raid and sleeping bad sectors
>=20
> On Wed, 30 Jun 2004 18:44:16 -0400
> "Guy" wrote:
>=20
>=20
>>I want plan "a". I want the system to correct the bad block by re-wr=
iting
>>it! I want the system to count the number of times blocks have been
>>re-located by the drive. I want the system to send an alert when a l=
imit
>>has been reached. This limit should be before the disk runs out of s=
pare
>>blocks. I want the system to periodically verify all parity data and
>>mirrors. I want the system to periodically do a surface scan (would =
be a
>>side effect of verify parity).
>=20
>=20
> And where do you propose the system would store all the info about
> badblocks?
>=20
> I have an old hw raid controller for my alpha box maintains a badbloc=
k table
> in its nvram. I guess it's a common feature in hw raid cards, since i=
had a
> whole box of disks with firmwares that reported each internal badbloc=
k
> relocation as scsi hardware error. Needless to say, linux sw raid fre=
aked
> out on each such event. Things were very interesting untill we got fi=
rmware
> upgrade for those disks ...=20
> Also, at least 3ware cards do a 'nightly maintenance' of disks which =
i guess
> is something like dd if=3D/dev/hdX of=3D/dev/null ... What is holding=
you back
> to do this with a simple shell script and a cron entry?
> Now for cheching the parity in the raid5/6 setups, some kind of tool =
would
> be needed ... maybe some extension to mdadm? For the really paranoid =
people
> out there ... :)
>=20
>=20

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html