Recovery raid5 after sata cable failure.

Hi all,

I have had a raid5 array working without problems for some months. A SATA cable
failed and the raid5 kept working fine, keeping the superblock persistent, but now I
can't get the old device back into the array.
This is the array right now:

root@Torero-2:/mnt/raid5 # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue May 31 19:37:37 2005
     Raid Level : raid5
     Array Size : 1367507456 (1304.16 GiB 1400.33 GB)
    Device Size : 195358208 (186.31 GiB 200.05 GB)
   Raid Devices : 8
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jul 17 13:06:17 2005
          State : clean, degraded
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : c4ed8e45:2a036953:92bff479:7cf5bac9
         Events : 0.162797

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
       6       8       97        6      active sync   /dev/sdg1
       7       0        0        -      removed
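
[Editorial sketch, not part of the original message.] The reported sizes can be cross-checked with a little arithmetic: a raid5 array with N members holds N-1 members' worth of data, so the array size depends on the configured number of raid devices (8), not on how many are currently present (7). The variable names below are illustrative, and the sizes are assumed to be in KiB as mdadm reports them.

```shell
#!/bin/sh
# Values copied from the mdadm --detail output above; names are illustrative.
raid_devices=8
device_size_kib=195358208

# raid5 spends one device's worth of space on parity,
# so usable space is (N - 1) devices.
array_size_kib=$(( (raid_devices - 1) * device_size_kib ))
echo "$array_size_kib"    # prints 1367507456, the reported Array Size
```

The result matches the "Array Size" line, which is consistent with the array being intact but running without its eighth member.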


I tried to re-add the old disk like this:

root@Torero-2:/mnt/raid5 # mdadm /dev/md0 -a /dev/sdh1
mdadm: Cannot open /dev/sdh1: Device or resource busy

What is wrong? What am I doing wrong? sdh1 is completely unused, so I
don't understand the "resource busy" error.

Thanks,

Paco Zafra.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Francisco Zafra [ Sun, 17 July 2005 13:34 ] [ ID #882537 ]

RE: Recovery raid5 after sata cable failure.

Hi, I am using LVM on this system, but only for the parallel ATA disks, which are
not part of any RAID. Anyway, I deactivated the LVM volumes and tried again to
re-add the faulty drive, with the same result:

root@Torero-2:~ # mdadm /dev/md0 -a /dev/sdh1
mdadm: Cannot open /dev/sdh1: Device or resource busy

Thanks,
Paco.

> -----Original Message-----
> From: Jim Radford [mailto:jradford [at] npl.com]
> Sent: Sunday, 17 July 2005 14:21
> To: Francisco Zafra
> Subject: Re: Recovery raid5 after sata cable failure.
>
> Francisco,
>
> I was having a similar issue, and it was due to the fact that LVM
> was starting before the raid array was assembled. If you are
> using LVM, you might want to check that the array is assembled
> before LVM starts. (It would probably be the same for EVMS.)
>
> Regards,
> Jim
>
> --
> ==========================================================================
> Jim Radford <jradford [at] npl.com>
> http://www.jimradford.com/
> ==========================================================================
>

Francisco Zafra [ Sun, 17 July 2005 14:52 ] [ ID #882539 ]

Re: Recovery raid5 after sata cable failure.

On Sunday July 17, fzafra [at] gmail.com wrote:
> Hi all,
>
> I have had a raid5 array working without problems for some months. A SATA cable
> failed and the raid5 kept working fine, keeping the superblock persistent, but now I
> can't get the old device back into the array.
....
>
> I tried to re-add the old disk like this:
>
> root@Torero-2:/mnt/raid5 # mdadm /dev/md0 -a /dev/sdh1
> mdadm: Cannot open /dev/sdh1: Device or resource busy
>
> What is wrong? What am I doing wrong? sdh1 is completely unused, so I
> don't understand the "resource busy" error.

Well, it definitely is busy...

Maybe it is still part of md0, but marked as 'faulty'.
If so (cat /proc/mdstat would tell you) you need to remove it first.
mdadm /dev/md0 -r /dev/sdh1
mdadm /dev/md0 -a /dev/sdh1
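
[Editorial sketch, not part of the original message.] The check suggested above can be scripted. The helper below is hypothetical (the name `member_state` is invented here); it reads /proc/mdstat-style text on standard input and reports whether a partition is listed as an array member, assuming the usual layout in which members appear as `name[slot]` with an `(F)` suffix when marked faulty.

```shell
#!/bin/sh
# member_state NAME: read /proc/mdstat-style text on stdin and print
#   "faulty"  if NAME is listed as an array member and flagged (F),
#   "present" if NAME is listed without the (F) flag,
#   "absent"  if NAME is not listed in any array.
member_state() {
    awk -v dev="$1" '
        BEGIN { state = "absent" }
        $2 == ":" {                      # array lines look like "md0 : active raid5 ..."
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*/, "", name)    # strip "[7]" and any trailing "(F)"
                if (name == dev) {
                    state = ($i ~ /\(F\)/) ? "faulty" : "present"
                }
            }
        }
        END { print state }
    '
}

# Typical use on a live system:
#   member_state sdh1 < /proc/mdstat
```

If this prints "faulty", the remove-then-add sequence above applies; "absent" would mean there is nothing for -r to remove.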

NeilBrown
Neil Brown [ Mon, 18 July 2005 00:10 ] [ ID #883283 ]

RE: Recovery raid5 after sata cable failure.

I already tried that:

root@Torero-2:~ # cat /proc/mdstat
Personalities : [linear] [raid5]
md0 : active raid5 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1367507456 blocks level 5, 512k chunk, algorithm 2 [8/7] [UUUUUUU_]

unused devices: <none>
root@Torero-2:~ # mdadm /dev/md0 -r /dev/sdh1
mdadm: hot remove failed for /dev/sdh1: No such device or address
root@Torero-2:~ # mdadm /dev/md0 -a /dev/sdh1
mdadm: Cannot open /dev/sdh1: Device or resource busy
root@Torero-2:~ #

With no luck :(

Paco.

Francisco Zafra [ Mon, 18 July 2005 10:36 ] [ ID #883303 ]

RE: Recovery raid5 after sata cable failure.

On Monday July 18, fzafra [at] gmail.com wrote:
> I already tried that:
>
> root@Torero-2:~ # cat /proc/mdstat
> Personalities : [linear] [raid5]
> md0 : active raid5 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
>       1367507456 blocks level 5, 512k chunk, algorithm 2 [8/7] [UUUUUUU_]
>
> unused devices: <none>
> root@Torero-2:~ # mdadm /dev/md0 -r /dev/sdh1
> mdadm: hot remove failed for /dev/sdh1: No such device or address
> root@Torero-2:~ # mdadm /dev/md0 -a /dev/sdh1
> mdadm: Cannot open /dev/sdh1: Device or resource busy


Uhm, you might have a buggy version of mdadm. If you have 1.10.0, get
an upgrade.

Otherwise either sdh1 or sdh must be:
  - open by some process with O_EXCL
  - open via a /dev/raw/* device
  - part of an md device (which it obviously isn't)
  - part of a dm device
  - mounted as a filesystem
  - an external-journal device for a jfs or ext3 or xfs filesystem
  - in use as a swap device
  - open for writing under a security level of 1 (whatever that means..)
  - an mtd device that is open

(those are all the places that I can find that take an exclusive lock
on a block device).
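
[Editorial sketch, not part of the original message.] A few of the cases in the list above can be checked from a script. The sketch below covers only three of them: mounted filesystems (via /proc/mounts), swap (via /proc/swaps), and md/dm membership (via the sysfs `holders` directories, which newer kernels provide and which may well be missing on a 2005-era kernel). The function names are hypothetical, and a device mounted under an alias such as a /dev/disk/by-uuid path would not be caught.

```shell
#!/bin/sh
# listed_in FILE DEV: succeed if DEV (e.g. /dev/sdh) or one of its numbered
# partitions appears in the first column of FILE. Both /proc/mounts and
# /proc/swaps put the device name in the first column.
listed_in() {
    awk -v dev="$2" '
        index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^[0-9]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# report_claims DEV: hypothetical helper that prints every hint it finds.
report_claims() {
    for f in /proc/mounts /proc/swaps; do
        if listed_in "$f" "$1"; then
            echo "$1 or one of its partitions is listed in $f"
        fi
    done
    # Where available, sysfs names the md/dm devices holding a disk or partition.
    disk=${1#/dev/}
    for d in "/sys/block/$disk/holders" "/sys/block/$disk"/*/holders; do
        [ -d "$d" ] && ls "$d" | sed "s|^|held by (from $d): |"
    done
    return 0
}

# Typical use on a live system:
#   report_claims /dev/sdh
```

An empty report does not prove the device is free: the O_EXCL, raw-device, and external-journal cases from the list are not covered here.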

NeilBrown
Neil Brown [ Mon, 18 July 2005 11:40 ] [ ID #883304 ]

RE: Recovery raid5 after sata cable failure.

Hi Neil,
I have been trying for some hours to solve it with the latest version:
root@Torero-2:~ # mdadm --version
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005

With the same results :(

I really don't think it is locked. I ran dd on it in an act of desperation and had
no problems:
root@Torero-2:~ # dd if=/dev/zero of=/dev/sdh bs=1k count=1000
1000+0 records in
1000+0 records out
1024000 bytes transferred in 0.417862 seconds (2450570 bytes/sec)

Not locked or anything... I have really run out of ideas with this...

Thanks for all your help.

Paco.


Francisco Zafra [ Mon, 18 July 2005 11:56 ] [ ID #883305 ]

Re: Recovery raid5 after sata cable failure.

Francisco Zafra wrote:
> Hi Neil,
> I have been trying for some hours to solve it with the latest version:
> root@Torero-2:~ # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
>
> With the same results :(
>
> I really don't think it is locked. I ran dd on it in an act of desperation and had
> no problems:
> root@Torero-2:~ # dd if=/dev/zero of=/dev/sdh bs=1k count=1000
> 1000+0 records in
> 1000+0 records out
> 1024000 bytes transferred in 0.417862 seconds (2450570 bytes/sec)
>

Asking a silly question perhaps..

fuser /dev/sdh

Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams
Brad Campbell [ Mon, 18 July 2005 12:15 ] [ ID #883306 ]

safe to test SATA array by pulling cables?

I've read conflicting views on whether it's safe to pull the SATA data cable,
the power cable, or both from a disk in an array (when they are NOT in
a hotswap cage) to test whether things work as expected during a real-life
disk failure.

Is there a consensus on this? Theoretically either should be handled by the
controller (a 3ware 9500s) since this is a real life possibility, but will
either event cause damage to the disk? I'd suspect that removal of the data
cable would simulate a disk loss with less physical trauma to the disk in
question.


--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm [at] tacgi.com
<<plain text preferred>>
Harry Mangalam [ Wed, 12 October 2005 19:19 ] [ ID #1007283 ]