Mdadm re-add fails

Hi!

I have a 2-disk raid1 data array. As a result of other testing, the device info
in the superblock for one of the partners, /dev/sdc2, ended up in slot 3
of the device info array:

[root@typhon ~]# mdadm --detail /dev/md21
/dev/md21:
        Version : 1.2
  Creation Time : Mon May  9 11:19:43 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu May 12 15:51:50 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
           UUID : 996d993f:baac367a:8b154ba9:43e56cff
         Events : 687

    Number   Major   Minor   RaidDevice State
-->    3      65       34        0      active sync   /dev/sdc2
       2      65       82        1      active sync   /dev/sdk2

When I remove /dev/sdk2 and then re-add it, the re-add fails:

>> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.

I believe the re-add fails because the enough_fd function (util.c) is not searching deep enough into the
dev_info array with this line of code:
   for (i=0; i<array.raid_disks + array.nr_disks; i++)

array.raid_disks = 2 and array.nr_disks = 1, so for this particular md device it only looks at slots 0-2.
I believe the code needs to be changed to look at all possible dev_info array slots, taking into account the
version of the superblock (as the Detail function in Detail.c does).


Do folks agree?

Thanks & regards,
Annemarie

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Annemarie.Schmidt [ Wed, 18 May 2011 16:43 ] [ ID #2059617 ]

Re: Mdadm re-add fails

On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie"
<Annemarie.Schmidt [at] stratus.com> wrote:

> Do folks agree?
>

I do - largely. I think there might be a better more general way to control
the loop though.
Could you try this please?

Thanks,
NeilBrown


diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
 	    array.raid_disks <= 0)
 		return 0;
 	avail = calloc(array.raid_disks, 1);
-	for (i=0; i<array.raid_disks + array.nr_disks; i++) {
+	for (i=0; i < 1024 && array.raid_disks > 0; i++) {
 		disk.number = i;
 		if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
 			continue;
+		if (disk.major == 0 && disk.minor == 0)
+			continue;
+		array.raid_disks--;
+
 		if (! (disk.state & (1<<MD_DISK_SYNC)))
 			continue;
 		if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)


NeilBrown [ Fri, 20 May 2011 01:51 ] [ ID #2059776 ]

RE: Mdadm re-add fails

Neil,

Yes, that worked:

>> [root@typhon ~]# mdadm --detail /dev/md24
/dev/md24:
        Version : 1.2
  Creation Time : Fri May 20 11:42:17 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri May 20 12:47:09 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : typhon.mno.stratus.com:24  (local to host typhon.mno.stratus.com)
           UUID : 562323d9:9a7b2979:a734abf0:b3fb8f0b
         Events : 155

    Number   Major   Minor   RaidDevice State
       3      65       22        0      active sync   /dev/sdc6
       2      65       54        1      active sync   /dev/sdk6

>> [root@typhon sbin]# mdadm /dev/md24 -f /dev/sdk6 -r /dev/sdk6
mdadm: set /dev/sdk6 faulty in /dev/md24
mdadm: hot removed /dev/sdk6 from /dev/md24

Without the fix:
---------------------
>> [root@typhon sbin]# mdadm /dev/md24 -a /dev/sdk6
mdadm: /dev/sdk6 reports being an active member for /dev/md24, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk6" first.

With the fix:
-----------------
>> [root@typhon ~]# ./mdadm /dev/md24 -a /dev/sdk6
mdadm: re-added /dev/sdk6

Thanks very much for the assistance.

Regards,
Annemarie


Annemarie.Schmidt [ Fri, 20 May 2011 19:16 ] [ ID #2059804 ]

RE: Mdadm re-add fails

Hi Neil,

I've unfortunately run into a problem with the patch to the enough_fd code. It does not appear to work in all cases.

mdadm --detail /dev/md21
    Number   Major   Minor   RaidDevice State
       3      65       18        0      active sync   /dev/sdc2
       2      65       50        1      active sync   /dev/sdk2


Here it works when I remove and re-add /dev/sdk2:

>> mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2

But when I try to remove the other disk, /dev/sdc2, it doesn't:

>> mdadm /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdc2
mdadm: /dev/sdc2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdc2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc2" first.


I could get it all to work when I removed this line from the patch:

+ array.raid_disks--;

>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2


>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdc2
mdadm: re-added /dev/sdc2

So can this line simply be removed or does the patch need to be reworked?

Thanks & regards,
Annemarie Schmidt


Annemarie.Schmidt [ Fri, 27 May 2011 23:16 ] [ ID #2060132 ]