Linux 2.6.10 / RAID1 problem

on 05.01.2005 21:43:08 by Sven Anders

Hello!

I'm experiencing strange problems on my server running Software RAID1 (Mirroring) under
the Linux kernel 2.6.10. It works nicely under Linux 2.4.18!

Configuration:
  Athlon 1.5GHz, 256MB RAM (tested!)
  2x160GB hard disks (same type) on different IDE controllers
  (ext3 fs with journalling turned on)
  Swap is turned off
  Linux kernel 2.6.10 (vanilla) without SMP, preemption turned off

Test case:
  dd if=/dev/zero of=test0 bs=1M count=300
  while :; do cp test0 test1; cp test1 test2; cp test2 test0; od test0; done
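
A variant of this test case that checks every pass against a checksum, instead of relying on reading the od output by eye, could look like the sketch below. The function name stress_copy and its three parameters are illustrative and not part of the original test; it assumes GNU dd (for bs=1M) and the standard cksum utility. Note that a read-back may be served from the page cache rather than from the disks, so a clean run is weaker evidence than a failing one.

```shell
#!/bin/sh
# Sketch only: stress_copy DIR SIZE_MB PASSES (hypothetical helper).
# Copies a zero-filled file round-robin, as in the test case above,
# and compares a checksum after every pass.
stress_copy() {
    dir=$1; size_mb=$2; passes=$3

    # Create the zero-filled source file.
    dd if=/dev/zero of="$dir/test0" bs=1M count="$size_mb" 2>/dev/null || return 2
    want=$(cksum < "$dir/test0")

    i=1
    while [ "$i" -le "$passes" ]; do
        # test0 -> test1 -> test2 -> test0; abort on any copy error.
        cp "$dir/test0" "$dir/test1" &&
        cp "$dir/test1" "$dir/test2" &&
        cp "$dir/test2" "$dir/test0" || return 2

        got=$(cksum < "$dir/test0")
        if [ "$got" != "$want" ]; then
            echo "mismatch after pass $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "ok: $passes passes"
}
```

For example, stress_copy /mnt/md5 300 1000 would run 1000 passes with a 300MB file in /mnt/md5 (an assumed mount point) and stop at the first pass whose checksum differs.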

Error:
  On Linux 2.4.18 it ran perfectly for several hours (until I stopped it...)
  On Linux 2.6.10 the following (or a similar) error occurs after a few minutes:


EXT3-fs error (device md5): ext3_free_blocks_sb: bit already cleared for block 1303980
Aborting journal on device md5.
ext3_abort called.
EXT3-fs error (device md5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5): ext3_free_blocks_sb: bit already cleared for block 1303981
ext3_free_blocks_sb: aborting transaction: Journal has aborted in __ext3_journal_get_undo_access<2>EXT3-fs error (device md5) in ext3_free_blocks_sb: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md5) in ext3_reserve_inode_write: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md5) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device md5) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device md5) in ext3_truncate: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
EXT3-fs error (device md5) in start_transaction: Journal has aborted
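
To see at a glance which devices and blocks are affected, the "bit already cleared" lines can be filtered out of a log like the one above. This is a minimal sketch: the function name ext3_error_blocks is hypothetical, and the pattern is taken only from the message format shown in this log.

```shell
#!/bin/sh
# Sketch only: read kernel log text on stdin and print one
# "device block" pair per "bit already cleared" error line.
ext3_error_blocks() {
    sed -n 's/.*EXT3-fs error (device \([^)]*\)): ext3_free_blocks_sb: bit already cleared for block \([0-9][0-9]*\).*/\1 \2/p'
}
```

It could be fed from the kernel log, for example with dmesg | ext3_error_blocks.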


After this I have to run a filesystem check, and the 2.4.18 kernel reports a dirty RAID and
starts a resync...


Does anybody have an idea?
What's the cause of this???

Is there a known bug in 2.6.10?
Is the Software RAID in the 2.6 series stable?

Any special kernel compile options not to use when using RAID?
Any other (better) test to do?

Regards
  Sven

--
  Sven Anders

  ANDURAS service solutions AG
  Innstraße 71 - 94036 Passau - Germany
  Web: www.anduras.de - Tel: +49 (0)851-4 90 50-0 - Fax: +49 (0)851-4 90 50-55

Legal form: Aktiengesellschaft (stock corporation) - Registered office: Passau - Passau District Court, HRB 6032
Members of the Executive Board: Sven Anders, Marcus Junker, Michael Schön
Chairman of the Supervisory Board: Dipl. Kfm. Karlheinz Antesberger
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Linux 2.6.10 / RAID1 problem

on 05.01.2005 23:35:40 by Neil Brown

On Wednesday January 5, anders@anduras.de wrote:
> Hello!
>
> I'm experiencing strange problems on my server running Software RAID1 (Mirroring) under
> the Linux kernel 2.6.10. It works nicely under Linux 2.4.18!

Why do you think it is a raid problem rather than an ext3 problem?

.....
> Test case:
>   dd if=/dev/zero of=test0 bs=1M count=300
>   while :; do cp test0 test1; cp test1 test2; cp test2 test0; od test0; done
>
> Error:
>   On Linux 2.4.18 it worked several hours perfectly (until I stopped it...)
>   On Linux 2.6.10 after some minutes the following (or similar) error occurs:
>
>
> EXT3-fs error (device md5): ext3_free_blocks_sb: bit already cleared for block 1303980
> Aborting journal on device md5.
> ext3_abort called.
......
>
>
> Does anybody have an idea?
> What's the cause of this???

No (not me at least).
Are you able to run the same test on a single unraided drive?

>
> Is there a known bug in 2.6.10?
> Is the Software RAID in the 2.6 series stable?

I am not aware of any significant problems with Software RAID in
2.6.10.

>
> Any special kernel compile options not to use when using RAID?

No.

> Any other (better) test to do?

As above, try without raid.
If the problem goes away, come back here and report that fact.
If it doesn't, report it to the ext3 mailing list or to linux-kernel.

NeilBrown

Re: Linux 2.6.10 / RAID1 problem

on 05.01.2005 23:46:56 by bernd

Hi Sven,

We used RAID1 under kernel 2.4.x without any problems (except that the
counters for active and spare disks became wrong). With kernel 2.6.x,
RAID1 is a disaster. This problem was discussed in this newsgroup in the
second half of last year. The kernel dies if a disk fails or if the system
is heavily loaded (the latter we can't reproduce, but it happens; maybe
there are other issues on the SCSI bus, for other reasons, that lead to
the same result).

If one disk fails, or if there is a bus reset on one of the two SCSI
controllers we are using for RAID1, the system crashes. It doesn't carry
on with the remaining disk on the other controller, which is what one
expects from RAID1 (and is the reason for using RAID1 at all!). In the
logs we see many failed superblock writes and other strange things
just before the kernel gives up and dies. From another company we
heard about data corruption when this happens. We didn't see
that, maybe because we are using Reiserfs, which they don't.

I asked about the state of this problem some days ago in a new thread
in this newsgroup, but there was no response. I'm surprised nobody out
there has similar problems.

Another hint: you can't reproduce this by using mdadm -f to set a disk
faulty. That works fine, because it enters the kernel by a
different path than a 'real' disk failure does.

So, in summary, I can answer your question about the stability of RAID1
under 2.6.x with no, even if our problems do not correspond exactly
to yours!!!

Greetings Bernd Rieke
R&H Computer Systems
Tel +49 (0)89 750078


Re: Linux 2.6.10 / RAID1 problem

on 05.01.2005 23:55:06 by Neil Brown

On Wednesday January 5, bernd@rhm.de wrote:
> Hi Sven,
>
> we used RAID1 under Kernel 2.4.x without any problems (except the counters
> for active and spare disks became wrong). With kernel 2.6.x RAID1 is a
> disaster. This problem was discussed in this newsgroup in the second half
> of last year. The kernel dies if a disk fails
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This was true in 2.6.9 and earlier. It is fixed in 2.6.10.

> or if the system is heavily
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> loaded (the latter one we can't reproduce but it happens, may be there
  ^^^^^^
> will be some other issues on the SCSI bus for other reasons leading to
> the same result).

This one I am not aware of. More details would be most helpful.

NeilBrown

Re: Linux 2.6.10 / RAID1 problem

on 06.01.2005 13:01:39 by Sven Anders

Neil Brown wrote:
> On Wednesday January 5, anders@anduras.de wrote:
>
>> Hello!
>>
>> I'm experiencing strange problems on my server running Software RAID1 (Mirroring) under
>> the Linux kernel 2.6.10. It works nicely under Linux 2.4.18!

(Correction: not 2.4.18, it's a 2.4.25 kernel!)

>
> Why do you think it is a raid problem rather than an ext3 problem?
>
> Are you able to run the same test on a single unraided drive?

Yes, I did that today: running on the same hard disks (one disk alone, and both
in parallel), everything worked fine!

Then I created a RAID1 of these disks and tried the test again... Same error!

Maybe it's a problem with ext3 running on RAID1, but it worked under 2.4.25!

>> Any special kernel compile options not to use when using RAID?
>
> No.

One difference between the kernels is that 2.6.10 is optimized for the current
processor (Athlon), while the old 2.4.25 is optimized for the AMD-K6/2 (the previous processor).


What do you think??

Regards
  Sven


Re: Linux 2.6.10 / RAID1 problem (SOLVED)

on 14.02.2005 21:02:04 by Sven Anders

Sven Anders wrote:
> Neil Brown wrote:
>> On Wednesday January 5, anders@anduras.de wrote:
>>>
>>> I'm experiencing strange problems on my server running Software RAID1 (Mirroring) under
>>> the Linux kernel 2.6.10. It works nicely under Linux 2.4.25!
>>
>> Why do you think it is a raid problem rather than an ext3 problem?
>>
>> Are you able to run the same test on a single unraided drive?
>
> Yes, I did that today: running on the same hard disks (one disk alone, and both
> in parallel), everything worked fine!
>
> Then I created a RAID1 of these disks and tried the test again... Same
> error!
>
> Maybe it's a problem with ext3 running on RAID1, but it worked under
> 2.4.25!

Hi everybody!

I finally solved my problem. After some more tests, I was able to reproduce the
problem under Linux 2.4 too. My test had to run for over 10 hours and copy many gigabytes
of data before the same disk corruption occurred. It seems that the Linux 2.6 kernel
triggered the problem faster because it put more stress on the hardware.

After that, I tried RAID with these hard disks on an external IDE controller with a different
chipset, also without success. Finally, I replaced my cheap mainboard (AsRock K7VT4A+) with
an expensive MSI mainboard, and now everything works fine.


As somebody else on this list already stated:

  NEVER buy cheap hardware if you want to run RAID on it!


Linux RAID may have some bugs, but it seems to be stable, after all!
Never trust cheap hardware :-)


Thanks for all the help,
  signing off...

  Sven

--
  Sven Anders                        () Ascii Ribbon Campaign
                                     /\ Support plain text e-mail
  ANDURAS service solutions AG
  Innstraße 71 - 94036 Passau - Germany
  Web: www.anduras.de - Tel: +49 (0)851-4 90 50-0 - Fax: +49 (0)851-4 90 50-55

Legal form: Aktiengesellschaft (stock corporation) - Registered office: Passau - Passau District Court, HRB 6032
Members of the Executive Board: Sven Anders, Marcus Junker, Michael Schön
Chairman of the Supervisory Board: Dipl. Kfm. Thomas Träger
