Understanding bonnie++ results

on 28.02.2008 10:46:29 by Franck Routier

Hi,

I am experimenting with Adaptec 31205 hardware raid versus md raid on
raid level 10 with 3 arrays of 4 disks each.
md array was created with f2 option.
I get some results with bonnie++ tests I would like to understand:

- per char sequential output is consistently around 70k/sec for both
setups
- but block sequential output shows a huge difference between hw and sw
raid: about 160k/sec for hw versus 60k/sec for md. Where can this come
from ??

On the contrary, md beat hw on inputs:
- sequential input shows 360k/sec versus 220k/sec for hw
- random seek 1350/sec for md versus 1150/sec for hw

So, these bonnie++ tests show quite huge differences for the same
hardware between adaptec's hardware setup and md driver.

Does anyone have any explanation for this? (btw, the fs on top of this is
xfs).

Franck



Re: Understanding bonnie++ results

on 28.02.2008 20:06:09 by keld

On Thu, Feb 28, 2008 at 10:46:29AM +0100, Franck Routier wrote:
> Hi,
>
> I am experimenting with Adaptec 31205 hardware raid versus md raid on
> raid level 10 with 3 arrays of 4 disks each.
> md array was created with f2 option.

what are the characteristics of your disks? Are they all the same size
and same speed etc?

What kind of raid are you creating with the Adaptec HW? I assume you
make a RAID1 with this.

What is the chunk size?

Are your figures for one of the arrays, that is for an array of 4
drives?

> I get some results with bonnie++ tests I would like to understand:
>
> - per char sequential output is consistently around 70k/sec for both
> setups

I think the common opinion on this list is to ignore this figure.
However, if you are using this for postgresql databases, this may be relevant.

> - but block sequential output shows a huge difference between hw and sw
> raid: about 160k/sec for hw versus 60k/sec for md. Where can this come
> from ??

Strange. Maybe see if the md array has been fully synced before testing.
For sequential writes on a 4 drive raid10,f2 with disks of 90 MB/s
I would expect a writing rate of about 160 MB/s - which is the same as
your HW rate. (I assume you mean MB/s instead of k/sec)
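
Something like this would show whether the resync had finished before the run
(assuming the array is /dev/md0):

$ cat /proc/mdstat                          # any "resync = x%" line means it is still syncing
$ mdadm --detail /dev/md0 | grep -i state   # should report a clean/active state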

> On the contrary, md beat hw on inputs:
> - sequential input shows 360k/sec versus 220k/sec for hw

raid10,f2 stripes, while normal raid1 does not. Also raid10,f2
tends to only use the outer and faster sectors of disks.

> - random seek 1350/sec for md versus 1150/sec for hw

Random seeks in raid10,f2 tend to be restricted to a smaller range of
sectors, thus making average seek times smaller.

> So, these bonnie++ tests show quite huge differences for the same
> hardware between adaptec's hardware setup and md driver.

I like to get such results of comparison between HW and SW raid.
How advanced are Adaptec controllers considered these days?
My thoughts are that SW raid is faster than HW raid, because Neil and the
other people here together can develop more sophisticated algorithms,
but I would like some hard figures to back up that thought.


Re: Understanding bonnie++ results

on 29.02.2008 09:19:52 by Franck Routier

Hi,

Sorry for being vague in my first post.

Ok, here are the facts:

1) Hardware

-server is a dual AMD dual core opteron system, running Linux 2.6.22 (a=
s
ship by Ubuntu 7.10 server)
-disk controller is an Adaptec 31205 SAS AAC-RAID controller, with 256M=
B
of cache
-disks are 72GB FUJITSU MAX3073RC SAS 3"5 disks at 15krpm with 16MB
buffer size (plus two Maxtor Sata disks for systems controlled by the
mother board)
-system RAM is 8 GB

2) Usage

this will be a PostgreSQL database server, holding a mix of
Datawarehouse / Operational Data Store applications

3) The tests

I am in no way an expert in system administration or benchmarking... so
I simply launched bonnie++ with no parameters other than

$bonnie++ -d /a_dir_on_my_array

letting bonnie decide on the best file size and RAM size to use. Bonnie's
computed size was 16GB.
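
If it helps reproduce this, a more explicit and repeatable version of that run
would look roughly like the following (the size and label are just what bonnie++
picked for me, and -u is needed when running as root):

$ bonnie++ -d /a_dir_on_my_array -s 16384 -r 8192 -m goules -x 3 -u nobody

-x 3 repeats the run three times and prints one csv line per run, which should
make it easier to spot variance between runs.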

File system is XFS with noatime,nodiratime options.

Each test was launched on hardware raid and then on software raid:
- RAID 10 on 6 disks
- RAID 10 on 4 disks
Linux software raid 10 was created using mdadm with the default (near)
layout and default chunk size, then with the far 2 option and a 256k chunk
size.
For the 4-disk arrays, I also tried to launch 2 bonnie++ tests in parallel,
on two different arrays, to see the impact.
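
For completeness, the far-layout arrays were created with something along
these lines (device names here are just an example):

$ mdadm --create /dev/md2 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde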

Here are the results, in bonnie++ csv format:

6 disks:
-hw raid
goules-hw-raid10-6,16G,72343,98,235192,37,107093,19,64163,89,286958,26,1323.1,2,16,23028,95,+++++,+++,20643,70,19770,95,+++++,+++,17396,81
-md raid (chunk 64k)
goules-md-raid10-6,16G,72413,99,196164,42,42249,8,51613,72,52311,5,1486.7,3,16,13437,61,+++++,+++,9736,42,12128,59,+++++,+++,8526,44

4 disks:
-hw raid:
goules-hw-raid10-4,16G,72462,99,162303,25,87544,16,64049,89,211526,19,1179.7,2,16,20894,96,+++++,+++,19563,64,20160,98,+++++,+++,18794,78
-md raid
goules-md-raid10-4,16G,70206,99,162525,35,30169,5,33898,47,34888,3,1347.3,2,16,17837,81,+++++,+++,14735,61,15211,66,+++++,+++,7810,31
-md raid with f2 option and 256k chunk size
goules-md-raid10-4-f2-256-xfs,16G,69928,97,93985,20,56930,11,68669,98,356923,37,1327.1,2,16,20001,87,+++++,+++,20392,73,19773,88,+++++,+++,5228,23

4 disks with 2 bonnie++ running simultaneously:
-hw raid:
goules-hw-raid10-4-P1,16G,70682,96,145883,28,54263,10,60888,86,205427,20,837.4,1,16,20742,97,+++++,+++,20969,76,19801,100,+++++,+++,18789,79
goules-hw-raid10-4-P2,16G,72405,99,138678,26,56571,11,60876,84,205619,21,679.8,2,16,20067,93,+++++,+++,14698,53,17090,87,+++++,+++,9041,42
-md raid with near option and 64k chunk:
goules-md-raid10-4-P1,16G,72183,98,100149,24,28057,5,33398,44,34624,3,771.8,1,16,16057,71,+++++,+++,9576,32,15871,77,+++++,+++,7357,33
goules-md-raid10-4-P2,16G,72467,99,99952,24,28424,5,33361,44,34681,3,883.2,2,16,13032,67,+++++,+++,10759,46,13157,56,+++++,+++,7424,36

4) The interpretation

Here is the difficult part! I also realize that my tests are not so
consistent (chunk size varies for md raid). But here is what I see:
-sequential output is quite similar for each setup, with hw raid being a
bit better
-sequential input varies greatly, the big winner being the md-f2-256 setup
with 356923K/sec, and the big loser the md-near-64 setup with 34888K/sec
(a factor of 10!)
- what seems most relevant to me: random seeks are always better on
software raid, by 10 to 20%, but I have no idea why.
- and running two bonnie++ tests in parallel on two 4-disk arrays gives
better iops than a single 6-disk array.

So I tend to think I'd better use md-f2-256 with 3 arrays of 4 disks and
use tablespaces to make sure my requests are spread out over the 3 arrays.
But this conclusion may suffer from many, many flaws, the first one being
my understanding of raid, fs and io :)

So, any comment?

Thanks,
Franck


On Thursday, 28 February 2008 at 20:06 +0100, Keld Jørn Simonsen wrote:
> On Thu, Feb 28, 2008 at 10:46:29AM +0100, Franck Routier wrote:
> > Hi,
> >
> > I am experimenting with Adaptec 31205 hardware raid versus md raid on
> > raid level 10 with 3 arrays of 4 disks each.
> > md array was created with f2 option.
>
> what are the characteristics of your disks? Are they all the same size
> and same speed etc?
>
> What kind of raid are you creating with the Adaptec HW? I assume you
> make a RAID1 with this.
>
> What is the chunk size?
>
> Are your figures for one of the arrays, that is for an array of 4
> drives?
>
> > I get some results with bonnie++ tests I would like to understand:
> >
> > - per char sequential output is consistently around 70k/sec for both
> > setups
>
> I think the common opinion on this list is to ignore this figure.
> However, if you are using this for postgresql databases, this may be relevant.
>
> > - but block sequential output shows a huge difference between hw and sw
> > raid: about 160k/sec for hw versus 60k/sec for md. Where can this come
> > from ??
>
> Strange. Maybe see if the md array has been fully synced before testing.
> For sequential writes on a 4 drive raid10,f2 with disks of 90 MB/s
> I would expect a writing rate of about 160 MB/s - which is the same as
> your HW rate. (I assume you mean MB/s instead of k/sec)
>
> > On the contrary, md beat hw on inputs:
> > - sequential input shows 360k/sec versus 220k/sec for hw
>
> raid10,f2 stripes, while normal raid1 does not. Also raid10,f2
> tends to only use the outer and faster sectors of disks.
>
> > - random seek 1350/sec for md versus 1150/sec for hw
>
> Random seeks in raid10,f2 tend to be restricted to a smaller range of
> sectors, thus making average seek times smaller.
>
> > So, these bonnie++ tests show quite huge differences for the same
> > hardware between adaptec's hardware setup and md driver.
>
> I like to get such results of comparison between HW and SW raid.
> How advanced are Adaptec controllers considered these days?
> My thoughts are that SW raid is faster than HW raid, because Neil and the
> other people here together can develop more sophisticated algorithms,
> but I would like some hard figures to back up that thought.



Re: Understanding bonnie++ results

on 29.02.2008 09:24:42 by Franck Routier

By the way, my server is not really in a production state, and the
hardware is running, so I might take some time to do sensible tests if
anyone has ideas of what a good test would be...


> I like to get such results of comparison between HW and SW raid.
> How advanced are Adaptec controllers considered these days?
> My thoughts are that SW raid is faster than HW raid, because Neil and the
> other people here together can develop more sophisticated algorithms,
> but I would like some hard figures to back up that thought.



Re: Understanding bonnie++ results

on 29.02.2008 10:26:09 by Franck Routier

Hi,

>
> OK. How fast are the Fujitsu disks, measured by a simple hdparm -t?

Timing buffered disk reads: 270 MB in 3.02 seconds = 89.52 MB/sec
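
Per-disk figures can be collected the same way if that is useful (device names
hypothetical):

$ for d in /dev/sd[b-m]; do hdparm -t $d; done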

>
> OK, so the problem you reported earlier, that HW raid was faster than
> raid10,f2 for writing, is gone?
>
Well actually not for sequential output. Here are the most comparable
figures:

4-disk hw raid 10, 256k chunks versus 4-disk md raid 10, 256k chunks
per char sequential output = 69475 vs 69928 <------ this is similar
block sequential output = 159649 vs 93985 <------ hw is much faster
rewrite sequential output = 85914 vs 56930 <------ hw is much faster

But on reading, md is faster:
per char sequential input = 61622 vs 68669 <------ still comparable
block sequential input = 221771 vs 356923 <------ md is way faster
random seek = 1327.1 vs 1149.7 <------ that's a 15.4% difference

> And you do a HW RAID10? Are you able to specify chunk size here?

Yes, RAID10 is an option in Adaptec's bios, and chunk size can be set up
to 512k

> > -sequential input varies greatly, the big winner being md-f2-256 setup
> > with 356923K/sec, and the big loser md-near-64 setup with 34888K/sec
> > (factor of 10 !)
>
> Both the chunk size, and the observation that raid10,n2 only reads from one
> disk at a time, gives reasons to this. I already explained why raid10,f2
> would be faster than HW RAID10.
>
True, and quite impressive.

> > - what seems the most relevant to me, random seeks are always better on
> > software raid, by 10 to 20%, but I have no idea why.
>
> raid10,f2 would only seek on half the disk, so that would diminish the
> seek times.
>
Great. But in fact md raid 10 near layout (with 64k chunks, that might
matter) gave me slightly better results than f2 (1347.3 for near versus
1327.1 for far)

> > - and running two bonnie++ in parallel on two 4 disks arrays gives
> > better iops than 6 disks arrays.
>
> I would run a combined 12 disk array raid10,f2 with adequate chunk size,
> I think that would get the best performance for you.
>
I will try that.
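
If I read the suggestion correctly, that would be something like this (device
names hypothetical, chunk size still to be tuned):

$ mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=12 /dev/sd[b-m]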

> > So I tend to think I'd better use md-f2-256 with 3 arrays of 4 disks and
> > use tablespaces to make sure my requests are spread out on the 3 arrays.
> > But this conclusion may suffer from many many flaws, the first one being
> > my understanding of raid, fs and io :)
> >
> > So, any comment ?
>
> I would try to test it out, but I don't know if you can get a good
> benchmark for database enquiries.

That's the real problem for sure. I can throw in some huge queries, but
kernel resources and postgresql.conf clearly will change things much
more than raw disk io. That's why I thought of running several bonnie++
tests in parallel, and adding the random seek results to simulate database
reading...
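
Concretely I was thinking of something like this (paths hypothetical), one
bonnie++ per array, started together:

$ bonnie++ -d /array1/test -s 16384 -u nobody -m goules-P1 > p1.out &
$ bonnie++ -d /array2/test -s 16384 -u nobody -m goules-P2 > p2.out &
$ wait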

Thanks
Franck




Re: Understanding bonnie++ results

on 29.02.2008 10:32:59 by Franck Routier

Sorry, in my last mail I gave the figures in the wrong order for random
seek: md is 15% _faster_ than hw, not the other way around.



Re: Understanding bonnie++ results

on 01.03.2008 03:05:12 by keld

On Fri, Feb 29, 2008 at 09:24:42AM +0100, Franck Routier wrote:
> By the way, my server is not really in a production state, and the
> hardware is running, so I might take some time to do sensible tests if
> anyone has ideas of what a good test would be...

I would like you to investigate why random writes are so relatively
slow with raid10,f2. You could run the bonnie++ tests, and then watch
via iostat how each of the disks is performing, compared to a
HW RAID10.

And also see if it matters if the resync has completed or not.
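
A rough way to do that (mount point and array name are just examples): start
the benchmark in one shell and watch the member disks from another, e.g.

$ bonnie++ -d /array/test -u nobody &
$ iostat -x 5        # compare %util and await across the member disks
$ cat /proc/mdstat   # and confirm beforehand that no resync is running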

Best regards
keld

>
> > I like to get such results of comparison between HW and SW raid.
> > How advanced are Adaptec controllers considered these days?
> > My thoughts are that SW raid is faster than HW raid, because Neil and the
> > other people here together can develop more sophisticated algorithms,
> > but I would like some hard figures to back up that thought.

Re: Understanding bonnie++ results

on 01.03.2008 21:04:19 by Bill Davidsen

Franck Routier wrote:
> Hi,
>
> I am experimenting with Adaptec 31205 hardware raid versus md raid on
> raid level 10 with 3 arrays of 4 disks each.
> md array was created with f2 option.
> I get some results with bonnie++ tests I would like to understand:
>
> - per char sequential output is consistently around 70k/sec for both
> setups
> - but block sequential output shows a huge difference between hw and sw
> raid: about 160k/sec for hw versus 60k/sec for md. Where can this come
> from ??
>
>
Do you have a base raw read speed for a single drive? That helps
visualize things as percentages of single drive speed. Also, is the
Adaptec hw raid10 really raid1+0 or the distributed raid10 done by Linux
sw raid?
> On the contrary, md beat hw on inputs:
> - sequential input shows 360k/sec versus 220k/sec for hw
> - random seek 1350/sec for md versus 1150/sec for hw
>
> So, these bonnie++ tests show quite huge differences for the same
> hardware between adaptec's hardware setup and md driver.
>
> Does anyone have any explanation for this? (btw, the fs on top of this is
> xfs).
>
>
I'm sure that's a factor, but I usually run my tests on the raw array
first to see how the array performs, then test on various filesystem
types to see how they use the array. Alignment of the f/s on the array
becomes important, as well as f/s-specific tuning parameters. Someone
adept at xfs optimal layout will have to help you there, I can't.
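
For what it's worth, the knobs in question are the stripe unit and stripe
width passed to mkfs.xfs; picking the right values for your exact layout is
the part I'd leave to the xfs folks. A sketch with purely illustrative numbers:

$ mkfs.xfs -d su=256k,sw=2 /dev/md0      # su = chunk size, sw = number of data disks
$ mount -o noatime,nodiratime /dev/md0 /a_dir_on_my_array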

--
Bill Davidsen
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck



Re: Understanding bonnie++ results

on 01.03.2008 21:25:45 by Franck Routier

On Saturday, 1 March 2008 at 15:04 -0500, Bill Davidsen wrote:
> Do you have a base raw read speed for a single drive? That helps

Disk speed measured by hdparm -t shows:
Timing buffered disk reads: 270 MB in 3.02 seconds = 89.52 MB/sec

> Adaptec hw raid10 really raid1+0 or the distributed raid10 done by Linux
> sw raid?

I don't know, but RAID10 is an option on its own in the bios (I don't
have to build raid 1 then raid 0)

Franck

