Software RAID and TRIM

Software RAID and TRIM

On 28.06.2011 17:31:35 by Tom De Mulder


Hi,


I'm investigating SSD performance on Linux, in particular for RAID
devices.

As I understand it - and please correct me if I'm wrong - currently software
RAID does not pass through TRIM to the underlying devices. TRIM is
essential for the continued high performance of SSDs, which otherwise
degrade over time.

I don't think there would be any harm in this command being passed through
to underlying devices if they don't support it (they would just ignore
it), and if they do it would make high-performance software RAID of SSDs a
possibility.


Is this something that's in the works?



Many thanks,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 28/06/2011 : The Moon is Waning Crescent (22% of Full)

Re: Software RAID and TRIM

On 28.06.2011 18:11:04 by mathias.buren

On 28 June 2011 16:31, Tom De Mulder wrote:
> Hi,
>
>
> I'm investigating SSD performance on Linux, in particular for RAID devices.
>
> As I understand it - and please correct me if I'm wrong - currently software
> RAID does not pass through TRIM to the underlying devices. TRIM is
> essential for the continued high performance of SSDs, which otherwise
> degrade over time.
>
> I don't think there would be any harm in this command being passed through
> to underlying devices if they don't support it (they would just ignore
> it), and if they do it would make high-performance software RAID of SSDs a
> possibility.
>
>
> Is this something that's in the works?
>
>
>
> Many thanks,
>
> --
> Tom De Mulder - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 28/06/2011 : The Moon is Waning Crescent (22% of Full)


IIRC md can already pass TRIM down, but I think the filesystem needs
to know about the underlying architecture, or something, for TRIM to
work in RAID. There are numerous discussions on this in the archives of
this mailing list.

/M

Re: Software RAID and TRIM

On 28.06.2011 18:17:35 by Johannes Truschnigg


Hi Tom,
On Tue, Jun 28, 2011 at 04:31:35PM +0100, Tom De Mulder wrote:
> Hi,
> [...]
> Is this something that's in the works?

IIRC, dm-raid supports pass-through of DSM/TRIM commands for its RAID0
and RAID1 levels. Maybe that's already enough for your purposes?

I don't know if there's any development going on on the md side of things
in that regard. Others on this list will surely be able to answer that
question, however.

Have a nice day!
--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@truschnigg.info

Please do not bother me with HTML-eMail or attachments. Thank you.


Re: Software RAID and TRIM

On 28.06.2011 18:40:45 by David Brown

On 28/06/11 17:31, Tom De Mulder wrote:
> Hi,
>
>
> I'm investigating SSD performance on Linux, in particular for RAID devices.
>
> As I understand it - and please correct me if I'm wrong - currently software
> RAID does not pass through TRIM to the underlying devices. TRIM is
> essential for the continued high performance of SSDs, which otherwise
> degrade over time.
>
> I don't think there would be any harm in this command being passed
> through to underlying devices if they don't support it (they would just
> ignore it), and if they do it would make high-performance software RAID
> of SSDs a possibility.
>
>
> Is this something that's in the works?
>
>

I don't think you are wrong about software raid not passing TRIM down to
the device (IIRC, it /can/ be passed down through LVM raid setups, but
they are slower and less flexible than md raid).

However, AFAIUI, you are wrong about TRIM being essential for the
continued high performance of SSDs. As long as your SSDs have some
over-provisioning (or you only partition something like 90% of the
drive), and it's got good garbage collection, then TRIM will have
minimal effect.

TRIM only makes a big difference in benchmarks which fill up most of a
disk, then erase the files, then start writing them again, and even then
it is mainly with older flash controllers.

I think other SSD-optimisations, such as those in BTRFS, are much more
important. These include bypassing or disabling code that is aimed at
optimising disk access and minimising head movement - such code is of
great benefit with hard disks, but helps little and adds latency on SSD
systems.

(I haven't done any benchmarks to justify this opinion, nor do I have
direct links - it's based on my understanding of TRIM and how SSDs work,
and how SSD controllers have changed between early devices and current
ones.)


Re: Software RAID and TRIM

On 29.06.2011 12:32:55 by Tom De Mulder


On Tue, 28 Jun 2011, Mathias Burén wrote:

> IIRC md can already pass TRIM down, but I think the filesystem needs
> to know about the underlying architecture, or something, for TRIM to
> work in RAID.

Yes, it's (usually/ideally) the filesystem's job to invoke the TRIM
command, and that's what ext4 can do. I have it working just fine on
single drives, but for reasons of service reliability I would need to get
RAID to work.

I tried (on an admittedly vanilla Ubuntu 2.6.38 kernel) the same on a
two-drive RAID1 md and it definitely didn't work (the blocks didn't get
marked as unused and zeroed).

> There are numerous discussions on this in the archives of
> this mailing list.

Given how fast things move in the world of SSDs at the moment, I wanted
to check if any progress had been made since. :-) I don't seem to be able
to find any reference to this in recent kernel source commits (but I'm a
complete amateur when it comes to git).


Thanks,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)
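For reference, the kind of batch discard Tom is testing can also be driven
from user space with the FITRIM ioctl (added around kernel 2.6.37, and what
the fstrim utility wraps). A minimal sketch in C, assuming a mounted,
TRIM-capable ext4 filesystem at a hypothetical mount point /mnt/ssd:

/* Ask the filesystem to discard all of its free space (what fstrim does).
 * The mount point is a placeholder; requires a TRIM-capable device and a
 * kernel with FITRIM support (>= 2.6.37).
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FITRIM, struct fstrim_range */

int main(void)
{
    struct fstrim_range range = {
        .start  = 0,
        .len    = UINT64_MAX,   /* cover the whole filesystem */
        .minlen = 0,
    };
    int fd = open("/mnt/ssd", O_RDONLY);   /* hypothetical mount point */

    if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
        perror("FITRIM");
        return 1;
    }
    printf("discarded %llu bytes\n", (unsigned long long)range.len);
    close(fd);
    return 0;
}

On return the kernel updates range.len with the number of bytes it actually
discarded. Note this only helps where the filesystem sees the device
directly; it does not solve the md pass-through problem discussed below.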

Re: Software RAID and TRIM

On 29.06.2011 12:33:23 by Tom De Mulder

On 28/06/11, David Brown wrote:

> However, AFAIUI, you are wrong about TRIM being essential for the
> continued high performance of SSDs. As long as your SSDs have some
> over-provisioning (or you only partition something like 90% of the
> drive), and it's got good garbage collection, then TRIM will have
> minimal effect.

While you are mostly correct, over time even consumer SSDs will end up in
this state.

Maybe I should have specified--my particular aim is to try and use (fairly
high-end) consumer SSDs for "enterprise" server applications, hence the
research into RAID. Most hardware RAID controllers that I know of don't
pass on the TRIM command (for various reasons), so I was hoping to have
more luck with software RAID.


Best,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)

Re: Software RAID and TRIM

On 29.06.2011 12:45:19 by NeilBrown

On Wed, 29 Jun 2011 11:32:55 +0100 (BST) Tom De Mulder wrote:

> On Tue, 28 Jun 2011, Mathias Burén wrote:
>
> > IIRC md can already pass TRIM down, but I think the filesystem needs
> > to know about the underlying architecture, or something, for TRIM to
> > work in RAID.
>
> Yes, it's (usually/ideally) the filesystem's job to invoke the TRIM
> command, and that's what ext4 can do. I have it working just fine on
> single drives, but for reasons of service reliability I would need to get
> RAID to work.
>
> I tried (on an admittedly vanilla Ubuntu 2.6.38 kernel) the same on a
> two-drive RAID1 md and it definitely didn't work (the blocks didn't get
> marked as unused and zeroed).
>
> > There are numerous discussions on this in the archives of
> > this mailing list.
>
> Given how fast things move in the world of SSDs at the moment, I wanted
> to check if any progress had been made since. :-) I don't seem to be able
> to find any reference to this in recent kernel source commits (but I'm a
> complete amateur when it comes to git).


Trim support for md is a long way down my list of interesting projects
(and no-one else has volunteered).

It is not at all straightforward to implement.

For stripe/parity RAID (RAID4/5/6) it is only safe to discard full
stripes at a time, and the md layer would need to keep a record of which
stripes had been discarded so that it didn't risk trusting data (and
parity) read from those stripes. So you would need some sort of bitmap
of invalid stripes, and you would need the fs to discard in very large
chunks for it to be useful at all.

For copying RAID (RAID1, RAID10) you really need the same bitmap. There
isn't the same risk of reading and trusting discarded parity, but a
resync which didn't know about discarded ranges would undo the discard
for you.

So it basically requires another bitmap to be stored with the metadata,
and a fairly fine-grained bitmap it would need to be. Then every read
and resync checks the bitmap and ignores or returns 0 for discarded
ranges, and every write needs to check and, if the range was discarded,
clear the bit and write to the whole range.

So: do-able, but definitely non-trivial.

NeilBrown
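To make the bookkeeping Neil describes concrete, here is a toy sketch of an
invalid-stripe bitmap in C - invented names, not md code, with persistence
and locking ignored:

/* Toy model of the invalid-stripe bitmap described above: one bit per
 * stripe, persisted with the array metadata in a real implementation.
 * All names here are invented for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSTRIPES 4096

static uint8_t invalid[NSTRIPES / 8];

static bool stripe_invalid(unsigned s) { return invalid[s / 8] & (1u << (s % 8)); }
static void set_invalid(unsigned s)    { invalid[s / 8] |=  1u << (s % 8); }
static void clear_invalid(unsigned s)  { invalid[s / 8] &= ~(1u << (s % 8)); }

/* Read or resync path: never trust data (or parity) from a discarded stripe. */
void stripe_read(unsigned s, void *buf, size_t len)
{
    if (stripe_invalid(s)) {
        memset(buf, 0, len);    /* return zeroes, skip the member disks */
        return;
    }
    /* ... normal read from the member devices ... */
}

/* Write path: a write into a discarded stripe must clear the bit and
 * write the whole stripe, so that data and parity become consistent. */
void stripe_write(unsigned s, const void *buf, size_t len)
{
    if (stripe_invalid(s)) {
        clear_invalid(s);
        /* ... write the full stripe (data + parity), not just this range ... */
    }
    /* ... normal stripe write ... */
    (void)buf; (void)len;
}

/* Discard path: only whole stripes may be dropped. */
void stripe_discard(unsigned s)
{
    set_invalid(s);
    /* ... pass TRIM for this stripe's chunks down to each member ... */
}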

Re: Software RAID and TRIM

On 29.06.2011 13:10:29 by Tom De Mulder

On Wed, 29 Jun 2011, NeilBrown wrote:

> It is not at all straightforward to implement.
>
> For stripe/parity RAID (RAID4/5/6) it is only safe to discard full stripes at
> a time, and the md layer would need to keep a record of which stripes had been
> discarded so that it didn't risk trusting data (and parity) read from those
> stripes. So you would need some sort of bitmap of invalid stripes, and you
> would need the fs to discard in very large chunks for it to be useful at all.
>
> For copying RAID (RAID1, RAID10) you really need the same bitmap. There
> isn't the same risk of reading and trusting discarded parity, but a resync
> which didn't know about discarded ranges would undo the discard for you.

However, that might not necessarily be a problem; tools exist that can be
run manually (slightly fsck-like) and tell the drive which blocks can be
erased.

> So: do-able, but definitely non-trivial.

Thanks very much for your response; you make some very good points.

I shall, for the time being, chop my SSDs in half and let them treat the
empty half as spare area, which should make performance degradation a
non-issue. I hope.


Cheers,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)

Re: Software RAID and TRIM

On 29.06.2011 13:48:52 by launchpad

On Wed, Jun 29, 2011 at 7:10 AM, Tom De Mulder wrote:
> However, that might not necessarily be a problem; tools exist that can be run manually (slightly fsck-like) and tell the drive which blocks can be erased.

For RAID5/6 at least, md will still require knowledge of what stripes
are and are not in use by the filesystem. In the current
implementation, the entire array must be consistent, regardless of
whether or not a particular block is in use. As far as my
understanding goes, any level of TRIM support for parity arrays would
be a fundamental shift in the way md treats the array.

The simplest solution I see is to do as Neil suggested, and mimic TRIM
support at the RAID level, and pass commands down as necessary. An
alternative solution would be to add a second TRIM layer, where md
maintains a list of what is or is not in use, and once an entire
stripe has been discarded by the filesystem, it can send a single TRIM
command to each member drive to drop the entire stripe contents. This
adds abstraction for the filesystem layer, allowing it to treat the
RAID array like a regular SSD, but adds significant complexity to md
itself.

-Scott

p.s. Sorry if you receive this twice; Majordomo rejected the first one
because of an HTML subpart.

--
Scott Armitage, B.A.Sc., M.A.Sc. candidate
Space Flight Laboratory
University of Toronto Institute for Aerospace Studies
4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
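Scott's second option - md counting discarded chunks until a whole stripe
can be dropped - might look roughly like this. An invented illustration,
not a proposal for md's actual data structures:

/* Sketch of a "second TRIM layer": md tracks how many data chunks of each
 * stripe the filesystem has discarded, and only when the whole stripe is
 * unused does it send one TRIM per member device. Invented names; real
 * code would need this state to be persistent, and a write landing in a
 * partially discarded stripe would have to reset the count.
 */
#include <stdint.h>

#define DATA_CHUNKS_PER_STRIPE 4    /* e.g. 6-disk RAID6: 4 data + 2 parity */

struct stripe_state {
    uint8_t discarded_chunks;       /* data chunks the fs has dropped */
};

/* Called when the filesystem discards one chunk-sized range. */
void md_chunk_discarded(struct stripe_state *st, unsigned stripe)
{
    if (++st[stripe].discarded_chunks == DATA_CHUNKS_PER_STRIPE) {
        st[stripe].discarded_chunks = 0;
        /* Whole stripe unused: TRIM this stripe's chunk on every member,
         * parity chunks included, e.g.:
         * trim_stripe_on_members(stripe);   (hypothetical helper)
         */
    }
}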

Re: Software RAID and TRIM

On 29.06.2011 14:42:12 by David Brown

On 29/06/2011 12:33, Tom De Mulder wrote:
> On 28/06/11, David Brown wrote:
>
>> However, AFAIUI, you are wrong about TRIM being essential for the
>> continued high performance of SSDs. As long as your SSDs have some
>> over-provisioning (or you only partition something like 90% of the
>> drive), and it's got good garbage collection, then TRIM will have
>> minimal effect.
>
> While you are mostly correct, over time even consumer SSDs will end up
> in this state.
>

I don't quite follow you here - what state will consumer SSDs end up in?

> Maybe I should have specified--my particular aim is to try and use
> (fairly high-end) consumer SSDs for "enterprise" server applications,
> hence the research into RAID. Most hardware RAID controllers that I know
> of don't pass on the TRIM command (for various reasons), so I was hoping
> to have more luck with software RAID.
>
>

Now you know /why/ hardware RAID controllers don't implement TRIM!


Have you tried any real-world benchmarking with realistic loads with a
single SSD, ext4, and TRIM on and off? Almost every article I've seen
on the subject uses very synthetic benchmarks, almost always on
Windows, and few are done with current garbage-collecting SSDs. It seems to
be accepted wisdom from the early days of SSDs that TRIM makes a big
difference - and few people challenge that with real numbers or real
thought, even though the internal structure of the flash has changed
dramatically (transparent compression, for example, gives a completely
different effect).

Of course, if you /do/ try it yourself and can show clear figures, then
I'm willing to change my mind :-) If I had a spare SSD, I'd do the
testing myself.




Re: Software RAID and TRIM

On 29.06.2011 14:46:08 by David Brown

On 29/06/2011 12:45, NeilBrown wrote:
> On Wed, 29 Jun 2011 11:32:55 +0100 (BST) Tom De Mulder wrote:
>
>> [...]
>
> Trim support for md is a long way down my list of interesting projects
> (and no-one else has volunteered).
>
> It is not at all straightforward to implement.
>
> For stripe/parity RAID (RAID4/5/6) it is only safe to discard full
> stripes at a time, and the md layer would need to keep a record of which
> stripes had been discarded so that it didn't risk trusting data (and
> parity) read from those stripes. So you would need some sort of bitmap
> of invalid stripes, and you would need the fs to discard in very large
> chunks for it to be useful at all.
>
> For copying RAID (RAID1, RAID10) you really need the same bitmap. There
> isn't the same risk of reading and trusting discarded parity, but a
> resync which didn't know about discarded ranges would undo the discard
> for you.
>
> So it basically requires another bitmap to be stored with the metadata,
> and a fairly fine-grained bitmap it would need to be. Then every read
> and resync checks the bitmap and ignores or returns 0 for discarded
> ranges, and every write needs to check and, if the range was discarded,
> clear the bit and write to the whole range.
>
> So: do-able, but definitely non-trivial.
>
>

Wouldn't the sync/no-sync tracking you already have planned be usable
for tracking discarded areas? Or will that not be fine-grained enough
for the purpose?


Re: Software RAID and TRIM

On 29.06.2011 14:46:15 by Roberto Spadim

Some ideas...

Maybe for a test only: we could send trim commands on raid1 arrays only,
or 'raid0 linear', since they don't stripe; this could be 'easy' to
develop. When the filesystem sends trim, we send it to the underlying
device (/dev/sdX99). There's a problem of offset (for raid1); also, maybe
some devices only accept trim commands on 4096-byte blocks, maybe not.
We could implement it and put it in a beta/alpha release to test, like
the ext4 guys are doing with the discard command (it's a user option
today).


2011/6/29 Scott E. Armitage:
> On Wed, Jun 29, 2011 at 7:10 AM, Tom De Mulder wrote:
>> However, that might not necessarily be a problem; tools exist that can
>> be run manually (slightly fsck-like) and tell the drive which blocks
>> can be erased.
>
> For RAID5/6 at least, md will still require knowledge of what stripes
> are and are not in use by the filesystem. In the current
> implementation, the entire array must be consistent, regardless of
> whether or not a particular block is in use. As far as my
> understanding goes, any level of TRIM support for parity arrays would
> be a fundamental shift in the way md treats the array.
>
> The simplest solution I see is to do as Neil suggested, and mimic TRIM
> support at the RAID level, and pass commands down as necessary. An
> alternative solution would be to add a second TRIM layer, where md
> maintains a list of what is or is not in use, and once an entire
> stripe has been discarded by the filesystem, it can send a single TRIM
> command to each member drive to drop the entire stripe contents. This
> adds abstraction for the filesystem layer, allowing it to treat the
> RAID array like a regular SSD, but adds significant complexity to md
> itself.
>
> -Scott

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Software RAID and TRIM

On 29.06.2011 14:55:09 by Tom De Mulder


On Wed, 29 Jun 2011, David Brown wrote:

>> While you are mostly correct, over time even consumer SSDs will end up
>> in this state.
> I don't quite follow you here - what state will consumer SSDs end up in?

Sorry, I meant to say "SSDs in typical consumer desktop machines". The
state where writes are very slow.

> Have you tried any real-world benchmarking with realistic loads with a
> single SSD, ext4, and TRIM on and off? Almost every article I've seen on
> the subject uses very synthetic benchmarks, almost always on Windows, and
> few are done with current garbage-collecting SSDs. It seems to be accepted
> wisdom from the early days of SSDs that TRIM makes a big difference - and
> few people challenge that with real numbers or real thought, even though
> the internal structure of the flash has changed dramatically (transparent
> compression, for example, gives a completely different effect).
>
> Of course, if you /do/ try it yourself and can show clear figures, then
> I'm willing to change my mind :-) If I had a spare SSD, I'd do the
> testing myself.

I have a set of 4 Intel 510 SSDs purely for testing, and I have used these
to simulate the kinds of workload I would expect them to experience in a
server environment (focused mainly on database access). So far, those
tests have focused on using single drives (ie. without RAID) on a variety
of controllers.

Once the drives get fuller (something which does happen on servers) I do
indeed see write latencies that are in the order of several seconds (I saw
from 1500µs to 6000µs), as the drive suddenly struggles to free entire
blocks, where initially latency was in the single digits.

I am hoping to get my hands on some Sandforce controller-based SSDs as
well, to compare, but even they show degradation as they get fuller in
AnandTech's tests (and those tests seem, IME, trustworthy).

My current plan is to sacrifice half the capacity by partitioning, stick 2
of them in md RAID1 (so, without TRIM) and over the next few days to run
benchmarks over them, to see what the end result is.


Best,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)

Re: Software RAID and TRIM

On 29.06.2011 15:02:49 by Roberto Spadim

Nice. Does anyone know if FreeBSD or NetBSD or another OS has this
(RAID TRIM), so we could do some benchmarks without spending time
developing?

2011/6/29 Tom De Mulder:
> On Wed, 29 Jun 2011, David Brown wrote:
>
> [...]
>
> My current plan is to sacrifice half the capacity by partitioning, stick
> 2 of them in md RAID1 (so, without TRIM) and over the next few days to
> run benchmarks over them, to see what the end result is.
>
>
> Best,
>
> --
> Tom De Mulder - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 29/06/2011 : The Moon is Waning Crescent (18% of Full)



--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Software RAID and TRIM

On 29.06.2011 15:10:12 by David Brown

On 29/06/2011 14:55, Tom De Mulder wrote:
> On Wed, 29 Jun 2011, David Brown wrote:
>
>>> While you are mostly correct, over time even consumer SSDs will end up
>>> in this state.
>> I don't quite follow you here - what state will consumer SSDs end up in?
>
> Sorry, I meant to say "SSDs in typical consumer desktop machines". The
> state where writes are very slow.
>

Well, many consumer level systems use older or cheaper SSDs which don't
have the benefit of newer garbage collection, and don't have much
over-provisioning (you can always do that yourself by leaving some space
unpartitioned - but "consumer" users would typically not do that). And
remember that users in this class, who will probably have small SSDs to
keep costs down, will have fairly full drives - making TRIM almost
useless.

>> [...]
>
> [...]
>
> My current plan is to sacrifice half the capacity by partitioning, stick
> 2 of them in md RAID1 (so, without TRIM) and over the next few days to
> run benchmarks over them, to see what the end result is.
>

Well, try it and see - and let us know the results. 50% manual
over-provisioning seems excessive, but I guess that's what you'll find
out with the tests.



Re: Software RAID and TRIM

On 29.06.2011 15:39:24 by Namhyung Kim

NeilBrown writes:

> On Wed, 29 Jun 2011 11:32:55 +0100 (BST) Tom De Mulder wrote:
>
>> [...]
>
> Trim support for md is a long way down my list of interesting projects
> (and no-one else has volunteered).
>

Just out of curiosity, what is on your list? :)


--
Regards,
Namhyung Kim

Re: Software RAID and TRIM

On 30.06.2011 02:27:08 by NeilBrown

On Wed, 29 Jun 2011 22:39:24 +0900 Namhyung Kim wrote:

> NeilBrown writes:
>
> > On Wed, 29 Jun 2011 11:32:55 +0100 (BST) Tom De Mulder wrote:
> >
> >> [...]
> >
> > Trim support for md is a long way down my list of interesting projects
> > (and no-one else has volunteered).
> >
>
> Just out of curiosity, what is on your list? :)
>

http://neil.brown.name/blog/20110216044002

I have code for the first - the bad block log - and it seems to work. But I
really need to design and then perform some more testing.

NeilBrown


Re: Software RAID and TRIM

On 30.06.2011 02:28:46 by NeilBrown

On Wed, 29 Jun 2011 14:46:08 +0200 David Brown wrote:

> On 29/06/2011 12:45, NeilBrown wrote:
> > [...]
> >
> > So: do-able, but definitely non-trivial.
> >
>
> Wouldn't the sync/no-sync tracking you already have planned be usable
> for tracking discarded areas? Or will that not be fine-grained enough
> for the purpose?

That would be a necessary precursor to DISCARD support: yes.
DISCARD would probably require a much finer grain than I would otherwise
suggest, but I would design the feature to allow a range of granularities.

NeilBrown

Re: Software RAID and TRIM

On 30.06.2011 07:51:01 by Mikael Abrahamsson


On Wed, 29 Jun 2011, Tom De Mulder wrote:

> I have a set of 4 Intel 510 SSDs purely for testing, and I have used
> these to simulate the kinds of workload I would expect them to
> experience in a server environment (focused mainly on database access).
> So far, those tests have focused on using single drives (ie. without
> RAID) on a variety of controllers.

From the tests I have read, the Intel 510 are actually worse than the
Intel X-25 G1/G2/320 models, with exactly the symptoms you're describing.
It's fast for linear reads and writes, but not so good for random writes,
especially not when it's getting full.

> Once the drives get fuller (something which does happen on servers) I do
> indeed see write latencies that are in the order of several seconds (I
> saw from 1500µs to 6000µs), as the drive suddenly struggles to free
> entire blocks, where initially latency was in the single digits.

Yeah, this is a common problem especially for older drives. A lot has
happened with garbage collect but the fact is still that a lot of SSD
vendors have too little spare area, so the recommendation you make
regarding leaving a large area unused is something I do as well, and it
works.

> I am hoping to get my hands on some Sandforce controller-based SSDs as well,
> to compare, but even they show degradation as they get fuller in AnandTech's
> tests (and those tests seem, IME, trustworthy).

Include the Intel 320 as well, I think it should be viable for your usage
pattern.

--
Mikael Abrahamsson email: swmike@swm.pp.se

Re: Software RAID and TRIM

On 30.06.2011 09:50:28 by David Brown

On 30/06/2011 02:28, NeilBrown wrote:
> On Wed, 29 Jun 2011 14:46:08 +0200 David Brown wrote:
>
>> [...]
>>
>> Wouldn't the sync/no-sync tracking you already have planned be usable
>> for tracking discarded areas? Or will that not be fine-grained enough
>> for the purpose?
>
> That would be a necessary precursor to DISCARD support: yes.
> DISCARD would probably require a much finer grain than I would otherwise
> suggest, but I would design the feature to allow a range of granularities.
>
>

I suppose the big win for the sync/no-sync tracking is when initialising
an array - arrays that haven't been written don't need to be in sync.
But you will probably be best with a list of sync (or no-sync) areas for
that job, rather than a bitmap, as there won't be very many such blocks
(a few dozen, perhaps, for multiple partitions and filesystems like XFS
that write in different areas) and as the disk gets used, the "no-sync"
areas will decrease in size and number. For DISCARD, however, you'd get
no-sync areas scattered around the disk.

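David's extent-list idea amounts to splitting a no-sync range whenever part
of it gets written. A rough sketch, with invented names and a fixed-size
array for brevity:

/* Track no-sync areas as extents instead of a bitmap: a write into an
 * extent shrinks or splits it, so the list shrinks as the array fills.
 * Invented names, purely illustrative.
 */
#include <stdint.h>

struct extent { uint64_t start, len; };

#define MAX_EXTENTS 64
static struct extent nosync[MAX_EXTENTS];
static int nextents;

/* Remove [start, start+len) from the no-sync list because it has been
 * written and is therefore now in sync across the members. */
void mark_written(uint64_t start, uint64_t len)
{
    uint64_t end = start + len;

    for (int i = 0; i < nextents; i++) {
        uint64_t e_start = nosync[i].start;
        uint64_t e_end   = e_start + nosync[i].len;

        if (end <= e_start || start >= e_end)
            continue;                             /* no overlap */
        if (start <= e_start && end >= e_end) {
            nosync[i--] = nosync[--nextents];     /* fully covered: drop */
        } else if (start > e_start && end < e_end) {
            if (nextents < MAX_EXTENTS) {         /* middle: split in two */
                nosync[nextents].start = end;
                nosync[nextents].len   = e_end - end;
                nextents++;
            }
            nosync[i].len = start - e_start;
        } else if (start <= e_start) {
            nosync[i].start = end;                /* trim the front */
            nosync[i].len   = e_end - end;
        } else {
            nosync[i].len = start - e_start;      /* trim the tail */
        }
    }
}

A DISCARD, by contrast, would add extents back to the list, which is where
the scattering David mentions would make a plain list less attractive than
a bitmap.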

Re: Software RAID and TRIM

On 04.07.2011 11:13:24 by Tom De Mulder

On Thu, 30 Jun 2011, Mikael Abrahamsson wrote:

> From the tests I have read, the Intel 510 are actually worse than the Intel
> X-25 G1/G2/320 models, with exactly the symptoms you're describing. It's fast
> for linear reads and writes, but not so good for random writes, especially not
> when it's getting full.

Yes; that's why I'm looking forward to also getting some SandForce 22xx
based drives (probably OCZ Vertex 3) to test.

> Include the Intel 320 as well, I think it should be viable for your usage
> pattern.

I wasn't too impressed by the Anandtech review of the 320, and (as
everywhere) my funds are limited. :-)


--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 04/07/2011 : The Moon is Waxing Crescent (17% of Full)

Re: Software RAID and TRIM

On 04.07.2011 18:26:52 by Werner Fischer

Hi Tom,

1) regarding Software RAID and TRIM:
There is a script, raid1ext4trim.sh-1.4 from Chris Caputo, that does a
TRIM for Ext4 file systems on a software RAID 1. According to the
comments in the script, it only supports RAID volumes which reside on
complete disks (e.g. /dev/sdb and /dev/sdc), not on RAID partitions
(e.g. /dev/sdb1 and /dev/sdc1).
The script is shipped with hdparm; get hdparm 9.37 at
http://sourceforge.net/projects/hdparm/ and you'll find the script in
the subfolder hdparm-9.37/wiper/contrib/
I have not tested the script yet; maybe I can do some tests tomorrow.

2) regarding choosing the right SSD:
I would strongly recommend an SSD with integrated power-outage
protection; Intel's 320 series has this inside:
http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf
I have done some power-outage tests today, including a Vertex 3 and an
Intel 320 series drive. I used diskchecker.pl from
http://brad.livejournal.com/2116715.html

result:
-> for the Vertex 3 diskchecker.pl reported lost data:
[root@f15-ocz-vertex3 ~]# ./diskchecker.pl -s 10.10.30.199 verify testfile2
verifying: 0.00%
verifying: 1.42%
Error at page 52141, 0 seconds before end.
verifying: 6.31%
Error at page 83344, 0 seconds before end.
verifying: 11.12%
Error at page 163555, 0 seconds before end.
[...]
Total errors: 12
Histogram of seconds before end:
0 12
[root@f15-ocz-vertex3 ~]#
-> for the Intel 320 Series diskchecker.pl did not report data loss:
[root@f15-intel-320 ~]# ./diskchecker.pl -s 10.10.30.199 verify testfile2
verifying: 0.00%
verifying: 0.12%
[...]
verifying: 99.82%
verifying: 100.00%
Total errors: 0
[root@f15-intel-320 ~]#
I did the tests multiple times; I also had some runs on the Vertex 3
without errors, but with the Intel 320 Series not a single test reported
an error.

I did the tests with Fedora 15 on the SSDs; here are the details from
hdparm -I:

OCZ Vertex 3:
Model Number: OCZ-VERTEX3
Serial Number: OCZ-OQZF2I45DYZ47T3C
Firmware Revision: 2.06
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
[...]
device size with M = 1000*1000: 120034 MBytes (120 GB)

Intel 320 Series:
Model Number: INTEL SSDSA2CW160G3
Serial Number: CVPR112601AL160DGN
Firmware Revision: 4PC10302
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
[...]
device size with M = 1000*1000: 160041 MBytes (160 GB)

Regards,
Werner

On Mon, 2011-07-04 at 10:13 +0100, Tom De Mulder wrote:
> On Thu, 30 Jun 2011, Mikael Abrahamsson wrote:
>
> > From the tests I have read, the Intel 510 are actually worse than the Intel
> > X-25 G1/G2/320 models, with exactly the symptoms you're describing. It's fast
> > for linear reads and writes, but not so good for random writes, especially not
> > when it's getting full.
>
> Yes; that's why I'm looking forward to also getting some SandForce 22xx
> based drives (probably OCZ Vertex 3) to test.
>
> > Include the Intel 320 as well, I think it should be viable for your usage
> > pattern.
>
> I wasn't too impressed by the Anandtech review of the 320, and (as
> everywhere) my funds are limited. :-)
>
>
> --
> Tom De Mulder - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 04/07/2011 : The Moon is Waxing Crescent (17% of Full)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html



Re: Software RAID and TRIM

On 17.07.2011 23:52:04 by Lutz Vieweg

David Brown wrote:
> However, AFAIUI, you are wrong about TRIM being essential for the
> continued high performance of SSDs. As long as your SSDs have some
> over-provisioning (or you only partition something like 90% of the
> drive), and it's got good garbage collection, then TRIM will have
> minimal effect.

I beg to differ.

We are using SSDs in very much the way that Tom De Mulder intends,
and from our extensive performance measurements over many months
now I can say that (at least if you do have significant amounts
of write operations) it _does_ make a lot of difference whether you
periodically discard the unused sectors or not.
(For us, write performance was measured to be about half as good
when there are no free erase blocks available anymore.)

Of course, you can only benefit from discards if your filesystem
is not full (because then there is nothing to discard). But any
kind of "garbage collection" by the SSD itself will not have the
same effect, since it cannot know which blocks are in use by the
filesystem.

> I think other SSD-optimisations, such as those in BTRFS, are much more
> important.

Actually (apart from btrfs still being in development, not really
ready for production use yet), XFS (-o delaylog,barrier) performs
better on our SSDs than btrfs - without any SSD-specific options.

What is really an important factor for SSD performance: the controller.
The same SSDs perform with significantly lower latency for us when
connected to SATA controller channels than when connected to SAS
controllers (and they perform abysmally when used as hardware-RAID
constituents, in comparison).

Regards,

Lutz Vieweg


Re: Software RAID and TRIM

On 17.07.2011 23:57:12 by Lutz Vieweg

Tom De Mulder wrote:
> Yes, it's (usually/ideally) the filesystem's job to invoke the TRIM
> command

Well, for us, voluntary (cron-triggered) batch discards have proven
to be the better option. If you leave it to the filesystem to
trigger the discards, then you might lose write performance
when you need it most.

In comparison, a voluntarily triggered discard at some low-usage time
is painless.

Regards,

Lutz Vieweg


Re: Software RAID and TRIM

On 18.07.2011 00:00:40 by Lutz Vieweg

Tom De Mulder wrote:
> Maybe I should have specified--my particular aim is to try and use
> (fairly high-end) consumer SSDs for "enterprise" server applications

That's exactly what we do.
After all, "RAID" is still the acronym for "Redundant Array of
_Inexpensive_ Disks", no matter how many times big-$$$ will try to tell
you otherwise.

And a software RAID built from some cheap consumer SSDs easily
outperforms those overpriced "enterprise class" SSD devices they try to
sell you.


> Most hardware RAID controllers that I know
> of don't pass on the TRIM command

Not only that, they also add a lot of latency to the SSD communication.
There simply is no reason anymore to use hardware RAID at all.

Regards,

Lutz Vieweg


Re: Software RAID and TRIM

On 18.07.2011 00:11:01 by Lutz Vieweg

NeilBrown wrote:
> Trim support for md is a long way down my list of interesting projects (and
> no-one else has volunteered).

That's a pity.

Actually, we were desperate enough about being able to discard unused
sectors from our SSDs "behind" MD that we implemented a user-space
work-around (using fallocate and BLKDISCARD ioctls after finding out
which physical devices are hidden behind the RAID), but that is awkward
in comparison to just using "fstrim" or the like, as it means that during
the discards the filesystem appears "almost full", and the work-around
supports only RAID-1.

> It is not at all straight forward to implement.

For RAID5/6, I understand that. But supporting RAID 0/1, and maybe even
RAID 10, should not be that difficult. (dm-raid does support this,
though we don't like dm-raid too much for several other reasons.)

If somebody is investing in SSDs today, it is for speed. So if you are
setting up an SSD-based RAID, it's unlikely that you'll aim for RAID5/6
anyway.

> For copying RAID (RAID1, RAID10) you really need the same bitmap. There
> isn't the same risk of reading and trusting discarded parity, but a resync
> which didn't know about discarded ranges would undo the discard for you.

That is true, but not really a problem. Yes, the write-performance will
suffer until the next "fstrim" is done, but the performance suffers from
the resync anyway, so that's not something extra, and SSD users will
certainly issue "fstrim" periodically, anyway.

I guess you would make many people happy if MD-raid supported passing
through discards, even if it was only for RAID 0/1, and even if a resync
meant you'd have to issue an additional "fstrim".

Regards,

Lutz Vieweg



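For the curious, the core of such a workaround is a BLKDISCARD ioctl
against each member device, with the filesystem's free ranges translated
into member offsets. A minimal sketch - the device path and range below
are placeholders, and discarding a range on a live device destroys data:

/* Discard a byte range directly on a block device via BLKDISCARD.
 * DANGEROUS: placeholder device and range; a RAID-aware tool must first
 * translate filesystem-free ranges through the md member data offset.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD */

int main(void)
{
    const char *dev = "/dev/sdX";                 /* placeholder member */
    uint64_t range[2] = { 1024ULL * 1024 * 1024,  /* start: 1 GiB in */
                          256ULL  * 1024 * 1024 };/* length: 256 MiB */

    int fd = open(dev, O_WRONLY);
    if (fd < 0 || ioctl(fd, BLKDISCARD, &range) < 0) {
        perror("BLKDISCARD");
        return 1;
    }
    close(fd);
    return 0;
}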

Re: Software RAID and TRIM

On 18.07.2011 00:16:12 by Lutz Vieweg

Tom De Mulder wrote:
> I have a set of 4 Intel 510 SSDs purely for testing, and I have used
> these to simulate the kinds of workload I would expect them to
> experience in a server environment

Beware: the Intel SSDs are documented to voluntarily throttle write
speed if they detect a lot of writing going on, in order to meet their
advertised lifetime.

(I have not read anything like that in the documentation of
Marvell/Micron/Indilinx/SandForce controllers, and indeed, when wiped
once per week, our SSDs keep up their initial performance. And yes, I
find it acceptable that they might wear out after >= 3 years :-)

Regards,

Lutz Vieweg



Re: Software RAID and TRIM

On 18.07.2011 00:31:43 by Lutz Vieweg

Werner Fischer wrote:
> 1) regarding Software RAID and TRIM:
> there is a script raid1ext4trim.sh-1.4 from Chris Caputo that does a
> TRIM for Ext4 file systems on a software RAID 1. According to the
> comments in the script it only supports RAID volumes which reside on
> complete disks (e.g. /dev/sdb and /dev/sdc), not on RAID partitions
> (e.g. /dev/sdb1 and /dev/sdc1)
> The script is shipped with hdparm

I wonder why people would use the "hdparm" tool to issue TRIM commands
at a lower level, when you can do it much more portably by using the
BLKDISCARD ioctl...

> I would strongly recommend a SSD with integrated power-outage
> protection

Your results seem to indicate differences, but how is that evidence
for SSDs corrupting filesystems? As long as the SSD actually tells the
truth about draining its caches when asked to, the journaling of the
filesystem will keep the metadata intact - but not necessarily the data
inside the files: for very plausible performance reasons, most
filesystems will _not_ try to sync non-metadata by default!

Nevertheless, sensitivity to power-outage situations has been the
subject of many SSD updates for different controllers, so there may have
been real issues, too.

Regards,

Lutz Vieweg


Re: Software RAID and TRIM

on 18.07.2011 07:14:43 by Mikael Abrahamsson

On Sun, 17 Jul 2011, Lutz Vieweg wrote:

> David Brown wrote:
>> However, AFAIUI, you are wrong about TRIM being essential for the continued
>> high performance of SSDs. As long as your SSDs have some over-provisioning
>> (or you only partition something like 90% of the drive), and it's got good
>> garbage collection, then TRIM will have minimal effect.
>
> I beg to differ.
>
> Of course, you can only benefit from discards if your filesystem
> is not full (because then there is nothing to discard). But any
> kind of "garbage collection" by the SSD itself will not have the
> same effect, since it cannot know which blocks are in use by the
> filesystem.

Well, that's what you gain from only using 90% of the drive space for
data (be it via a partition or some other means): you increase the
over-provisioning, and thus the drive has more empty space to play with,
even if you fill up the FS to 100%.

So yes, TRIM is nice, but if you want consistent performance then you
need to assume that your FS is going to be 100% full anyway, so you have
to limit the FS block use to 80-90% of the total drive space.

--
Mikael Abrahamsson email: swmike@swm.pp.se
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Software RAID and TRIM

on 18.07.2011 12:35:42 by David Brown

On 17/07/2011 23:52, Lutz Vieweg wrote:
> David Brown wrote:
>> However, AFAIUI, you are wrong about TRIM being essential for the
>> continued high performance of SSDs. As long as your SSDs have some
>> over-provisioning (or you only partition something like 90% of the
>> drive), and it's got good garbage collection, then TRIM will have
>> minimal effect.
>
> I beg to differ.
>

Well, I don't have your experience here (I have a couple of 60G SSD's in
RAID0, without TRIM, but that's hardly in the same class). So I don't
expect you to put much weight on my opinions. But maybe it will give
you reason for more testing.

> We are using SSDs in very much the way that Tom de Mulder intends,
> and from our extensive performance measurements over many months
> now I can say that (at least if you do have significant amounts
> of write operations) it _does_ make a lot of difference whether you
> periodically discard the unused sectors or not.
> (For us, the write performance measured to be about half as good
> when there are no free erase blocks available anymore.)
>

If there are no free erase blocks, then your SSD's don't have enough
over-provisioning. This is, after all, the whole point of having more
physical flash than the logical disk size would suggest. Depending on
the quality of the SSD (more expensive ones have more
over-provisioning), and the usage patterns (if you have lots of small
random writes, you'll need more extra space), then you might have to
"manually" over-provision the disk by only partitioning about 90% of the
disk. Of course, you must make sure that the remaining 10% is
"discarded", or left untouched from the start, and that you use the
partition for your RAID and not the whole disk.
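For illustration, such a one-time discard of the unused tail could be
done like this (a sketch, not from this thread; it assumes the partition
covers the first 90% of the disk, and the device name is an example):

  /* Sketch: discard the unpartitioned tail of an SSD deliberately
     partitioned to ~90%. Example device name; run once before use. */
  #include <stdio.h>
  #include <stdint.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>            /* BLKGETSIZE64, BLKDISCARD */

  int main(void)
  {
      int fd = open("/dev/sdX", O_WRONLY);     /* example device */
      if (fd < 0) { perror("open"); return 1; }

      uint64_t size;
      if (ioctl(fd, BLKGETSIZE64, &size) < 0) {  /* device size, bytes */
          perror("BLKGETSIZE64");
          return 1;
      }

      /* start of the unused tail, rounded down to 1 MiB alignment */
      uint64_t start = (size / 10 * 9) & ~(uint64_t)((1 << 20) - 1);
      uint64_t range[2] = { start, size - start };

      if (ioctl(fd, BLKDISCARD, &range) < 0)
          perror("BLKDISCARD");

      close(fd);
      return 0;
  }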

So now you have plenty of erase blocks at any time, and your write
performance will be good.


TRIM, on the other hand, does not give you any extra free erase blocks.
If you think it does, you've misunderstood it.

TRIM exists to make garbage collection a little more efficient - when
garbage collecting an erase block that contains TRIM'ed blocks, the
TRIM'ed blocks don't need to be copied. This saves a small amount of
time in the copying, and allows slightly denser packing. It may
sometimes lead to saving whole erase blocks, but that's seldom the case
in practice except when erasing large files.

If your disks are reasonably full, then TRIM will not help much because
the garbage collection will be desperately trying to piece together
small bits into complete erase blocks, and your performance will drop
through the floor. If you have plenty of overprovisioning, then the SSD
still has lots of completely free erase blocks whenever it needs them.

If your filesystem re-uses (logical) blocks, then TRIM will not help.
It is /always/ more efficient for the FS to simply write new data to the
same block, rather than TRIM'ing it first.

TRIM is a very expensive command - it acts a bit like a write, but it is
not a queued command. Thus the block layer must wait for /all/ IO
commands to have completed, then issue the TRIM, then wait for it to
complete, and then carry on with new commands. On some SSD's, it will
(according to something I read) trigger garbage collection, which may
slow down the SSD. Even without that, the performance of most meta-data
operations (such as delete) will drop considerably when they also need
to do TRIM.

On the other hand, your off-line batch TRIM during low use periods could
well be a win. The cost of these discards is not going to be an issue,
and large batched discards are going to be far more useful to the SSD
than small scattered ones. I believe that there has been work on a
similar system in XFS - I don't know what happened to that, or if there
is any way to make it work in concert with md raid.


What will make a big difference to using SSD's in md raid is the
sync/no-sync tracking. This will avoid a lot of unnecessary writes,
especially with a new array, and leave the SSD with more free blocks (at
least until the disk is getting full of data). It is also much higher
up the things-to-do list, because it will be useful for all uses of md
raid, and is a prerequisite to general discard support. (Strictly
speaking it is not needed for SSD's that guarantee a zero return on
TRIM'ed blocks - but only some SSD's give that guarantee.)


> Of course, you can only benefit from discards if your filesystem
> is not full (because then there is nothing to discard). But any
> kind of "garbage collection" by the SSD itself will not have the
> same effect, since it cannot know which blocks are in use by the
> filesystem.
>

Garbage collection will recycle blocks that have been overwritten. The
filesystem knows which logical blocks are in use, and which are free.
Filesystems already heavily re-use blocks, in the aim of preferring
faster outer tracks on HD's, and minimizing head movement. So when a
file is erased, there's a good chance that those same logical blocks
will be re-used soon - TRIM is of no benefit in that case.

>> I think other SSD-optimisations, such as those in BTRFS, are much more
>> important.
>
Actually (apart from btrfs still being in development and not really
ready for production use yet), XFS (-o delaylog,barrier) performs
better on our SSDs than btrfs - without any SSD-specific options.
>

btrfs is ready for some uses, but is not mature and real-world tested
enough for serious systems (and its tools are still lacking somewhat).
But more generally, different filesystems are faster and slower for
different usage patterns.

One SSD optimisation that many filesystems could implement is to be less
concerned about fragmentation. Most modern filesystems go out of their
way to try to reduce fragmentation, which is great for HD use. But on
SSD's, you should be happy to fragment files if it promotes re-use of
erased blocks, as long as fragments aim to fill complete erase blocks
(in size and alignment).


> What is really an important factor for SSD performance: The controller.
> The same SSDs perform with significantly lower latency for us when
> connected to SATA controller channels than when connected to SAS
> controllers (and they perform abysmally when used as hardware-RAID
> constituents, in comparison).

That is /very/ interesting to know, and is a data point I haven't read
elsewhere (though I knew about poor performance of hardware RAID with
SSD). Thanks for sharing that.


>
> Regards,
>
> Lutz Vieweg



Re: Software RAID and TRIM

on 18.07.2011 12:48:49 by Tom De Mulder

On Mon, 18 Jul 2011, David Brown wrote:

First, I'd like to say that I've done more testing, and found that even
after very prolonged, sustained heavy use, the (Intel 510) SSDs I
partitioned 50/50 with half left unused didn't show any degradation in
performance. That's after about a week of constant writing/erasing.

> If your disks are reasonably full, then TRIM will not help much because the
> garbage collection will be desperately trying to piece together small bits
> into complete erase blocks, and your performance will drop through the floor.

However, it won't drop as low as it would without TRIM in the same
situation. But with a continuous heavy workload, even TRIM won't help, and
over-provisioning is the way to go.

Best,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 18/07/2011 : The Moon is Waning Gibbous (83% of Full)

Re: Software RAID and TRIM

on 18.07.2011 12:53:59 by Tom De Mulder

On Sun, 17 Jul 2011, Lutz Vieweg wrote:

> What is really an important factor for SSD performance: The controller.
> The same SSDs perform with significantly lower latency for us when
> connected to SATA controller channels than when connected to SAS
> controllers (and they perform abysmally when used as hardware-RAID
> constituents, in comparison).

Interesting.

I think it depends a lot on the controller. On a Dell server with a
PERC5/i RAID controller (actually made by LSI) I saw some performance
degradation, but not enough that I'd consider it a deal-breaker in
situations where I cared more about the RAID functionality than about
the loss of performance. After all, the latency is still massively
lower than it is with spinning disks.

I have a really great Areca RAID controller in a different server, but
unfortunately it's in use and it'll be a while before I get another one I
can use for testing. Given how well it does in other respects, I have high
hopes for it.


Best,

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 18/07/2011 : The Moon is Waning Gibbous (83% of Full)

Re: Software RAID and TRIM

on 18.07.2011 14:13:35 by Werner Fischer

On Mon, 2011-07-18 at 11:53 +0100, Tom De Mulder wrote:
> On Sun, 17 Jul 2011, Lutz Vieweg wrote:
>
> > What is really an important factor for SSD performance: The controller.
> > The same SSDs perform with significantly lower latency for us when
> > connected to SATA controller channels than when connected to SAS
> > controllers (and they perform abysmally when used as hardware-RAID
> > constituents, in comparison).
>
> Interesting.
>
> I think it depends a lot on the controller. On a Dell server with PERC5/i
> RAID controller (actually made by LSI) I saw some performance degradation
> but not enough that I'd consider it a deal-breaker for situations where I
> really cared about the RAID functionality, more than about the loss of
> performance. After all, the latency is still massively lower than it is
> with spinning disk.
>
> I have a really great Areca RAID controller in a different server, but
> unfortunately it's in use and it'll be a while before I get another one I
> can use for testing. Given how well it does in other respects, I have high
> hopes for it.

I agree that the controller can influence performance:
1. SATA controller: direct communication
2. SAS controller: Serial ATA Tunneling Protocol (STP) is used,
this can have an impact on performance
3. Hardware RAID controller: depending on the controller, performance
impact can be from low to very high

Regards,
Werner

>
>
> Best,
>
> --
> Tom De Mulder - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 18/07/2011 : The Moon is Waning Gibbous (83% of Full)

--
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki


Re: Software RAID and TRIM

on 18.07.2011 20:09:31 by Lutz Vieweg

On 07/18/2011 12:35 PM, David Brown wrote:
> If there are no free erase blocks, then your SSD's don't have enough over-provisioning.

When you think about "How many free erase blocks are enough?" you'll come to the conclusion that
this simply depends on the usage pattern.

Ideally, you'll want every write to an SSD to go to a completely free erase block, because if it
doesn't, it's both slower and will probably also lead to a higher average number of write cycles
(because more than one read-modify-write cycle per erase block may be required to fill it with new
data, if that new data cannot be buffered in the SSD's RAM.)

If the goal is to have every write go to a free erase block, then you need to free up at least as
many erase blocks per time period as data will be written during that time period (assuming the
worst case that all writes will _not_ go to blocks that have been written to before).
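To put an illustrative number on that (figures made up for the example): with
512 KiB erase blocks and a sustained write rate of 50 MB/s, the worst case
requires freeing roughly a hundred erase blocks per second:

  /* Back-of-the-envelope version of the rate argument above,
     with made-up example figures. */
  #include <stdio.h>

  int main(void)
  {
      double write_rate  = 50e6;          /* sustained writes, bytes/s */
      double erase_block = 512 * 1024.0;  /* erase block size, bytes   */

      /* worst case: every write consumes a fresh erase block, so blocks
         must be freed at least as fast as they are consumed */
      printf("need >= %.0f free erase blocks per second\n",
             write_rate / erase_block);   /* prints ~95 */
      return 0;
  }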

Of course you can accomplish this by over-provisioning so much flash space that the SSD will always
be capable of re-arranging the used data blocks such that they are tightly packed into fully used
erase blocks, while the rest of the erase blocks are completely empty.
But that is a pretty expensive approach: essentially it requires 100% over-provisioning (or: 50%
usable capacity, i.e. twice the price for the storage).
And you still have to trust that the SSD will use that over-provisioned space the way you want
(e.g. the SSD firmware could be inclined to only re-arrange erase blocks that have a certain ratio
of unused sectors within them).

One good thing about explicitly discarding sectors while using most of the offered space is
(besides the significant cost advantage) that your SSD will likely invest the effort to re-arrange
sectors into fully allocated and fully free erase blocks exactly at the time when this makes the
most sense for you. It will have to copy only data that is actually still valid (reducing wear),
and you may even choose a time at which you know that significant amounts of data have been deleted.

> Depending on the quality of the SSD (more expensive ones have more over-provisioning)

Alas, manufacturers tend to ask twice the price for much less than twice the over-provisioning,
so it's still advisable to buy the cheaper SSD and choose the over-provisioning ratio yourself by
using only part of it...


> TRIM, on the other hand, does not give you any extra free erase blocks. If you think it does, you've
> misunderstood it.

I have to disagree on this :-)

Imagine a SSD with 10 erase blocks capacity, each having place for 10 sectors.
Let's assume the SSD advertises only 90 sectors total capacity, over-provisioning one erase block.
Now I write 8 files each of 10 sectors size on the SSDs, then delete 2 of the 8 files.

If the SSD now performs some "garbage collection", it will not have more than 2 free erase blocks.

But if I discard/TRIM the unused sectors, and the SSD does the right thing about it, there will be 4
free erase blocks.

So, yes, TRIM can gain you extra free erase blocks, but of course only if there is unused space in
the filesystem.


> It may sometimes lead to saving
> whole erase blocks, but that's seldom the case in practice except when erasing large files.

Our different perception may result from our use-case involving frequent deletion of files, while
yours doesn't.

But this is not only about "large files". Obviously, all modern SSDs are capable of
re-arranging data into fully allocated and fully free erase-blocks, and this process can benefit
from every single sector that has been discarded.


> If your filesystem re-uses (logical) blocks, then TRIM will not help.

If the only thing the filesystem does is overwriting blocks that held valid data right until they
are overwritten with newer valid data, then TRIM will certainly not help.

But every discard that happens in between an invalidation of data and the overwriting of the same
logical block can potentially benefit from a TRIM in between. Imagine a file of 1000 sectors, all
valid data. Now your application decides to overwrite that file with 1000 sectors of newer data.
Let's assume the FS is clever enough to use the same 1000 logical sectors for this. But let's also
assume the RAM-cache of the SSD is only 20 logical sectors in size, and one erase-block is 10
sectors in size. Now the SSD needs to start writing from its RAM buffer to flash at least after 20
sectors of data have been processed. If you are lucky, and everything was written in sequence, and
well aligned, then the SSD may just need to erase and overwrite flash blocks that were formerly used
for the same logical sectors. But if you are unlucky, the logical sectors to write are spread across
different flash erase blocks. Thus the SSD can at best only mark them "unused" and has to write the
data to a different (hopefully completely free) erase block. Again, if lucky (or heavily
over-provisioned), you had >= 100 free erase blocks available when you started writing, and after
they were written, 100 other erase blocks that held the older data can be freed after all 1000
sectors have been written. But if you are unlucky, not that many free erase blocks were available
when starting to write. Then, to write the new data, the SSD needs to read data from
non-completely-free erase blocks, fill the unused sectors within them with the new data, and write
back the erase-blocks - which means much lower performance, and more wear.
Now the same procedure with a "TRIM": After laying out the logical sectors to write to (but before
writing to them), the filesystem can issue a "discard" on all those sectors. This will enable the
SSD to mark all 100 erase blocks as completely free - even without additional "re-arranging". The
following write operation to 1000 sectors may require erase-before-write (if no pre-existing
completely free erase-blocks can be used), but that is much better than having to do
"read-modify-erase-write" cycles to the flash (and a larger number of that, since data has to be
copied that the SSD cannot know to be obsolete).

So: While re-arranging of valid data into erase-blocks may be expensive enough to do it only
"batched" from time to time, even the simple marking of sectors as discarded can help the
performance and endurance of a SSD.

> It is /always/ more efficient
> for the FS to simply write new data to the same block, rather than TRIM'ing it first.

Depends on how expensive the marking of sectors as free is for the SSD, and how likely newly written
data that fits into the SSDs cache will cause the freeing of complete erase blocks.


> TRIM is a very expensive command

That seems to depend a lot on the firmware of different drives.
But I agree that it might not be a good idea to rely on it being cheap.

From the behaviour of the SSDs we like best, it seems that TRIM often only causes cheap "marking
as free" operations, while sometimes, every few weeks, the SSD actually does a lot of re-arranging
("garbage collecting"?) work after the discards have been issued.
(Certainly also depends a lot on the usage pattern.)

> I believe that there has been work on a similar system
> in XFS

Yes, XFS supports that now, but alas, we cannot use it with MD, as MD will discard the discards :-)


> What will make a big difference to using SSD's in md raid is the sync/no-sync tracking. This will
> avoid a lot of unnecessary writes, especially with a new array, and leave the SSD with more free
> blocks (at least until the disk is getting full of data).

Hmmm... the sync/no-sync tracking will save you exactly one write to all sectors. That's certainly a
good thing, but since a single "fstrim" after the sync will restore the "good performance"
situation, I don't consider that an urgent feature.


> Filesystems already heavily re-use blocks, in the aim
> of preferring faster outer tracks on HD's, and minimizing head movement. So when a file is erased,
> there's a good chance that those same logical blocks will be re-used soon - TRIM is of no benefit in
> that case.

It is of benefit - to the performance of exactly those writes that go to the formerly used logical
blocks.


> btrfs is ready for some uses, but is not mature and real-world tested enough for serious systems
> (and its tools are still lacking somewhat).

Let's not divert the discussion too much. I'll happily re-try btrfs when the developers say it's not
experimental anymore, and when there's a "fsck"-like utility to check its integrity.

Regards,

Lutz Vieweg




Re: Software RAID and TRIM

on 18.07.2011 22:18:54 by David Brown

On 18/07/11 20:09, Lutz Vieweg wrote:
> On 07/18/2011 12:35 PM, David Brown wrote:
>> If there are no free erase blocks, then your SSD's don't have enough
>> over-provisioning.
>
> When you think about "How many free erase blocks are enough?" you'll
> come to the conclusion that this simply depends on the usage pattern.
>

Yes.

> Ideally, you'll want every write to a SSD to go to a completely free
> erase block, because if it doesn't, it's both slower and will probably
> also lead to a higher average number of write cycles (because more than
> one read-modify-write cycle per erase block may be required to fill it
> with new data, if that new data cannot be buffered in the SSDs RAM.)
>

No.

You don't need to fill an erase block for writing - writes are done as
write blocks (I think 4K is the norm). That's the odd thing about flash
- erase is done in much larger blocks than writes.

> If the goal is to have every write go to a free erase block, then you
> need to free up at least as many erase blocks per time period as data
> will be written during that time period (assuming the worst case that
> all writes will _not_ go to blocks that have been written to before).
>

Again, no - since you don't have to write to whole erase blocks.

> Of course you can accomplish this by over-provisioning so much flash space
> that the SSD will always be capable of re-arranging the used data blocks
> such that they are tightly packed into fully used erase blocks, while
> the rest of the erase blocks are completely empty.
> But that is a pretty expensive approach, essentially this requires 100%
> over-provisioning (or: 50% usable capacity, i.e. twice the price for the
> storage).

The level of over-provisioning that can be useful will depend on the
usage patterns, such as how much and how scattered your deletes are.
There will be diminishing returns for increased overprovisioning - the
balance is up to the user, but I can't imagine 50% being sensible.

I wonder if you are mixing up the theoretical peak write speeds to a new
SSD with real-world write speeds to a disk in use. These are not the
same, and no amount of TRIM'ing or over-provisioning will let you see
those speeds in anything but a synthetic benchmark. Your aim is /not/
to go mad trying to reach the marketing-claimed speeds in a real
application, but to balance /good/ and /consistent/ speeds with a
sensible cost. Understand that SSD's are very fast, but not as fast as
a marketer or an initial benchmark suggests, and you will be much
happier with your disks.

> And, you still have to trust that the SSD will use that over-provisioned
> space the way you want (e.g. the SSD firmware could be inclined to only
> re-arrange erase blocks that have a certain ratio of unused sectors
> within them).
>

You want to pick an SSD with good garbage collection, if that's what you
mean.


> One good thing about explicitly discarding sectors, while using most of
> the offered space, is (besides the significant cost advantage) that your
> SSD will likely invest effort to re-arrange sectors into fully allocated
> and fully free erase blocks exactly at the time when this makes most
> sense for you. It will have to copy only data that is actually still
> valid (reducing wear), and you may even choose a time at which you know
> that significant amounts of data have been deleted.
>

The reality is that for most applications and usage patterns, logical
blocks that are deleted and not re-used are in the minority. It is true
that when garbage-collecting a block, the SSD can hop over the discarded
blocks. But since they are in the minority, it's a small effect. It
could even be a detrimental effect - it could encourage the SSD to
garbage-collect a block that would otherwise be left untouched, leading
to extra effort and wear (but giving you a little more free space). Any
effort done by the SSD on TRIM'ed blocks is wasted if these (logical)
blocks are overwritten by the filesystem later, except if the SSD was
otherwise short on free blocks.

Again, the use of explicit batch discards gives a better effect than
automatic TRIMs on deletes.

>> Depending on the quality of the SSD (more expensive ones have more
>> over-provisioning)
>
> Alas, manufacturers tend to ask twice the price for much less than twice
> the over-provisioning,
> so it's still advisable to buy the cheaper SSD and choose
> over-provisioning ratio by using
> only part of it...
>

Fair enough.

>
>> TRIM, on the other hand, does not give you any extra free erase
>> blocks. If you think it does, you've
>> misunderstood it.
>
> I have to disagree on this :-)
>
> Imagine a SSD with 10 erase blocks capacity, each having place for 10
> sectors.
> Let's assume the SSD advertises only 90 sectors total capacity,
> over-provisioning one erase block.
> Now I write 8 files each of 10 sectors size on the SSDs, then delete 2
> of the 8 files.
>
> If the SSD now performs some "garbage collection", it will not have more
> than 2 free erase blocks.
>
> But if I discard/TRIM the unused sectors, and the SSD does the right
> thing about it, there will be 4 free erase blocks.
>
> So, yes, TRIM can gain you extra free erase blocks, but of course only
> if there is unused space in the filesystem.
>

OK, let me rephrase - TRIM does not give you /significantly/ more free
erase blocks /in real life/. You can construct arrangements, like you
described, where the SSD can get noticeably more erase blocks through
the use of TRIM. But under use, things are different as blocks are
written and re-written. Your example would break as soon as you take
into account the writing of the directory to the disk, messing up your
neat blocks.

And again, appropriately scheduled batch TRIM will give better results
than automatic TRIM, and /may/ be worth the effort.

>
>> It may sometimes lead to saving
>> whole erase blocks, but that's seldom the case in practice except when
>> erasing large files.
>
> Our different perception may result from our use-case involving frequent
> deletion of files, while yours doesn't.
>

Perhaps. The nature of most filesystems is to grow - more data gets
written than erased. But many of the effects here are usage pattern
dependent.

> But this is not only about "large files". Obviously, all modern
> SSDs are capable of re-arranging data into fully allocated and fully
> free erase-blocks, and this process can benefit from every single sector
> that has been discarded.
>
>
>> If your filesystem re-uses (logical) blocks, then TRIM will not help.
>
> If the only thing the filesystem does is overwriting blocks that held
> valid data right until they are overwritten with newer valid data, then
> TRIM will certainly not help.
>
> But every discard that happens in between an invalidation of data and
> the overwriting of the same logical block can potentially benefit from a
> TRIM in between. Imagine a file of 1000 sectors, all valid data. Now
> your application decides to overwrite that file with 1000 sectors of
> newer data. Let's assume the FS is clever enough to use the same 1000
> logical sectors for this. But let's also assume the RAM-cache of the SSD
> is only 20 logical sectors in size, and one erase-block is 10
> sectors in size. Now the SSD needs to start writing from its RAM buffer
> to flash at least after 20 sectors of data have been processed. If you
> are lucky, and everything was written in sequence, and well aligned,
> then the SSD may just need to erase and overwrite flash blocks that were
> formerly used for the same logical sectors. But if you are unlucky, the
> logical sectors to write are spread across different flash erase blocks.
> Thus the SSD can at best only mark them "unused" and has to write the
> data to a different (hopefully completely free) erase block. Again, if
> lucky (or heavily over-provisioned), you had >= 100 free erase blocks
> available when you started writing, and after they were written, 100
> other erase blocks that held the older data can be freed after all 1000
> sectors have been written. But if you are unlucky, not that many free
> erase blocks were available when starting to write. Then, to write the
> new data, the SSD needs to read data from non-completely-free erase
> blocks, fill the unused sectors within them with the new data, and write
> back the erase-blocks - which means much lower performance, and more wear.
> Now the same procedure with a "TRIM": After laying out the logical
> sectors to write to (but before writing to them), the filesystem can
> issue a "discard" on all those sectors. This will enable the SSD to mark
> all 100 erase blocks as completely free - even without additional
> "re-arranging". The following write operation to 1000 sectors may
> require erase-before write (if no pre-existing completely free
> erase-blocks can be used), but that is much better than having to do
> "read-modify-erase-write" cycles to the flash (and a larger number of
> that, since data has to be copied that the SSD cannot know to be obsolete).
>
> So: While re-arranging of valid data into erase-blocks may be expensive
> enough to do it only "batched" from time to time, even the simple
> marking of sectors as discarded can help the performance and endurance
> of a SSD.
>

Again, I think your arguments only work on very artificial data. But
perhaps this is close to your real-world usage patterns.

>> It is /always/ more efficient
>> for the FS to simply write new data to the same block, rather than
>> TRIM'ing it first.
>
> Depends on how expensive the marking of sectors as free is for the SSD,
> and how likely newly written data that fits into the SSDs cache will
> cause the freeing of complete erase blocks.
>
>
>> TRIM is a very expensive command
>
> That seems to depend a lot on the firmware of different drives.
> But I agree that it might not be a good idea to rely on it being cheap.
>
> From the behaviour of the SSDs we like best it seems that TRIM is often
> only causing cheap "marking as free" operations, while sometimes, every
> few weeks, the SSD is actually doing a lot of re-arranging ("garbage
> collecting"?) stuff after the discards have been issued.
> (Certainly also depends a lot on the usage pattern.)
>

My main point about TRIM being expensive is the effect it has on the
block IO queue, regardless of the implementation in the SSD. Again,
this is less relevant to batched TRIMs during low-use times.

>> I believe that there has been work on a similar system
>> in XFS
>
> Yes, XFS supports that now, but alas, we cannot use it with MD, as MD
> will discard the discards :-)
>
>
>> What will make a big difference to using SSD's in md raid is the
>> sync/no-sync tracking. This will
>> avoid a lot of unnecessary writes, especially with a new array, and
>> leave the SSD with more free
>> blocks (at least until the disk is getting full of data).
>
> Hmmm... the sync/no-sync tracking will save you exactly one write to all
> sectors. That's certainly a good thing, but since a single "fstrim"
> after the sync will restore the "good performance" situation, I don't
> consider that an urgent feature.
>

I really hope your SSD's return zeros for TRIM'ed blocks, and that you
are sure all your TRIMs are in full raid stripes - otherwise you will
/seriously/ mess up your raid arrays.

One definite problem with RAID on SSD's is that this first write will
mean that the SSD has no more free erase blocks than if the filesystem
were full, as the SSD doesn't know the blocks can be recycled. Of
course, it will see that pretty quickly as soon as the filesystem writes
real data, but it will still have extra waste. For mirrored drives,
this may mean a difference in speed in the two drives as one has more
freedom for garbage collection than the other (for RAID5, this effect is
spread evenly over the disks).

>
>> Filesystems already heavily re-use blocks, in the aim
>> of preferring faster outer tracks on HD's, and minimizing head
>> movement. So when a file is erased,
>> there's a good chance that those same logical blocks will be re-used
>> soon - TRIM is of no benefit in
>> that case.
>
> It is of benefit - to the performance of exactly those writes that go to
> the formerly used logical blocks.
>
>
>> btrfs is ready for some uses, but is not mature and real-world tested
>> enough for serious systems
>> (and its tools are still lacking somewhat).
>
> Let's not divert the discussion too much. I'll happily re-try btrfs when
> the developers say it's not experimental anymore, and when there's a
> "fsck"-like utility to check its integrity.
>
> Regards,
>
> Lutz Vieweg
>


Re: Software RAID and TRIM

on 19.07.2011 11:29:20 by Lutz Vieweg

On 07/18/2011 10:18 PM, David Brown wrote:
> You don't need to fill an erase block for writing - writes are done as write blocks (I think 4K is
> the norm).

You are right on that. Those sectors in a partially used
erase block that have not been written to since the last erase of the
whole erase block can be written to just as well as sectors in completely
empty erase blocks.


> My main point about TRIM being expensive is the effect it has on the block IO queue, regardless of
> the implementation in the SSD.

Because of those effects on the block IO queue, the user-space work-around
we implemented to discard the SSDs that our RAID-1s consist of does not
discard "one area on all SSDs at a time", but rather iterates first
through all unused areas on one SSD, then iterates through the same list
of areas on the second SSD.

The effect of this is very much to our liking: While we can see
near-100%-utilization on one SSD at a time during the discards,
the other SSD will happily service the readers, and even the writes that
go to the /dev/md* device are buffered in main memory long enough that we
do not really see a significantly bad impact on the service.
(This might be different, though, if the discards were done
during peak-write-load times of the day.)


> I really hope your SSD's return zeros for TRIM'ed blocks

For RAID-1, the only consequence of not doing so is that "data-check" runs may result
in a > 0 mismatch_cnt. It does not destroy any of your data, and as long as I have
two SSDs in a RAID, both of which give a non-error result when reading a sector, I would
have no indication of which of the returned sector contents to prefer, anyway.

(I admit that for health monitoring it is useful to have a meaningful mismatch_cnt.)

> and that you are sure all your TRIMs are
> in full raid stripes - otherwise you will /seriously/ mess up your raid arrays.

Again, for RAID0/1 (even 10) I don't see why this would harm any data.

Regards,

Lutz Vieweg



Re: Software RAID and TRIM

on 19.07.2011 12:22:35 by David Brown

On 19/07/2011 11:29, Lutz Vieweg wrote:
> On 07/18/2011 10:18 PM, David Brown wrote:
>> You don't need to fill an erase block for writing - writes are done
>> as write blocks (I think 4K is the norm).
>
> You are right on that. Those sectors in a partially used erase block
> that have not been written to since the last erase of the whole erase
> block can be written to as good as sectors in completely empty erase
> blocks.
>
>
>> My main point about TRIM being expensive is the effect it has on
>> the block IO queue, regardless of the implementation in the SSD.
>
> Because of those effects on the block-IO-queue, the user-space
> work-around we implemented to discard the SSDs our RAID-1s consist of
> will not discard "one area on all SSDs at a time", but rather iterate
> first through all unused areas on one SSD, then iterate through the
> same list of areas on the second SSD.
>

Do you take the arrays off-line during this process, or at least make
them read-only? If not, how do you ensure that the lists are valid?

> The effect of this is very much to our liking: While we can see
> near-100%-utilization on one SSD at a time during the discards, the
> other SSD will happily service the readers, and even the writes that
> go to the /dev/md* device are buffered in main memory long enough
> that we do not really see a significantly bad impact on the service.
> (This might be different, though, if the discards were done during
> peak-write-load times of the day.)
>
>
>> I really hope your SSD's return zeros for TRIM'ed blocks
>
> For RAID-1, the only consequence of not doing so is just that
> "data-check" runs may result in a > 0 mismatch_cnt. It does not
> destroy any of your data, and as long as I have two SSDs in a RAID,
> both of which give a non-error result when reading a sector, I would
> have no indication of "which of the returned sector contents to
> prefer", anyway.
>
> (I admit that for health monitoring it is useful to have a meaningful
> mismatch_cnt.)
>
>> and that you are sure all your TRIMs are in full raid stripes -
>> otherwise you will /seriously/ mess up your raid arrays.
>
> Again, for RAID0/1 (even 10) I don't see why this would harm any
> data.
>

Fair enough for RAID1. Just don't try it with RAID5!



Re: Software RAID and TRIM

on 19.07.2011 15:41:53 by Lutz Vieweg

On 07/19/2011 12:22 PM, David Brown wrote:
>> Because of those effects on the block-IO-queue, the user-space
>> work-around we implemented to discard the SSDs our RAID-1s consist of
>> will not discard "one area on all SSDs at a time", but rather iterate
>> first through all unused areas on one SSD, then iterate through the
>> same list of areas on the second SSD.
>
> Do you take the arrays off-line during this process, or at least make
> them read-only?

No, we keep them online and writeable.

> If not, how do you ensure that the lists are valid?

The discard procedure works as follows (a code sketch follows the description):

- use SYS_fallocate to allocate the free space on the device (minus
some safety margin for the writes that will happen during the procedure)
for a temporary file (notice that with fallocate on XFS, you can
allocate space for a file without actually ever writing to it)

- use ioctl FIEMAP to get a list of the logical blocks that were
allocated

- use ioctl BLKDISCARD to discard these blocks

- remove the temporary file

Since the blocks to discard are allocated for the temporary
file during the procedure, they will not be used otherwise.

Obviously, we would still prefer using "fstrim", because then
there would be no need for the temporary file, the "safety margin",
or the temporarily high fill level of the filesystem.
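For the curious, here is a condensed and untested sketch of that procedure.
Paths and sizes are examples, error handling is minimal, and it assumes the
filesystem starts at offset 0 of the device being discarded (with md you
would additionally have to account for the component's data offset):

  /* Untested sketch of the fallocate/FIEMAP/BLKDISCARD procedure
     described above. Paths/sizes are examples; assumes FIEMAP's
     physical offsets map 1:1 onto the device being discarded. */
  #define _GNU_SOURCE                 /* for fallocate(2) */
  #define _FILE_OFFSET_BITS 64
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>               /* FS_IOC_FIEMAP, BLKDISCARD */
  #include <linux/fiemap.h>

  #define MAX_EXTENTS 1024            /* real code loops if exceeded */

  int main(void)
  {
      const char *tmpname = "/mnt/ssd/.trimfile";   /* example path */
      const char *devname = "/dev/sdX";             /* example device */
      off_t len = 100LL << 30;        /* free space minus safety margin */

      int fd = open(tmpname, O_RDWR | O_CREAT | O_EXCL, 0600);
      if (fd < 0) { perror("open tmpfile"); return 1; }

      /* 1. grab the free space without ever writing to it */
      if (fallocate(fd, 0, 0, len) < 0) { perror("fallocate"); return 1; }

      /* 2. ask the filesystem where those blocks physically live */
      struct fiemap *fm = calloc(1, sizeof(*fm)
                          + MAX_EXTENTS * sizeof(struct fiemap_extent));
      fm->fm_start        = 0;
      fm->fm_length       = FIEMAP_MAX_OFFSET;
      fm->fm_flags        = FIEMAP_FLAG_SYNC;   /* flush before mapping */
      fm->fm_extent_count = MAX_EXTENTS;
      if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

      /* 3. discard each allocated extent on the underlying device */
      int dev = open(devname, O_WRONLY);
      if (dev < 0) { perror("open device"); return 1; }
      for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
          uint64_t range[2] = { fm->fm_extents[i].fe_physical,
                                fm->fm_extents[i].fe_length };
          if (ioctl(dev, BLKDISCARD, &range) < 0)
              perror("BLKDISCARD");
      }
      close(dev);

      /* 4. remove the temporary file, returning the space to the FS */
      close(fd);
      unlink(tmpname);
      free(fm);
      return 0;
  }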

Regards,

Lutz Vieweg



Re: Software RAID and TRIM

on 19.07.2011 16:19:06 by Tom De Mulder

In case people are interested, I ran more benchmarks. The impact of TRIM
on an over-provisioned drive is remarkable: a 25% performance loss when
using Postmark.

Because this isn't really on-topic for the MD mailing list, I've put it
somewhere else:

http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/

My next goal, when I have the time, is to compare different amounts of
over-provisioning.

--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 19/07/2011 : The Moon is Waning Gibbous (75% of Full)

Re: Software RAID and TRIM

on 19.07.2011 17:06:12 by David Brown

On 19/07/2011 15:41, Lutz Vieweg wrote:
> On 07/19/2011 12:22 PM, David Brown wrote:
>>> Because of those effects on the block-IO-queue, the user-space
>>> work-around we implemented to discard the SSDs our RAID-1s consist of
>>> will not discard "one area on all SSDs at a time", but rather iterate
>>> first through all unused areas on one SSD, then iterate through the
>>> same list of areas on the second SSD.
>>
>> Do you take the arrays off-line during this process, or at least make
>> them read-only?
>
> No, we keep them online and writeable.
>
>> If not, how do you ensure that the lists are valid?
>
> The discard procedure works by..:
>
> - use SYS_fallocate to allocate the free space on the device (minus
> some safety margin for the writes that will happen during the procedure)
> for a temporary file (notice that with fallocate on XFS, you can
> allocate space for a file without actually ever writing to it)
>
> - use ioctl FIEMAP to get a list of the logical blocks that were
> allocated
>
> - use ioctl BLKDISCARD to discard these blocks
>
> - remove the temporary file
>
> Since the blocks to discard are allocated for the temporary
> file during the procedure, they will not be used otherwise.
>
> Obviously, we would still prefer using "fstrim", because then
> there would be no need for that temporary file, the "safety margin"
> and a temporary high fill level of the filesystem.
>
> Regards,
>
> Lutz Vieweg
>

It certainly sounds like a safe procedure, but I can see why you feel
it's not quite as elegant as it could be. You will also be "discarding"
blocks that have never been written (at least, not since the last
discard...) - is there much overhead in that?



Re: Software RAID and TRIM

on 20.07.2011 09:42:55 by David Brown

On 19/07/2011 16:19, Tom De Mulder wrote:
>
> In case people are interested, I ran more benchmarks. The impact of TRIM
> on an over-provisioned drive is remarkable: a 25% performance loss when
> using Postmark.
>
> Because this isn't really on-topic for the MD mailing list, I've put it
> somewhere else:
>

It is a little off-topic, perhaps, but still of interest to many RAID
users precisely because of the myths and inaccurate data surrounding
TRIM. Too many people think that TRIM is essential to SSDs, that RAID
doesn't support TRIM, and that therefore you shouldn't use RAID and
SSDs together.

> http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/
>

To try to explain your results - first it's easy to see why md raid1
with discard is a little slower than md raid1 without discard - the raid
layer ignores the discards, so they can't help or hinder much, and the
filesystem is doing a bit of extra work (sending the discards) to no
purpose.

It is also easy to see why a single SSD with no discards is about the
same speed. You are using RAID1 - reads and writes are not striped in
any way, so the speed is the same as for a single disk. If the test
accessed multiple files in parallel (especially reads), you'd see faster
reads.

The telling figure here, though, is that TRIM made the single drive
significantly slower.

> My next goal, when I have the time, is to compare different amounts of
> over-provisioning.
>

Also try using RAID10,far for your arrays. That will work the SSD's
harder, and perhaps give a better comparison.


Re: Software RAID and TRIM

on 20.07.2011 12:39:58 by Lutz Vieweg

On 07/19/2011 05:06 PM, David Brown wrote:
> It certainly sounds like a safe procedure, but I can see why you feel it's not quite as elegant as
> it could be. You will also be "discarding" blocks that have never been written (at least, not since
> the last discard...) - is there much overhead in that?

Luckily the SSDs we use do not require significant time to process a discard
on areas that were already free - e.g. discarding ~ 250G of SSD space that is
already empty this way takes only ~ 10 seconds.

Regards,

Lutz Vieweg



Re: Software RAID and TRIM

on 20.07.2011 14:13:42 by Werner Fischer

On Tue, 2011-07-19 at 15:19 +0100, Tom De Mulder wrote:
> In case people are interested, I ran more benchmarks. The impact of TRIM
> on an over-provisioned drive is remarkable: a 25% performance loss when
> using Postmark.
>
> Because this isn't really on-topic for the MD mailing list, I've put it
> somewhere else:
>
> http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/
>
> My next goal, when I have the time, is to compare different amounts of
> over-provisioning.

There is a paper from Intel, "Over-provisioning an Intel® SSD" (analyzing
X25-M 160 GB Gen.2 SSDs):
http://cache-www.intel.com/cd/00/00/45/95/459555_459555.pdf

On page 10 of this Intel presentation they mention that a spare area
>27% of native capacity has diminishing returns for such an SSD:
http://maltiel-consulting.com/Enterprise_Data_Integrity_Increasing_Endurance.pdf

Regards,
Werner

--
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki


Re: Software RAID and TRIM

on 20.07.2011 14:20:08 by Lutz Vieweg

On 07/20/2011 09:42 AM, David Brown wrote:
>> http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/
>
> The telling figure here, though, is that TRIM made the single drive significantly slower.

More precisely, ext4's online TRIM on Intel SSDs seems to be a bad combination.

I think it's clear you cannot gain much from TRIM if you're willing to spend
the money for 2x over-provisioning anyway. But you can lose significantly from
online TRIM when the filesystem issues a lot of TRIM commands all the time and
the SSD is slow to process them.

TRIM gains you an advantage with less over-provisioning, and is better
done in batches after significant amounts of data have been written/deleted.

When you try with different levels of over-provisioning, also try with
batched discards (fstrim) between runs of your benchmark.
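For reference, the batched discard that "fstrim" performs boils down to a
single FITRIM ioctl on the mounted filesystem (available since kernel
2.6.38; the mount point below is only an example):

  /* Batched discard of a mounted filesystem's free space - what
     fstrim(8) does. Mount point is an example. */
  #include <stdio.h>
  #include <stdint.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>               /* FITRIM, struct fstrim_range */

  int main(void)
  {
      struct fstrim_range range = {
          .start  = 0,
          .len    = UINT64_MAX,       /* trim the whole filesystem */
          .minlen = 0,
      };

      int fd = open("/mnt/ssd", O_RDONLY);     /* example mount point */
      if (fd < 0) { perror("open"); return 1; }

      if (ioctl(fd, FITRIM, &range) < 0)
          perror("FITRIM");
      else        /* on return, range.len holds the bytes trimmed */
          printf("trimmed %llu bytes\n", (unsigned long long)range.len);

      close(fd);
      return 0;
  }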

Regards,

Lutz Vieweg



Re: Software RAID and TRIM

on 20.07.2011 14:25:36 by Lutz Vieweg

On 07/20/2011 02:13 PM, Werner Fischer wrote:
> There is a paper from Intel, "Over-provisioning an Intel® SSD" (analyzing
> X25-M 160 GB Gen.2 SSDs):
> http://cache-www.intel.com/cd/00/00/45/95/459555_459555.pdf
>
> On page 10 of this Intel presentation they mention that a spare area
> >27% of native capacity has diminishing returns for such an SSD:
> http://maltiel-consulting.com/Enterprise_Data_Integrity_Increasing_Endurance.pdf

(This latter document is password protected.)

The first document, though, claims an almost linear benefit (regarding IOs/sec)
from much higher amounts of over-provisioning. Alas, their chart does not extend
into the region where saturation of the effect must occur for sure.

Regards,

Lutz Vieweg


