
HBA Adaptor advice
Hi, following on from a recent thread, can folks with decent multi-port
HBA adaptors please chime in with some model numbers of known decent
adaptors please?
The required use is to grow from currently 8 ish drives to perhaps 12-24
drives per machine. (It partitions out as: one or more RAID6 arrays for
data, plus a couple of backup drives)
Ideally I would like a controller with writeback cache and BBU since
whilst this office machine is likely quite underused, for any sensible
amount of IO (some of the other machines we might upgrade) this seems to
give a 10-100x increase in IOPs? For the moment it's just a nice to
have though
I only intend to use linux software raid, so any onboard raid
functionality is just a liability. Budget is either low £100 ish for
multi-port HBAs without cache, up to £1000 ish for 16-24 port high
performance cache controllers:
So far I saw recommendations for:
- LSI 1068E (SuperMicro 3081E) (8 port 3Gb)
- LSI 9211-8i (8 port 6Gb)
And to avoid:
- Marvell controllers?
- Areca with Marvell controllers?
- AOC-SASLP-MV8
Are these any good?
- LSI MegaRAID 9280-24i4e
- Areca ARC-1880ix-24
I'm completely ignorant of the current state of adaptors today:
- Are there any bargains to be had in the lower end 8-24 port category
(ie come up frequently as ebay specials and aren't locked to special
DELL-only disks, etc?)
- Cable management. Are there any backplanes for retro fitting into
desktop chassis (5 1/4 bays say?) which take single (8087?) connectors?
At the moment I just need to refresh our office server (10-12 disks
including backup drives) and we need something compact and quiet so
looking for compact tower chassis options. I'm also looking at adding
more storage into our datacenter racks though, so interested in a
shopping list of reliable higher performance options?
Please add suggestions for good value, reliable controllers known to
work well with linux
Thanks
Ed W
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: HBA Adaptor advice
On Thu, 19 May 2011 13:26:49 +0100
Ed W <lists [at] wildgooses.com> wrote:
> Hi, following on from a recent thread, can folks with decent multi-port
> HBA adaptors please chime in with some model numbers of known decent
> adaptors please?
Here is a useful link for you: http://blog.zorinaq.com/?e=10
> And to avoid:
> - Marvel controllers?
There are two kinds (families?) of Marvell SATA chips, and those using the
"sata_mv" module do work fine. It seems like all the complaints are directed
at the other kind, supported via "mvsas".
--
With respect,
Roman
Re: HBA Adaptor advice
On 19 May 2011 13:36, Roman Mamedov <rm [at] romanrm.ru> wrote:
> On Thu, 19 May 2011 13:26:49 +0100
> Ed W <lists [at] wildgooses.com> wrote:
>
>> Hi, following on from a recent thread, can folks with decent multi-port
>> HBA adaptors please chime in with some model numbers of known decent
>> adaptors please?
>
> Here is a useful link for you: http://blog.zorinaq.com/?e=10
>
>> And to avoid:
>> - Marvel controllers?
>
> There are two kinds (families?) of Marvell SATA chips, and those using the
> "sata_mv" module do work fine. It seems like all the complaints are directed
> at the other kind, supported via "mvsas".
>
> --
> With respect,
> Roman
>
I have to chime in; I do have this one:
05:00.0 SCSI storage controller: HighPoint Technologies, Inc.
RocketRAID 230x 4 Port SATA-II Controller (rev 02)
Which uses the sata_mv module. From dmesg:
[ 1.062151] sata_mv: Highpoint RocketRAID BIOS CORRUPTS DATA on all
attached drives, regardless of if/how they are configured. BEWARE!
[ 1.062156] sata_mv: For data safety, do not use sectors 8-9 on
"Legacy" drives, and avoid the final two gigabytes on all RocketRAID
BIOS initialized drives.
So stay away from RocketRAID :-) (I only use it as a HBA)
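For anyone who does have to partition a drive that was initialized by the RocketRAID BIOS, the dmesg warning above can be turned into a quick calculation. This is only a rough sketch: the 512-byte sector size is an assumption, "two gigabytes" is read as 2 GiB to err on the safe side, and `last_safe_sector` is a made-up helper name, not part of any tool.

```python
SECTOR_BYTES = 512                   # assumed logical sector size
RESERVED_TAIL_BYTES = 2 * 1024**3    # "final two gigabytes", read here as 2 GiB


def last_safe_sector(disk_bytes):
    """Return the last sector number that stays clear of the reserved tail."""
    total_sectors = disk_bytes // SECTOR_BYTES
    reserved_sectors = RESERVED_TAIL_BYTES // SECTOR_BYTES
    if total_sectors <= reserved_sectors:
        raise ValueError("disk is not larger than the reserved tail")
    return total_sectors - reserved_sectors - 1


if __name__ == "__main__":
    # Hypothetical 1 TB drive reporting 1953525168 sectors.
    print(last_safe_sector(1953525168 * SECTOR_BYTES))
```

A partition that ends at or before the printed sector leaves the tail untouched; sectors 8-9 on "Legacy" drives would still need to be avoided separately.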
Cheers,
/M
Re: HBA Adaptor advice
On 19/05/2011 10:26 PM, Ed W wrote:
> So far I saw recommendations for:
>
> - LSI 1068E (SuperMicro 3081E) (8 port 3Gb)
I'm using one of these, and it's working great. It's a Supermicro
AOC-USASLP-L8i. I'm using all 8 ports, and another 4 drives on the
onboard controller.
Note, however, that I was first running a slightly older kernel, and it
didn't work (bus errors, lots of weird stuff happening), so I
temporarily gave up on it -- I believe it was either 2.6.31 or 2.6.32
(Ubuntu 10.04 LTS). However, due to a need for something else, I
upgraded to a 2.6.35 kernel, and it started working fine with that
kernel. Been using it for ~9 months perfectly fine.
Cheers,
Michael
Re: HBA Adaptor advice
On 5/19/2011 8:26 AM, Ed W wrote:
> Hi, following on from a recent thread, can folks with decent
> multi-port HBA adaptors please chime in with some model numbers of
> known decent adaptors please?
>
> The required use is to grow from currently 8 ish drives to perhaps
> 12-24 drives per machine. (It partitions out as: one or more RAID6
> arrays for data, plus a couple of backup drives)
>
> Ideally I would like a controller with writeback cache and BBU since
> whilst this office machine is likely quite underused, for any
> sensible amount of IO (some of the other machines we might upgrade)
> this seems to give a 10-100x increase in IOPs? For the moment it's
> just a nice to have though
>
> I only intend to use linux software raid, so any onboard raid
> functionality is just a liability. Budget is either low £100 ish for
> multi-port HBAs without cache, up to £1000 ish for 16-24 port high
> performance cache controllers:
I've been using a SuperMicro AOC-SASLP-MV8 (which is on your avoid
list), which reports itself as:
class: SCSI
bus: PCI
detached: 0
driver: mvsas
desc: "Marvell Technology Group Ltd. MV64460/64461/64462 System
Controller, Revision B"
vendorId: 11ab
deviceId: 6485
subVendorId: 15d9
subDeviceId: 0500
I've had it about 6 months at this point with SATA drives hooked up to
it. The issues that I've had with it dropping disks from the 6-disk
RAID-10 array on CentOS 5.5 / 5.6 can probably be traced to:
Not using enterprise grade SATA disks (as the consumer brand takes too
long to timeout on a bad seek, and mdadm dropped it from the array).
Possibly combined with using a really inexpensive set of removable drive
trays. There were a lot of times after the weekly resync where the
entire array went offline due to multiple drives being dropped.
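A rough sketch of the timeout mismatch described above. The numbers are assumptions, not something measured on this card: the Linux SCSI layer's per-command timeout (`/sys/block/<dev>/device/timeout`) usually defaults to 30 seconds, and drives that support SCT Error Recovery Control report their limit in tenths of a second (readable with `smartctl -l scterc`). The helper name is made up for illustration.

```python
KERNEL_TIMEOUT_S = 30  # assumed default SCSI command timeout, in seconds


def drive_outlasts_kernel(erc_deciseconds, kernel_timeout_s=KERNEL_TIMEOUT_S):
    """Return True if the drive may still be retrying when the kernel gives up.

    erc_deciseconds is None when the drive has no error-recovery limit
    (common on consumer disks), so its recovery time is unbounded.
    """
    if erc_deciseconds is None:
        return True
    return erc_deciseconds / 10 >= kernel_timeout_s


if __name__ == "__main__":
    print(drive_outlasts_kernel(None))  # consumer disk without a limit
    print(drive_outlasts_kernel(70))    # enterprise-style 7 second limit
```

When the function returns True, a single bad sector can stall the drive long enough to get the whole disk kicked from the array, which would match the symptoms after the weekly resync.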
Under normal operation it reads/writes to the disks fine and works fine
as a controller. Since this is my own personal server, I have not
tested it with good SAS disks or enterprise SATAs and good drive
enclosures. I've since switched over to just hooking up a pair of RAID1
arrays to it with a direct connect from the card to the drives (no
removable trays), but I don't have enough time on the new setup to say
that the problem is permanently fixed yet.
The card is inexpensive, which is a plus. It's a PCIe x4 card. I don't
know whether it would be better behaved with a better class of disks /
enclosures.
Re: HBA Adaptor advice
On 19/05/11 20:26, Ed W wrote:
> Please add suggestions for good value, reliable controllers known to
> work well with linux
I have three of these :
http://www.startech.com/product/PEXSATA24E-2-Port-eSATA-4-Port-SATA-PCI-Express-x4-SATA-Controller-Adapter-Card-PCIe
and 4 of these :
http://www.ebay.com.au/itm/IBM-M1015-46M0861-ServeRAID-M1015-SAS-SATA-Controller-/280655527117?pt=AU_Server_Accessories_Parts&hash=item41585f7ccd
All of which I can't recommend highly enough.
I got the Startech ones cheap from a dodgy shop about 4 years ago. They cost me about $30 each.
I got the IBM (really LSI) ones cheap from ebay at about $110 each at Christmas.
The Startech cards use the sata_mv driver and are solid, the LSI cards use the megaraid_sas driver
and are solid. As a bonus of having SAS ports, I picked up 4 Seagate Cheetah 15k.5 SAS drives for a
wicked fast RAID10 array.
Regards,
Brad
Re: HBA Adaptor advice
On Thu, 2011-05-19 at 15:10 -0400, Thomas Harold wrote:
> On 5/19/2011 8:26 AM, Ed W wrote:
> > Hi, following on from a recent thread, can folks with decent
> > multi-port HBA adaptors please chime in with some model numbers of
> > known decent adaptors please?
> >
> > The required use is to grow from currently 8 ish drives to perhaps
> > 12-24 drives per machine. (It partitions out as: one or more RAID6
> > arrays for data, plus a couple of backup drives)
> >
> > Ideally I would like a controller with writeback cache and BBU since
> > whilst this office machine is likely quite underused, for any
> > sensible amount of IO (some of the other machines we might upgrade)
> > this seems to give a 10-100x increase in IOPs? For the moment it's
> > just a nice to have though
> >
> > I only intend to use linux software raid, so any onboard raid
> > functionality is just a liability. Budget is either low £100 ish for
> > multi-port HBAs without cache, up to £1000 ish for 16-24 port high
> > performance cache controllers:
>
> I've been using a SuperMicro AOC-SASLP-MV8 (which is on your avoid
> list), which reports itself as:
>
> class: SCSI
> bus: PCI
> detached: 0
> driver: mvsas
> desc: "Marvell Technology Group Ltd. MV64460/64461/64462 System
> Controller, Revision B"
> vendorId: 11ab
> deviceId: 6485
> subVendorId: 15d9
> subDeviceId: 0500
>
> I've had it about 6 months at this point with SATA drives hooked up to
> it. The issues that I've had with it dropping disks from the 6-disk
> RAID-10 array on CentOS 5.5 / 5.6 can probably be traced to:
>
> Not using enterprise grade SATA disks (as the consumer brand takes too
> long to timeout on a bad seek, and mdadm dropped it from the array).
> Possibly combined with using a really inexpensive set of removable drive
> trays. There were a lot of times after the weekly resync where the
> entire array went offline due to multiple drives being dropped.
>
> Under normal operation it reads/writes to the disks fine and works fine
> as a controller. Since this is my own personal server, I have not
> tested it with good SAS disks or enterprise SATAs and good drive
> enclosures. I've since switched over to just hooking up a pair of RAID1
> arrays to it with a direct connect from the card to the drives (no
> removable trays), but I don't have enough time on the new setup to say
> that the problem is permanently fixed yet.
>
> The card is inexpensive, which is a plus. It's a PCIe x4 card. I don't
> know whether it would be better behaved with a better class of disks /
> enclosures.
It's inexpensive and unfortunately you are describing symptoms that
belong to the chipset.
It remains firmly on my avoidance list, and I have one...
Rudy
Re: HBA Adaptor advice
Hi Ed,
On Thu, May 19, 2011 at 01:26:49PM +0100, Ed W wrote:
> Ideally I would like a controller with writeback cache and BBU since
> whilst this office machine is likely quite underused, for any sensible
> amount of IO (some of the other machines we might upgrade) this seems to
> give a 10-100x increase in IOPs? For the moment it's just a nice to
> have though
Are there actually any HBAs that have BBU without using their RAID
features?
I'd like to stop using hardware RAID but I can't give up the BBU and
write cache.
Cheers,
Andy
Re: HBA Adaptor advice
On 5/19/2011 9:08 PM, Andy Smith wrote:
> Hi Ed,
>
> On Thu, May 19, 2011 at 01:26:49PM +0100, Ed W wrote:
>> Ideally I would like a controller with writeback cache and BBU since
>> whilst this office machine is likely quite underused, for any sensible
>> amount of IO (some of the other machines we might upgrade) this seems to
>> give a 10-100x increase in IOPs? For the moment it's just a nice to
>> have though
>
> Are there actually any HBAs that have BBU without using their RAID
> features?
AFAIK the LSI real RAID cards allow this. To get them into a JBOD mode
you have to create a single drive RAID 0 of each disk and export it. By
doing so the RAID firmware is actually active, though not really doing
anything, so you get the cache and BBU benefit of the controller. One
of the XFS developers, Dave Chinner, posted this to the XFS list quite
some time ago when we discussed hardware vs software RAID setups.
--
Stan
Re: HBA Adaptor advice
On 20/05/2011 03:08, Andy Smith wrote:
> Are there actually any HBAs that have BBU without using their RAID
> features?
>
> I'd like to stop using hardware RAID but I can't give up the BBU and
> write cache.
This is a very interesting question. Does anyone know if say the Areca
ARC-1880ix-24 can be used in the same way, ie battery backed JBOD type mode?
I received a recommendation offlist that the various 3Ware SAS 9750-xx
cards can be used easily as a bunch of single drives, however, comparing
the photos of these with the LSI MegaRAID 9280-xx they seem identical?
(Presumed to be identical?). Anyone know why LSI sell an identical card
under the 3ware brand (still)? Curiously I see the LSI generally
selling a little cheaper than the 3ware in the uk... (weird)
Are there any cards to avoid because they *can't* be used in this way?
eg Dell PERC6 seem to come up cheaply on ebay - can these be used as BBU
backed single JBOD controllers?
I guess the limitation is that some of these cards can only create a
small number of arrays and/or they don't use their writeback cache
efficiently in the case of multiple arrays?
Thanks for any education here. (I found a cheap Areca on ebay, plus
been eyeing up the various cheap Dell PERC cards...)
Ed W
Re: HBA Adaptor advice
On 5/20/2011 2:33 AM, Ed W wrote:
> On 20/05/2011 03:08, Andy Smith wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
>>
>> I'd like to stop using hardware RAID but I can't give up the BBU and
>> write cache.
I'm curious why you are convinced that you need BBWC, or even simply WC,
on an HBA used for md RAID. I'm also curious as to why you are so
adamant about _not_ using the RAID ASIC on an HBA, given that it will
take much greater advantage of the BBWC than md RAID will. You may be
interested to know:
1. When BBWC is enabled, all internal drive caches must be disabled.
Otherwise you eliminate the design benefit of the BBU, and may as
well not have one.
2. w/md RAID on an HBA, if you have a good UPS and don't suffer
kernel panics, crashes, etc, you can disable barrier support in
your FS and you can use the drive caches.
3. The elevator will perform well directly on drives with large cache.
Most good higher end RAID cards have 512MB to 1GB of cache. w/12 2TB
drives you'll have a combined cache of 768MB, as most drives of this
size have a 64MB cache. So there's not much difference in total cache
size. And the drive firmware will usually make better decisions WRT
cache use optimization than an upstream RAID card BIOS that has disabled
the drive caches.
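The cache comparison above is simple enough to check by hand, but here it is spelled out. The 12 drives and 64MB per drive come from the paragraph; the function name is just for illustration.

```python
def combined_drive_cache_mb(n_drives, cache_mb_per_drive):
    """Total on-drive cache across all member disks, in MB."""
    return n_drives * cache_mb_per_drive


drive_total_mb = combined_drive_cache_mb(12, 64)

if __name__ == "__main__":
    print(f"combined drive cache: {drive_total_mb} MB")
    # Compare against the 512MB and 1GB caches typical of higher end RAID cards.
    for card_mb in (512, 1024):
        ratio = drive_total_mb / card_mb
        print(f"vs {card_mb} MB card cache: {ratio:.2f}x")
```

So the drives together hold somewhat more cache than a 512MB card and somewhat less than a 1GB card, which is the "not much difference" being argued.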
For a stable system with good UPS and auto shutdown configured, BBWC is
totally overrated. If the system never takes a nose dive from power
drop, and doesn't crash due to software or hardware failure, then BBWC
is a useless $200-1000 option. Some hardware RAID cards require a
functional BBU before they will allow you to enable write caching. In
that case BBU is needed. In most other cases it's not.
If your current reasoning for wanting write cache on the HBA is
performance, then forget about the write cache as you don't need it with
md RAID. If you want the BBWC combo for safety as your system isn't
stable or you have a crappy or no UPS, then forgo md RAID and use the
hardware RAID and BBWC combo.
One last point: If you're bargain hunting, especially if looking at
used gear on Ebay, that mindset is antithetical to proper system
integration, especially when talking about a RAID card BBU. If you buy
a used card, the first thing you must do is chuck the BBU and order a new
one, because the used battery can't be trusted--you have no idea how
much life is left in it. For your data to be safe, you need a new
battery. Buying a brand new card w/bundled BBU may cost you the same or
less than a used card and a new battery from the manufacturer.
The following would be a darn good fit for your md RAID office server
setup, given your criteria, WRT the HBA, hot swap cages, drives, and
cables. Drop the LSI SAS HBA into a PCIe 2.0 x8 slot. Drop the Intel
24 port SAS expander into an x4/x8 slot, or mount it to the side or
floor of the chassis and power it via the 4 pin Molex plug. Connect the
8087/8087 cable from the LSI card to the first port on the Intel SAS
Expander. Mount the 5 IcyDock 4 x 2.5" SAS hot swap backplane cages in
5 x 5.25" externally accessible drive bays. Connect each of the five
8087 breakout cables from the remaining 5 ports on the Intel Expander to
each of the hot swap backplanes--one cable per backplane--label which
drive connects to which port on the Intel expander so you can properly
identify failed drives! Mount each Seagate Enterprise 2.5" 1TB drive in
a tray and insert the trays into the backplanes--fill each quad bay
before putting drives in the next bay. After booting the machine hop
into the LSI BIOS and configure for JBOD. You should know how to do the
rest.
This setup gives you 12 enterprise 2.5" SAS 7.2K RPM 1TB drives--not
cheap SATA drives not fit for RAID--12TB raw total, in only three 5.25"
bays, and drawing much less power than equivalent 3.5" drives. You will
have 8 free hot swap bays for future expansion, 20TB total if acquiring
the same drives. Controller to drive aggregate bandwidth is 2.4GB/s,
4.8GB/s full duplex, HBA to host b/w is 4/8 GB/s, likely far more than
you need.
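The bandwidth and bay figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: four 6Gb/s SAS lanes on the HBA (inferred from the "-4i" in the model name), PCIe 2.0 at 5 GT/s per lane, and 8b/10b encoding on both links.

```python
SAS_LANES = 4            # assumed from the "-4i" in the 9240-4i model name
SAS_GBIT_PER_LANE = 6    # 6Gb/s SAS
PCIE_LANES = 8           # PCIe 2.0 x8 slot
PCIE_GT_PER_LANE = 5     # PCIe 2.0 signals at 5 GT/s per lane


def usable_mb_per_s(line_rate_gbit):
    """8b/10b encoding puts 10 bits on the wire per data byte."""
    return line_rate_gbit * 1000 / 10


sas_mb = SAS_LANES * usable_mb_per_s(SAS_GBIT_PER_LANE)
pcie_mb = PCIE_LANES * usable_mb_per_s(PCIE_GT_PER_LANE)

CAGES, BAYS_PER_CAGE, DRIVES, TB_PER_DRIVE = 5, 4, 12, 1
total_bays = CAGES * BAYS_PER_CAGE
cages_used = -(-DRIVES // BAYS_PER_CAGE)   # ceiling division
free_bays = total_bays - DRIVES

if __name__ == "__main__":
    print(f"controller to drives: {sas_mb / 1000} GB/s each way")
    print(f"HBA to host: {pcie_mb / 1000} GB/s each way")
    print(f"{DRIVES * TB_PER_DRIVE} TB raw in {cages_used} cages, "
          f"{free_bays} bays free, {total_bays * TB_PER_DRIVE} TB when full")
```

Doubling the per-direction numbers gives the full duplex figures quoted above.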
The parts list. Total cost from NewEgg in the US is ~$3800 with ~$3000
of that being the 12 drives at $250 each. The HBA + expander are only $470.
Buy 1:
http://www.lsi.com/channel/products/megaraid/sassata/9240-4i/index.html
Buy 1:
http://www.intel.com/Products/Server/RAID-controllers/re-res2sv240/RES2SV240-Overview.htm
Buy 5:
http://www.icydock.com/goods.php?id=114
Buy 12:
http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications
Buy 5 (or local equivalent):
http://www.newegg.com/Product/Product.aspx?Item=N82E16816116098&cm_re=cable-_-16-116-098-_-Product
Buy 1 (or local equivalent):
http://www.newegg.com/Product/Product.aspx?Item=N82E16816116093&cm_re=cable-_-16-116-093-_-Product
Food for thought. Hope it's useful as I killed over an hour putting
this together for you. :)
--
Stan
Re: HBA Adaptor advice
On 05/20/2011 03:33 AM, Ed W wrote:
> On 20/05/2011 03:08, Andy Smith wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
>>
>> I'd like to stop using hardware RAID but I can't give up the BBU and
>> write cache.
HBAs don't have BBU or write cache. Only RAIDs do. While you can run
the RAID in JBOD mode, you effectively lose the cache (and BBU) aspect
by doing so.
More in a moment.
> This is a very interesting question. Does anyone know if say the Areca
> ARC-1880ix-24 can be used in the same way, ie battery backed JBOD type mode?
If you absolutely insist on using a large expensive RAID card as a JBOD
card, yeah, there are things you *can* do to keep access to the cache
and BBU, though they are counter-intuitive.
First off, the LSI 920x series has a 16 port HBA. You can look it up on
their site. SAS+SATA HBA I think. LSI likes adorning some of their
HBAs with some inherent RAID capability (their IR mode). I personally
prefer the IT mode, but it's sometimes hard/impossible to make the switch
(this is usually for motherboard mounted 'RAID' units). HBAs can be used
as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI
.... which PERC are rebranded versions of, ...)
Second off, you can turn any of the expensive RAID cards into a 'JBOD'
by doing something like this:
1) have the unit configured in RAID mode
2) build virtual disks out of single drives, as RAID0.
3) iterate 2 until you exhaust your drives.
4) make sure you prevent these drives from messing with your boot drive
order ... some bioses "helpfully" reorganize new drives for you by
messing with this list.
Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
cache. Yeah, it's a little weird. But it does work (we've done this
with some LSI8888's).
When you do this, then use mdadm atop this. We've found, generally, by
doing this, we can build much faster RAIDs than the LSI 8888 units, and
comparable to the 9260's in terms of performance across the same number
of disks, at a lower price. I.e. mdadm and the MD RAID stack are quite
good.
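As a minimal sketch of the "mdadm atop this" step: once each single-drive RAID0 shows up as its own block device, the md array is created over those devices like any others. The device names, the md device, and the choice of RAID6 below are placeholders for illustration, not what we actually ran; the script only builds and prints the command rather than executing it.

```python
import shlex


def mdadm_create_argv(md_device, level, members):
    """Build an 'mdadm --create' command line for the given member devices."""
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


# Hypothetical logical devices exported by the card, one per physical disk.
members = [f"/dev/sd{letter}" for letter in "bcdefghi"]
argv = mdadm_create_argv("/dev/md0", 6, members)

if __name__ == "__main__":
    # Print for review; run it by hand (as root) once the device list is right.
    print(shlex.join(argv))
```

Checking the device list against the card's own view of the virtual disks first is worthwhile, since the boot-order shuffling mentioned in step 4 can also change which sdX is which.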
[...]
> I guess the limitation is that some of these cards can only create a
> small number of arrays and/or they don't use their writeback cache
> efficiently in the case of multiple arrays?
These are the issues. Most RAID cards aren't thinking they'll be used
on more than a few LUNs/RAIDs at a time, so they might not scale well
here, with 16 or 24 single drive RAID0's.
The additional cache doesn't buy you much for this arrangement. Might
work against you if the card CPU is slow (as most of the hardware RAID
chips are).
Re: HBA Adaptor advice
On Fri, 20 May 2011 08:18:32 -0400
Joe Landman <joe.landman [at] gmail.com> wrote:
> Second off, you can turn any of the expensive RAID cards into an 'JBOD'
> by doing something like this:
>
> 1) have the unit configured in RAID mode
>
> 2) build virtual disks out of single drives, as RAID0.
>
> 3) iterate 2 until you exhaust your drives.
>
> 4) make sure you prevent these drives from messing with your boot drive
> order ... some bioses "helpfully" reorganize new drives for you by
> messing with this list.
>
> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
> cache. Yeah, its a little weird. But it does work (we've done this
> with some LSI8888's).
But can you then access SMART of the individual drives?
Or will you see only some bogus block devices which do not accept SMART
commands, do not return real drive identity, and present themselves as RAID0
#1, RAID0 #2 etc. instead?
--
With respect,
Roman
Re: HBA Adaptor advice
On 20 May 2011 13:34, Roman Mamedov <rm [at] romanrm.ru> wrote:
> On Fri, 20 May 2011 08:18:32 -0400
> Joe Landman <joe.landman [at] gmail.com> wrote:
>
>> Second off, you can turn any of the expensive RAID cards into an 'JBOD'
>> by doing something like this:
>>
>> 1) have the unit configured in RAID mode
>>
>> 2) build virtual disks out of single drives, as RAID0.
>>
>> 3) iterate 2 until you exhaust your drives.
>>
>> 4) make sure you prevent these drives from messing with your boot drive
>> order ... some bioses "helpfully" reorganize new drives for you by
>> messing with this list.
>>
>> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
>> cache. Yeah, its a little weird. But it does work (we've done this
>> with some LSI8888's).
>
> But can you then access SMART of the individual drives?
> Or will you see only some bogus block devices which do not accept SMART
> commands, do not return real drive identity, and present themselves as RAID0
> #1, RAID0 #2 etc. instead?
>
> --
> With respect,
> Roman
>
Depends on the controller; e.g.
smartctl -A -d 3ware,$I /dev/twa0
smartctl -A -d megaraid,$I /dev/sda
(where $I is the port on the controller)
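To walk every port in one go, the two commands above can be generated in a loop. A small sketch: the port range 0-7 and the device nodes are assumptions for an 8-port card, and the helper name is made up; it only prints the commands so they can be checked before running them.

```python
import shlex


def smartctl_commands(driver, device, ports):
    """Build one 'smartctl -A -d <driver>,<port> <device>' command per port."""
    return [
        ["smartctl", "-A", "-d", f"{driver},{port}", device]
        for port in ports
    ]


if __name__ == "__main__":
    # Assumed 8-port controllers; adjust the range and device node to suit.
    for argv in smartctl_commands("3ware", "/dev/twa0", range(8)):
        print(shlex.join(argv))
    for argv in smartctl_commands("megaraid", "/dev/sda", range(8)):
        print(shlex.join(argv))
```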
/M
Re: HBA Adaptor advice
On 05/20/2011 08:34 AM, Roman Mamedov wrote:
> On Fri, 20 May 2011 08:18:32 -0400
> Joe Landman<joe.landman [at] gmail.com> wrote:
>
>> Second off, you can turn any of the expensive RAID cards into an 'JBOD'
>> by doing something like this:
>>
>> 1) have the unit configured in RAID mode
>>
>> 2) build virtual disks out of single drives, as RAID0.
>>
>> 3) iterate 2 until you exhaust your drives.
>>
>> 4) make sure you prevent these drives from messing with your boot drive
>> order ... some bioses "helpfully" reorganize new drives for you by
>> messing with this list.
>>
>> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
>> cache. Yeah, its a little weird. But it does work (we've done this
>> with some LSI8888's).
>
> But can you then access SMART of the individual drives?
I don't view the loss of direct SMART access as a bad thing ... most of
the RAID cards will give you CLI access to this data, if in a convoluted
manner. SMART's utility is generally pretty questionable (see the
Google paper for a discussion on the profound lack of correlation of
SMART parameters with actual failure rates). But it's there if you want it.
> Or will you see only some bogus block devices which do not accept SMART
> commands, do not return real drive identity, and present themselves as RAID0
> #1, RAID0 #2 etc. instead?
The RAID will provide you an abstraction (e.g. a layer you have to walk
through) to your disks. Seeing what composes the RAID is generally not
hard, though you might need to write a quick and dirty parser for this.
The block devices are not bogus. They are logical block devices.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman [at] scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Re: HBA Adaptor advice
Hi
> If you absolutely insist on using a large expensive RAID card as a JBOD
> card, yeah, there are things you *can* do to keep access to the cache
> and BBU, though they are counter-intuitive.
The main issue with hardware cards is that really you need at least two
of them... At the most inopportune moment the only single one you own
will break and then your entire dataset becomes unavailable...
For sure, anyone with moderate or larger budgets, or a pool of similar
hardware, this becomes a case of simply buying an extra one and stashing
it. Or at least keeping an eye on when it becomes end of line and
unavailable to buy a new one...
> First off, the LSI 920x series has a 16 port HBA. You can look it up on
> their site. SAS+SATA HBA I think. LSI likes adorning some of their
> HBAs with some inherent RAID capability (their IR mode). I personally
> prefer the IT mode, but its sometimes hard/impossible to make the switch
> (this is usually for motherboard mounted 'RAID' units). HBAs can be used
> as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI
> ... which PERC are rebranded versions of, ...)
This sounds helpful, but I'm not understanding it?
Are you describing the reverse, ie taking a straight HBA card and asking
it to do "hardware raid" of multiple disks?
Or do you mean that performance is dismal even if you make X arrays of 1
disk each in order to access their BB cache?
Or to be really clear - can I take a cheapo PERC6 from ebay, and make it
run 8x disks completely under linux MD Raid, with smartctl access to the
individual disks and BB cache on the card - *with* high performance...
(phew...)
> When you do this, then use mdadm atop this. We've found, generally, by
> doing this, we can build much faster RAIDs than the LSI 8888 units, and
> comparible to the 9260's in terms of performance across the same number
> of disks, at a lower price. E.g. mdadm and the MD RAID stack are quite
> good.
What do you think stops the MD Stack being *better* than a 9260? Also
in very round terms what kind of performance drop do you see from going
to linux MD raid versus a 9260?
> The additional cache doesn't buy you much for this arrangement. Might
> work against you if the card CPU is slow (as most of the hardware RAID
> chips are).
Hopefully not a silly question, but surely the CPU would have to be
extremely slow indeed not to keep up with a sorted bunch of writes that
are being issued to spinning rust drives with multi-ms seek latencies?
Are they really that slow..?
Thanks for your very helpful feedback - much appreciated
Ed W
Re: HBA Adaptor advice
On 05/20/2011 09:21 AM, Ed W wrote:
> Hi
>
>> If you absolutely insist on using a large expensive RAID card as a JBOD
>> card, yeah, there are things you *can* do to keep access to the cache
>> and BBU, though they are counter-intuitive.
>
> The main issue with hardware cards is that really you need at least two
> of them... At the most inopportune moment the only single one you own
> will break and then your entire dataset becomes unavailable...
That is a risk with any proprietary design (a point we refer to in our
marketing, relative to completely closed designs). This said, the issue
on the RAID side isn't all that terrible. RAID cards, individually,
aren't that expensive. You can buy replacements on ebay, or from
various used machine resellers. That is, your data really isn't at an
unmitigable risk, but it is at risk.
Put another way, yeah, having a spare RAID card around isn't a bad idea.
In most cases they don't burn out (we've seen 4 failed RAID cards in
our time in the field, 2 of which were ... er ... customer initiated
burnouts ... due to bad grounding).
> For sure, anyone with moderate or larger budgets, or a pool of similar
> hardware, this becomes a case of simply buying an extra one and stashing
> it. Or at least keeping an eye on when it becomes end of line and
> unavailable to buy a new one...
And in the case of the businesses/researchers, the cost of the
additional card in spares stock locally is (in most cases) in the noise
level as compared to the actual cost of the gear.
That is, it's not a terrible thing to do this. If you are a home user,
it's another issue entirely. A 1000 EUR card might cost as much as the rest
of your system. So you want to mitigate that risk, and not have to pay
that cost. That decision to mitigate, by using MD raid, will come at
some cost, though we see MD raid very much as the future of RAID
systems. It's all about refresh rates and economies of scale.
>> First off, the LSI 920x series has a 16 port HBA. You can look it up on
>> their site. SAS+SATA HBA I think. LSI likes adorning some of their
>> HBAs with some inherent RAID capability (their IR mode). I personally
>> prefer the IT mode, but it's sometimes hard/impossible to make the switch
>> (this is usually for motherboard mounted 'RAID' units). HBAs can be used
>> as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI
>> ... which PERC are rebranded versions of, ...)
>
> This sounds helpful, but I'm not understanding it?
The 16 port card is mostly HBA, with a little onboard logic for RAID0,
RAID1, RAID10.
>
> Are you describing the reverse, ie taking a straight HBA card and asking
> it to do "hardware raid" of multiple disks?
LSI's HBAs have some of this capability, though we do not recommend
using this. We prefer to use them as straight HBAs.
>
> Or do you mean that performance is dismal even if you make X arrays of 1
> disk each in order to access their BB cache?
No ... we haven't looked into that performance as much, as this is a
very difficult-to-use model, and honestly, there are no real benefits to
this.
>
> Or to be really clear - can I take a cheapo PERC6 from ebay, and make it
> run 8x disks completely under linux MD Raid, with smartctl access to the
> individual disks and BB cache on the card - *with* high performance...
> (phew...)
I am going to pull a Clinton here, and ask you to define "high
performance" :) More seriously, performance is in the eye of the
beholder ... what does it mean to you, and where do you need to be in
performance ... and from that, you can see if MD RAID will get you there.
>> When you do this, then use mdadm atop this. We've found, generally, by
>> doing this, we can build much faster RAIDs than the LSI 8888 units, and
>> comparable to the 9260's in terms of performance across the same number
>> of disks, at a lower price. E.g. mdadm and the MD RAID stack are quite
>> good.
>
> What do you think stops the MD Stack being *better* than a 9260? Also
> in very round terms what kind of performance drop do you see from going
> to linux MD raid versus a 9260?
Very little on the read side. MD raid is as fast, if not faster than
the 9260 on reads. The 9260 isn't a bad card mind you, it is roughly
midrange in LSI's lineup. The write side ... I think the 9260 has a
deeply pipelined XOR engine you need for the GF(256) calculations. So
we see about a 2x better write performance on the 9260 than we do on the
MD raid.
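To make the GF(256) remark concrete, here is a minimal sketch of the two RAID6 syndromes: P is a plain XOR across the data blocks, and Q is a weighted sum in GF(2^8). It assumes the usual RAID6 construction (generator 2, field polynomial 0x11d). The names gf_mul2 and pq_parity are illustrative only, and this is not the optimised code the kernel or a RAID ASIC actually runs.

```python
# Minimal sketch of RAID6 P/Q parity over GF(2^8); illustrative, not the kernel's code.
# Assumption: generator 2 and field polynomial 0x11d, the usual RAID6 construction.

def gf_mul2(x):
    """Multiply one byte by the generator 2 in GF(2^8), reducing modulo 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def pq_parity(blocks):
    """Return (P, Q) for a list of equally sized data blocks D_0 .. D_{n-1}."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule, walking from the highest-numbered disk down to D_0, so that
    # Q ends up as D_0 + 2*D_1 + 2^2*D_2 + ... with all arithmetic in GF(2^8).
    for block in reversed(blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)


if __name__ == "__main__":
    data = [b"\x01", b"\x02", b"\x04"]
    p, q = pq_parity(data)
    print(p.hex(), q.hex())
```

The P half is cheap on any CPU; the per-byte multiply in the Q half is the part that benefits from a dedicated engine or from SIMD on the host.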
>> The additional cache doesn't buy you much for this arrangement. Might
>> work against you if the card CPU is slow (as most of the hardware RAID
>> chips are).
>
> Hopefully not a silly question, but surely the CPU would have to be
> extremely slow indeed not to keep up with a sorted bunch of writes that
> are being issued to spinning rust drives with multi-ms seek latencies?
> Are they really that slow..?
Many of the low end cards run processors at 200-800 MHz. Yeah ... some
of them are really ... really ... slow. MD RAID runs circles around
them. And soon, I think it will be running circles around the midrange
(and probably higher end cards as well).
Regards,
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman [at] scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Re: HBA Adaptor advice
Hi Joe,
On Fri, May 20, 2011 at 08:18:32AM -0400, Joe Landman wrote:
>> On 20/05/2011 03:08, Andy Smith wrote:
>>> Are there actually any HBAs that have BBU without using their RAID
>>> features?
>>>
>>> I'd like to stop using hardware RAID but I can't give up the BBU and
>>> write cache.
>
> HBAs don't have BBU or write cache. Only RAIDs do. While you can run
> the RAID in JBOD mode, you effectively lose the cache (and BBU) aspect by
> doing so.
That's what I thought, thanks.
It's a shame; maybe there will be disks with battery-backed cache
one day.
Cheers,
Andy
Re: HBA Adaptor advice
On 5/20/2011 3:01 PM, Andy Smith wrote:
> It's a shame; maybe there will be disks with battery-backed cache
> one day.
You'll never see a cache DRAM BBU built into a drive. If this *concept*
were to be implemented it would be done with flash and a capacitor
instead of a BBU. The capacitor would be sized to hold just enough
juice to power the ASIC, flash chip, and related circuitry, and write
the cache DRAM contents to the flash chip after sensing power to the
card has been lost.
Many higher end RAID cards already have flash backup of the cache DRAM
in addition to, or instead of, a BBU.
--
Stan
Re: HBA Adaptor advice
> It's a shame; maybe there will be disks with battery-backed cache
> one day.
There are already hybrid drives which pack a small SSD onboard to act as
a large cache.
--
Drew
Re: HBA Adaptor advice
On 5/20/2011 3:24 PM, Drew wrote:
>> It's a shame; maybe there will be disks with battery-backed cache
>> one day.
>
> There are already hybrid drives which pack a small SSD onboard to act as
> a large cache.
These hybrid drives still have a small 32-64MB cache DRAM, in front of
the SSD. The DRAM loses its contents when the power goes out. The on
board SSD doesn't prevent this cache data loss.
It may be worth noting that most, if not all, pure SSDs also have cache
DRAM in front of the flash array, and thus will lose data in the cache
when the power fails. Some models have what has been termed "super
capacitors" on board to power the device long enough to flush pending
writes in cache to the flash cells, but few, if any, of the
manufacturers advertise that their drives have this feature, or even
bother to put it on the spec sheet. So there's no easy/consistent way,
at present, to really know if your SSD has this feature or not.
As always, a good data persistence strategy starts with a good UPS.
Laptop users have an advantage as they get a free built in UPS, and
typically, good software integration to automatically and safely
shutdown when the battery is about out of juice.
--
Stan
Re: HBA Adaptor advice
On Thu, May 19, 2011 at 5:07 PM, Brad Campbell
<lists2009 [at] fnarfbargle.com> wrote:
>
> On 19/05/11 20:26, Ed W wrote:
>
>> Please add suggestions for good value, reliable controllers known to
>> work well with linux
>
> I have three of these :
>
> http://www.startech.com/product/PEXSATA24E-2-Port-eSATA-4-Port-SATA-PCI-Express-x4-SATA-Controller-Adapter-Card-PCIe
>
> and 4 of these :
>
> http://www.ebay.com.au/itm/IBM-M1015-46M0861-ServeRAID-M1015-SAS-SATA-Controller-/280655527117?pt=AU_Server_Accessories_Parts&hash=item41585f7ccd
>
> All of which I can't recommend highly enough.
>
> I got the Startech ones cheap from a dodgy shop about 4 years ago.
> They cost me about $30 each.
> I got the IBM (really LSI) ones cheap from ebay at about $110 each at
> Christmas.
>
> The Startech cards use the sata_mv driver and are solid, the LSI cards
> use the megaraid_sas driver and are solid. As a bonus of having SAS
> ports, I picked up 4 Seagate Cheetah 15k.5 SAS drives for a wicked fast
> RAID10 array.
So, I've been through 3 cards in my current NAS, all of which didn't
fit my needs for one reason or another, and I had given up until this
thread reignited my interest in having more than 6 available SATA
ports in the box. The cards I've tried are:
* SYBA SD-PEX40031 (Pericom PI7C9X111 + Silicon Image SiI3124
chipset) - lots of errors during heavy I/O such as resyncing
* 3ware 9650SE-8LPML-Sgl - I thought money would solve the problem,
but I didn't realize that you can't use an expensive RAID card to
access existing data on the disk
* Supermicro AOC-SASLP-MV8 - I thought it would be a perfect match
given my Supermicro motherboard, but also gave me lots of errors
during heavy I/O and this experience seems to be confirmed by other
users in this thread
In all cases switching back to the onboard SATA ports resulted in
seamless operation (same drives, cables, etc.).
In light of what I've learned in this thread I just ordered the
Rosewill RC-218 SATA card, which has the same Marvell 88SX7042 chipset
as the Startech link above, but runs only $80 on Newegg [1] and seems
to have good reviews from a few Linux users. I'll report back after I
get it installed next week.
Cheers,
Tobias
[1] http://www.newegg.com/Product/Product.aspx?Item=N82E16816132018
--
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: HBA Adaptor advice
On 21/05/11 04:58, Tobias McNulty wrote:
>
> * SYBA SD-PEX40031 (Pericom PI7C9X111 + Silicon Image SiI3124
> chipset) - lots of errors during heavy I/O such as resyncing
Oh dear.. Gee, another item of useless anecdotal evidence directly attributable to cheaply
manufactured cards from a third world country and no direct correlation to a flaky chipset design.
> * 3ware 9650SE-8LPML-Sgl - I thought money would solve the problem,
> but I didn't realize that you can't use an expensive RAID card to
> access existing data on the disk
Ahh..
> * Supermicro AOC-SASLP-MV8 - I thought it would be a perfect match
> given my Supermicro motherboard, but also gave me lots of errors
> during heavy I/O and this experience seems to be confirmed by other
> users in this thread
Oh dear.. Yes, the mvsas driver has been noted to be somewhat problematic still.
I hope you're not a betting man. 3 for 3 is not a great record thus far. On the up-side the 7042's
have been as solid as a rock.. and _fast_ for the last couple of years. Marvell worked with Mark
Lord and the result was a workable version of the sata_mv driver. Shame they don't do the same with
the mvsas code.
Additionally, I migrated two arrays from the Marvell 7042 controllers onto the LSI based "IBM"
controllers configured up as JBOD and they just worked. No initialisation or reconfiguration
required at all.
Brad
Re: HBA Adaptor advice
On 20/05/2011 06:30, Stan Hoeppner wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
>
> AFAIK the LSI real RAID cards allow this. To get them into a JBOD mode
> you have to create a single drive RAID 0 of each disk and export it. By
> doing so the RAID firmware is actually active, though not really doing
> anything, so you get the cache and BBU benefit of the controller. One
> of the XFS developers, Dave Chinner, posted this to the XFS list quite
> some time ago when we discussed hardware vs software RAID setups.
>
Something I didn't consider:
When you setup most hardware raid cards to have a whole bunch of RAID0
arrays (that are then assembled as software raid), *can* I swap that
hardware raid card for a different make/model or even non raid HBA?
If I can't do this then I don't want such a card... The entire point of
avoiding hardware raid was simply to avoid the proprietary lock-in...
So, to be specific, given one of the LSI/3Ware/Areca fast hardware raid
cards mentioned in this thread, and assuming that I have created a bunch
of raid0 arrays, each containing 1 drive, can anyone confirm/deny if
those single disks can then be moved to a) a non-raid HBA controller, or b)
another hardware raid controller as a new raid0 array?
I'm rather expecting that a) might be possible if the HBA can ignore the
proprietary bit and see a raw partition, but b) seems highly unlikely
since the new controller presumably wants to reformat the array in its
own format before it will use it?
If neither is possible then there seems little advantage in using a
hardware raid as a write caching HBA card (unless the card is so
underpowered that it's a bottleneck)
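One rough way to test case a) on a spare disk is to attach it to a plain HBA and see whether the md superblock still sits where md expects it. The sketch below only looks for the md magic number at the start-of-device offsets used by metadata versions 1.1 and 1.2. The helper name find_md_superblock and the device path are illustrative, and in practice `mdadm --examine` on the device is the proper check. Whether a given RAID card shifts the data area is not something this thread establishes.

```python
# Minimal sketch: look for a Linux md v1.x superblock at its expected offset.
# Assumption: only the start-of-device layouts (1.1 at byte 0, 1.2 at byte 4096)
# are checked; 0.90 and 1.0 metadata live near the end of the device.
import struct

MD_SB_MAGIC = 0xA92B4EFC
CANDIDATE_OFFSETS = {"1.1": 0, "1.2": 4096}


def find_md_superblock(path):
    """Return the metadata version whose magic is found at its offset, else None."""
    with open(path, "rb") as dev:
        for version, offset in CANDIDATE_OFFSETS.items():
            dev.seek(offset)
            raw = dev.read(4)
            # v1.x superblocks store the magic as a little-endian 32-bit value.
            if len(raw) == 4 and struct.unpack("<I", raw)[0] == MD_SB_MAGIC:
                return version
    return None


if __name__ == "__main__":
    import sys
    # Example: python check_md.py /dev/sdX   (hypothetical device; needs read access)
    print(find_md_superblock(sys.argv[1]))
```

If the magic turns up at the expected offset behind a plain HBA, the controller has not moved the start of the data and md should be able to assemble the array; if not, the controller's own metadata may be in the way.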
Thanks
Ed W
Re: HBA Adaptor advice
Hi Stan
Thanks for the time in composing your reply
> I'm curious why you are convinced that you need BBWC, or even simply WC,
> on an HBA used for md RAID.
In the past I have used battery backed cards and where the write speed
is "fsync constrained" the writeback cache makes the app performance fly
at perhaps 10-100x the speed
So for example postfix delivery speeds and mysql write performance are
examples of applications which generate regular fsyncs. The whole app
pauses for basically the seek time of the drive head and performance is
bounded by seek time (assuming spinning media).
If we add a writeback cache then it would appear that you take a couple
of "green" 2TB drives and suddenly your desktop server acquires short
term performance which matches a bunch of high end drives? (noted only
in bursts, after some seconds you catch up with the drives IOPs). For
my basically "small server" requirements this gives me a big boost in
the feeling of interactivity for perhaps less than the price of a couple
of those high end drives
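For anyone wanting to see how "fsync constrained" their own storage is, a minimal sketch along these lines times small writes that are each followed by an fsync. The function name fsync_latencies and the default count and size are illustrative choices, not a standard benchmark. On a bare spinning disk each call should cost roughly a seek or rotation, whereas behind a writeback cache it should return much faster.

```python
# Minimal sketch: time small write+fsync cycles, as a mail spool or database log would issue.
import os
import tempfile
import time


def fsync_latencies(directory, count=50, size=4096):
    """Write `count` records of `size` bytes, fsync after each, return seconds per write."""
    payload = b"x" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the storage stack reports the data as stable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    lat = fsync_latencies(".")
    avg = sum(lat) / len(lat)
    print(f"average fsync'd write: {avg * 1000:.2f} ms")
```

Run it in a directory on the array under test, once with the controller cache in write-through mode and once in writeback mode, to compare.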
> I'm also curious as to why you are so
> adamant about _not_ using the RAID ASIC on an HBA, given that it will
> take much greater advantage of the BBWC than md RAID will.
Only for a single reason: It's a small office server and I want the
flexibility to move the drives to a different card (eg failed server,
failed card or something else). Buying a spare card changes the
dynamics quite a bit when the whole server (sans raid card) only costs
£1,000 ish?
> You may be interested to know:
>
> 1. When BBWC is enabled, all internal drive caches must be disabled.
> Otherwise you eliminate the design benefit of the BBU, and may as
> well not have one.
Yes, I hadn't thought of that. Good point!
> 2. w/md RAID on an HBA, if you have a good UPS and don't suffer
> kernel panics, crashes, etc, you can disable barrier support in
> your FS and you can use the drive caches.
I don't buy this...
Note we are discussing "long tail events" here. ie catastrophic events
which occur very infrequently. At this point experience is everything
and I concede limited experience, you likely have more, but I'm going to
claim that these events are sufficiently rare that your experience
probably still isn't sufficient to draw proper conclusions...
In my limited experience hardware is pretty reliable and goes bad
rarely. However, my estimate is that power cables fall out, PSUs fail
and UPSs go bad at least as often as the power fails?
Obviously it's application dependent, some may tolerate small data loss
in the event of powerdown, but I should think most people want a
guarantee that the system is "recoverable" in the event of sudden
powerdown.
I think disabling barriers might not be the best way to avoid fsync
delays, compared with the incremental cost of adding BBU writeback
cache? (basically the same thing, but smaller chance of failure)
> For a stable system with good UPS and auto shutdown configured, BBWC is
> totally overrated. If the system never takes a nose dive from power
> drop, and doesn't crash due to software or hardware failure, then BBWC
> is a useless $200-1000 option.
It depends on the application, but I claim that there is a fairly
significant chance of hard unexpected powerdown even with a good UPS.
You still are at risk from cables getting pulled, UPSs failing, etc
I think in a properly set up datacenter (racked) environment then it's
easier to control these accidents. Cables can be tied in, layers of
power backup can be managed, it becomes efficient to add quality
surge/lightning protection, etc. However, there is a large proportion
of the market that have a few machines in an office and now it's much
harder to stop the cleaner tripping over the UPS, or hiding it under
boxes of paper until it melts due to overheating...
> If your current reasoning for wanting write cache on the HBA is
> performance, then forget about the write cache as you don't need it with
> md RAID. If you want the BBWC combo for safety as your system isn't
> stable or you have a crappy or no UPS, then forgo md RAID and use the
> hardware RAID and BBWC combo.
I want BB writeback cache purely to get the performance of effectively
disabling fsync, but without the loss of protection which occurs if you
do so.
> One last point: If you're bargain hunting, especially if looking at
> used gear on Ebay, that mindset is antithetical to proper system
> integration, especially when talking about a RAID card BBU.
I think there are few businesses who actually don't care about budget.
Everything is about optimisation of cost vs performance vs reliability.
Like everything else, my question is really about the tradeoff of a
small incremental spend, which in turn might generate a substantial
performance increase for certain classes of application. Largely I'm
thinking about performance tradeoffs for small office servers priced in
the £500-3,000 kind of range (not "proper" high end storage devices)
I think at that kind of level it makes sense to look for bargains,
especially if you are adding servers in small quantities, eg singles or
pairs.
> If you buy
> a used card, the first thing you must do is chuck the BBU and order a new
> one,
Agreed
> Buy 12:
> http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications
Out of curiosity I checked the power consumption and reliability numbers
of the 3.5" "Green" drives and it's not so clear cut that the 2.5"
drives outperform?
Thanks for your thoughts - I think this thread has been very
constructive - still very interested to hear good/bad reports of
specific cards - perhaps someone might archive it into some kind of list?
Cheers
Ed W
Re: HBA Adaptor advice
Hi Ed,
I understand your thinking. There is one big cost not mentioned in this
calculation though:
- what is the cost if the data is lost/corrupt?
compared to that cost, how relevant is the cost of a proper card?
I am getting the feeling of "penny wise, pound foolish"
Now that mind set, of course, describes many a business....
Cheers,
Rudy
Re: HBA Adaptor advice
Hi
> I understand your thinking. There is one big cost not mentioned in this
> calculation though:
> - what is the cost if the data is lost/corrupt?
I think it's fair to say that the loss of all your business data is the
loss of your entire business?!
That said if you are Skype you don't spend 8.5 billion on raid cards,
instead you choose a layered approach to availability which normally
trades speed of restore time vs cost.
Eg one might specify raid 6, dual mirrored servers, backed up to some
spare disks, Blu-ray and some offsite storage service. This would give
resilience to various types of disaster without spending the entire
budget on a fancy raid card?
In fact if you go back to my question, the *entire* point is that I
don't want the choice of card to be a point of failure, ie it's my
specific point to purchase a card such that it can be swapped out for
near any other card in the event of failure.
> compared to that cost, how relevant is the cost of a proper card?
See point above. I don't get a strong feeling that a "proper card" is
any more reliable and resilient than a well chosen cheap card? If that
theory is correct then the ability to swap in another cheap card in the
event of disaster is valuable and eliminates a point of failure for
little cost?
> I am getting the feeling of "penny wise, pound foolish"
I don't see that your logic leads here?
There is a clear definition of good/bad here. The only acceptable
performance is that all reads/writes are accurate and completed. No
data should be lost or corrupted. Assuming that the market can be
partitioned into good/bad cards based on the definition above, then if
we select from only "good" cards, then price appears to only buy me
performance, nothing else?
So my question is how to choose from all the "good" cards, the best bang
for buck. I don't see any reason not to buy a cheaper card that
performs well, subject to it being reliable and not losing data.
Does someone have a claim that dataloss is actually on a curve and that
more expensive cards corrupt less data and cheaper cards corrupt more
data... That doesn't seem to fit with expectation... (I expect either
working cards that lose nothing, or bad cards that lose some data.
Black and white)
> Now that mind set, of course, describes many a business....
I think this is a silly line of argument. All you can ever do is buy
"insurance" against low probability events occurring. Annoyingly the
"insurance" in this scenario doesn't always pay out and so the question
is how much to spend on orthogonal types of insurance to increase the
chance of a payout in the case of disaster...
It's always easy in the event of some disaster to point out how you
should have bought some different type of "insurance", but equally it's
also dead money that a business could spend to generate income...
Balancing funding between profitable activities and insurance is a fine
line (especially since you are insuring against infrequent events)
As engineers, yes it's always easy to prefer to spend money on technical
"insurance", but accept also that there are competing demands on where
cash gets deployed to earn a return?
Cheers
Ed W
RE: HBA Adaptor advice
> -----Original Message-----
> From: linux-raid-owner [at] vger.kernel.org [mailto:linux-raid-
> owner [at] vger.kernel.org] On Behalf Of Rudy Zijlstra
> Sent: Saturday, May 21, 2011 6:29 AM
> To: Ed W
> Cc: Stan Hoeppner; linux-raid [at] vger.kernel.org
> Subject: Re: HBA Adaptor advice
>
> Hi Ed,
>
> I understand your thinking. There is one big cost not mentioned in this
> calculation though:
> - what is the cost if the data is lost/corrupt?
>
> compared to that cost, how relevant is the cost of a proper card?
>
> I am getting the feeling of "penny wise, pound foolish"
Well, not necessarily. Your point is taken, but some data is simply
not critical. A backup system, for example, may not be as critical as a
main system. There are also some cases where availability is quite properly
deemed more important than reliability.
RE: HBA Adaptor advice
> -----Original Message-----
> From: linux-raid-owner [at] vger.kernel.org [mailto:linux-raid-
> owner [at] vger.kernel.org] On Behalf Of Ed W
> Sent: Saturday, May 21, 2011 6:55 AM
> To: Rudy Zijlstra
> Cc: Stan Hoeppner; linux-raid [at] vger.kernel.org
> Subject: Re: HBA Adaptor advice
>
> Hi
>
> > I understand your thinking. There is one big cost not mentioned in this
> > calculation though:
> > - what is the cost if the data is lost/corrupt?
>
> I think it's fair to say that the loss of all your business data is the
> loss of your entire business?!
If a single device failure results in the loss of all one's business
data, then one has done a completely incompetent job of building a data
system. In a properly designed system, the question is how much time and
effort will be spent over the life of the system in recovering data, or more
properly the cost of those resources, vs. the capital outlay for more
expensive hardware. If any loss of productivity for the company as a whole
is involved with a hardware failure, then that should be taken into account,
as well.
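
The comparison described above (capital outlay versus the cost of recovery over the life of the system) can be sketched roughly as follows. This is only an illustration: the function name and every figure in it are made-up placeholders, not numbers from this thread.

```python
def lifetime_cost(capital, failures_per_year, cost_per_failure, years):
    """Capital outlay plus the expected recovery cost over the system's life."""
    return capital + failures_per_year * cost_per_failure * years


# All figures below are hypothetical placeholders, not data from this thread.
cheap = lifetime_cost(capital=100, failures_per_year=0.05,
                      cost_per_failure=2000, years=5)
pricey = lifetime_cost(capital=1000, failures_per_year=0.02,
                       cost_per_failure=2000, years=5)
print(f"cheap card: {cheap:.0f}, expensive card: {pricey:.0f}")
```

Lost productivity for the company as a whole would simply be folded into cost_per_failure.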
> That said if you are Skype you don't spend 8.5 billion on raid cards,
> instead you choose a layered approach to availability which normally
> trades speed of restore time vs cost.
>
> Eg one might specify raid 6, dual mirrored servers, backed up to some
> spare disks, Blu-ray and some offsite storage service. This would give
> resilience to various types of disaster without spending the entire
> budget on a fancy raid card?
>
> In fact if you go back to my question, the *entire* point is that I
> don't want the choice of card to be a point of failure, ie it's my
> specific point to purchase a card such that it can be swapped out for
> near any other card in the event of failure.
>
> > compared to that cost, how relevant is the cost of a proper card?
>
> See point above. I don't get a strong feeling that a "proper card" is
> any more reliable and resilient than a well chosen cheap card? If that
> theory is correct then the ability to swap in another cheap card in the
> event of disaster is valuable and eliminates a point of failure for
> little cost?
Well, yes, and no. First of all, a "bad" card is often the result
of random factors unmitigated by any process. The most expensive card on
Earth may have accidentally been exposed to ESD at some point in its
existence, for example. OTOH, an inexpensive card may not necessarily be of
poor quality or design. That said, I think there is a reasonable
expectation that a higher cost card should be the result of careful
engineering, high quality production methods, and extensive QC procedures,
all of which may be somewhat less likely with a lower cost card.
I think on average one may expect lower failure rates on higher cost
devices. The thing is, a statistical average has nothing to do with any
given failure. A customer does not care if he is the only client who has
ever lost any data out of hundreds of clients. He only cares that he has
lost his data.
> > I am getting the feeling of "penny wise, pound foolish"
>
> I don't see that your logic leads here?
>
> There is a clear definition of good/bad here. The only acceptable
> performance is that all reads/writes are accurate and completed. No
> data should be lost or corrupted. Assuming that the market can be
> partitioned into good/bad cards based on the definition above, then if
> we select from only "good" cards, then price appears to only buy me
> performance, nothing else?
I would say there are exceptions, but in general, yes. More to the
point - and I think this is your point - relying upon quality hardware to
prevent failures is a much poorer approach than developing a strategy that
mitigates the impact of failures. Put more simply, a proper backup strategy
is a must.
> So my question is how to choose from all the "good" cards, the best bang
> for buck. I don't see any reason not to buy a cheaper card that
> performs well, provided it is reliable and doesn't lose data.
>
> Does someone have a claim that dataloss is actually on a curve and that
> more expensive cards corrupt less data and cheaper cards corrupt more
> data... That doesn't seem to fit with expectation... (I expect either
> working cards that lose nothing, or bad cards that lose some data.
> Black and white)
Well, yes, but there is a (fairly minor, I think) statistical
correlation between cost and failure rates.
> > Now that mind set, of course, describes many a business....
>
> I think this is a silly line of argument. All you can ever do is buy
> "insurance" against low probability events occurring. Annoyingly the
> "insurance" in this scenario doesn't always pay out and so the question
> is how much to spend on orthogonal types of insurance to increase the
> chance of a payout in the case of disaster...
>
> It's always easy in the event of some disaster to point out how you
> should have bought some different type of "insurance", but equally it's
> also dead money that a business could spend to generate income...
> Balancing funding between profitable activities and insurance is a fine
> line (especially since you are insuring against infrequent events)
>
> As engineers, yes it's always easy to prefer to spend money on technical
> "insurance", but accept also that there are competing demands on where
> cash gets deployed to earn a return?
Well, there is a big difference between "insurance" in the ordinary
sense, which is to say a recurring premium paid ad infinitum and a one time
capital outlay that offers greater reliability for the indefinite future.
In addition, depending upon the application, performance does have value.
All that said, I agree that as long as the performance is acceptable, and as
long as the average reliability is reasonable, the lower cost solution
coupled with a solid backup strategy is the better choice.
Re: HBA Adaptor advice
On 5/21/2011 6:24 AM, Ed W wrote:
> On 20/05/2011 21:58, Stan Hoeppner wrote:
>
>> As always, a good data persistence strategy starts with a good UPS.
>
> I'm sure you are going to tell me that my APCs aren't good UPSs, but I
In my experience APC are good UPS. Interestingly, I have an APC
SU1400RMNET manufactured in *1997*, powering my home office rack. I've
replaced the batteries 4 times, but the UPS itself is like the Energizer
Bunny. 14 years and still going strong.
> have something like 5 APCs and 4 have failed in odd ways due to the
> battery dying, inside of around 2 years from new. Sure you replace the
Then I'd guess you're not performing proper UPS maintenance. Once
yearly you need to perform a deep cycle self test which can notify you
of marginal batteries at a much earlier stage. Your APC manual has
instructions for performing this test, or you can download the manual
from their site if it's been lost.
All APCs inform you when the battery needs to be replaced, via front
panel LED and via software or network notification (Email/SNMP). But
you don't want to wait for that. Do the deep self test.
> battery, but failure modes each time caused a sudden power failure. In
> nearly all cases the UPS failed before I would have had a sudden power
> loss for other reasons...
Yep. Lack of proper UPS maintenance and monitoring.
> So, I'm not convinced that UPSs dramatically raise the uptime, and where
Without a UPS in Missouri USA, your servers will go down from power loss
at *minimum* 50-100 times per year due to electrical storms, high winds,
power line maintenance, brown outs and sags caused by all manner of
things, truck hitting power pole, etc, etc.
> they do it's in well designed, racked, datacenter environments where
> "accidents" don't dominate the downtime risk?
Doesn't matter if it's a corporate datacenter, your rack in the
basement, or an office pedestal server. What counts is proper design
and installation. It is trivially simple in an office environment to
route all cables in a manner that they won't be tripped over. I'm truly
shocked that cable tripping could be an issue for anyone in 2011, let
alone 1999. Get a rack cabinet and stick it in a corner. Here, get this:
http://cgi.ebay.co.uk/COMPAQ-42U-SERVER-RACK-CABINET-ENCLOSURE-/150608001643?pt=UK_Computing_Networking_SM&hash=item2310efba6b
and 2 or 3 of these, since all your servers are non-rack boxes:
http://cgi.ebay.co.uk/StarTech-Adjustable-Depth-Fixed-Server-Rack-Cabinet-She-/320618338412?pt=UK_Computing_ComputerComponents_Monitors&hash=item4aa657986c
Problem solved.
--
Stan
Re: HBA Adaptor advice
On 5/21/2011 6:17 AM, Ed W wrote:
> Hi Stan
>
> Thanks for the time in composing your reply
Heh, I'm TheHardwareFreak, whaddya expect? ;) Note the domain in my
email addy.
>> I'm curious why you are convinced that you need BBWC, or even simply WC,
>> on an HBA used for md RAID.
>
> In the past I have used battery backed cards and where the write speed
> is "fsync constrained" the writeback cache makes the app performance fly
> at perhaps 10-100x the speed
Ok, now that makes more sense. This is usually the case with a hardware
RAID card or SAN controller, though depends on the
vendor/implementation. I've never run one in 'JBOD' mode but with write
cache enabled, so I can't say if fsync behavior will be the same using
md RAID or not. Maybe someone else has tested this.
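
One rough way to see whether a given controller and cache setting changes "fsync constrained" behaviour is to time small synced writes on the target filesystem. The sketch below is illustrative only: the function name, record size and record count are assumptions, and it measures nothing more than how many write-plus-fsync cycles complete per second in the chosen directory.

```python
import os
import tempfile
import time


def fsync_writes_per_second(directory, count=200, size=4096):
    """Write `count` small records, fsync after each, and return writes/sec."""
    payload = b"x" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    rate = fsync_writes_per_second(tempfile.gettempdir())
    print(f"{rate:.0f} fsync'd writes per second")
```

Pointing the directory argument at a filesystem on the array under test, once with and once without the write cache enabled, would give a like-for-like comparison.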
> Only for a single reason: It's a small office server and I want the
> flexibility to move the drives to a different card (eg failed server,
> failed card or something else). Buying a spare card changes the
> dynamics quite a bit when the whole server (sans raid card) only costs
> £1,000 ish?
If adding a real hardware RAID card and enterprise drives to a 'base'
server, the storage will nearly always cost more than the server,
especially with HP Proliant 1U quad core rack servers going for well
less than $1000 USD. This has been reality for a few years now.
> You may be
>> interested to know:
>>
>> 1. When BBWC is enabled, all internal drive caches must be disabled.
>>    Otherwise you eliminate the design benefit of the BBU, and may as
>>    well not have one.
>
> Yes, I hadn't thought of that. Good point!
>
>> 2. w/md RAID on an HBA, if you have a good UPS and don't suffer
>> kernel panics, crashes, etc, you can disable barrier support in
>> your FS and you can use the drive caches.
>
> I don't buy this...
Well, take into consideration that the vast majority of people running
md RAID arrays, including most, if not all, on this list, aren't using
hardware writeback cache. They're using plain Jane SAS/SATA HBAs. Some
are using hybrid hardware arrays stitched together with md RAID striping
or concatenation. But in those cases we're talking multiple tens of
thousands of dollars per system.
> In my limited experience hardware is pretty reliable and goes bad
> rarely. However, my estimate is that powercables fall out, PSUs fail
> and UPSs go bad at least as often as the power fails?
*Quality* hardware today is very reliable.
Power cords *never* come loose in my experience, I don't allow it.
PSUs and UPSes fail at about the same rate as RAID cards, IME--*rarely*
Apparently Britain has a far better power grid than the States.
> Obviously it's application dependent, some may tolerate small dataloss
> in the event of powerdown, but I should think most people want a
> guarantee that the system is "recoverable" in the event of sudden
> powerdown.
There is always a tradeoff here between performance, resilience,
flexibility, and cost. You currently have conflicting criteria in this
regard. If you can afford all that you want, pick that which is most
important to eliminate the conflicts. Then implement it.
> I think disabling barriers might not be the best way to avoid fsync
> delays, compared with the incremental cost of adding BBU writeback
> cache? (basically the same thing, but smaller chance of failure)
On the type of small office server you described, it's difficult to
grasp how performance is so critical. You sound like a candidate for a
mixed SSD + SAS/SATA RAID setup. Put things that require low latency,
such as the Postfix spool, Dovecot indexes, and MySQL tables on SSD, and
put user data, such as IMAP mail directories, home directory files, etc,
on spinning RAID. This way you get high performance and low cost.
> It depends on the application, but I claim that there is a fairly
> significant chance of hard unexpected powerdown even with a good UPS.
> You still are at risk from cables getting pulled, UPSs failing, etc
If cables getting yanked is a concern, you have human issues that must
be solved long before the technical aspects of system resiliency. I've
not built/installed/used/serviced a pedestal server in over a decade.
> I think in a properly setup datacenter (racked) environment then it's
> easier to control these accidents.
We don't have "accidents" in our datacenters, not the homo sapien
initiated type you refer to.
> Cables can be tied in, layers of
> power backup can be managed, it becomes efficient to add quality
> surge/lightning protection, etc. However, there is a large proportion
> of the market that have a few machines in an office and now it's much
> harder to stop the cleaner tripping over the UPS, or hiding it under
> boxes of paper until it melts due to overheating...
Again, these types of problems can't be solved with technological means.
> I want BB writeback cache purely to get the performance of effectively
> disabling fsync, but without the loss of protection which occurs if you
> do so.
You can have it with some cards. But, you will lose your ability to
swap the drives to a different make/model of HBA in the future.
> Everything is about optimisation of cost vs performance vs reliability.
Yep.
> Like everything else, my question is really about the tradeoff of a
> small incremental spend, which in turn might generate a substantial
> performance increase for certain classes of application. Largely I'm
> thinking about performance tradeoffs for small office servers priced in
> the £500-3,000 kind of range (not "proper" high end storage devices)
'Proper' need not be 'high end' nor expensive.
> I think at that kind of level it makes sense to look for bargains,
> especially if you are adding servers in small quantities, eg singles or
> pairs.
Again, that's exactly what the parts I posted give you.
>> Buy 12:
>> http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications
>
> Out of curiosity I check the power consumption and reliability numbers
> of the 3.5" "Green" drives and it's not so clear cut that the 2.5"
> drives outperform?
WD's Green drives have a 5400 rpm 'variable' spindle speed. The Seagate
2.5" SAS drive has a 7.2k spindle speed.
It's difficult to align partitions properly on the Green drives due to
native 4K sectors translated by drive firmware to 512B sectors. The
Seagate SAS drive has native 512B sectors.
The Green drives have aggressive power saving firmware not suitable for
business use as the heads are auto parked every 8 seconds or so. IIRC
the drive goes into sleep mode after a short period of inactivity on the
host interface. In short, these drives are designed optimally for the
"is not running" case rather than the "running" case. Hence the name
"Green". How do you save power? Turn off the drive. And that's
exactly what these drives are designed to do.
The Seagate 2.5" SAS drive has TLER support, the Green doesn't. If you
go hardware RAID, you need TLER. It's good to have for md RAID as well
but not a requirement.
Check the warranty difference between the Seagate SAS drive and the WD
Green. Also note WD's 'RAID use' policy.
> Thanks for your thoughts - I think this thread has been very
> constructive - still very interested to hear good/bad reports of
> specific cards - perhaps someone might archive it into some kind of list?
I see RAID card shootouts now and then. Google should find you
something. Though you won't see anyone testing Linux md RAID on a
hardware RAID card in JBOD mode.
--
Stan
Re: HBA Adaptor advice
On 5/21/2011 6:54 AM, Ed W wrote:
> In fact if you go back to my question, the *entire* point is that I
> don't want the choice of card to be a point of failure, ie it's my
> specific point to purchase a card such that it can be swapped out for
> near any other card in the event of failure.
You've given about 3 or 4 conflicting requirements now WRT your 'perfect'
HBA.
What HBAs are you currently using? How many of your stated requirements
over the past few days do your current HBAs fulfill?
Do you have a tape or D2D backup system in place?
There is no guarantee that you can swap one dead HBA for another brand
with a different chipset on board and have it work without issue. If
you are that concerned you need to buy two identical cheap HBAs so you
have a spare. But wait! You must have hardware write cache for md RAID
as well. But if you do that, you're locked into that vendor's cards.
And on, and on...
I've never seen nor heard of a real SA in a business environment
vacillate like this over a simple RAID/HBA acquisition, as if the
company's entire 1st quarter net profit was being wrapped up in this HBA
purchase. And I've never heard of an SA being concerned about cable
tripping of all damn things taking down a server.
Something in this whole thread just doesn't jibe...
--
Stan
Re: HBA Adaptor advice
On 05/22/2011 11:41 AM, Stan Hoeppner wrote:
> On 5/21/2011 6:54 AM, Ed W wrote:
>
>
>> In fact if you go back to my question, the *entire* point is that I
>> don't want the choice of card to be a point of failure, ie it's my
>> specific point to purchase a card such that it can be swapped out for
>> near any other card in the event of failure.
>>
> You've given about 3 or 4 conflicting requirements now WRT your 'perfect'
> HBA.
>
> What HBAs are you currently using? How many of your stated requirements
> over the past few days do your current HBAs fulfill?
>
> Do you have a tape or D2D backup system in place?
>
> There is no guarantee that you can swap one dead HBA for another brand
> with a different chipset on board and have it work without issue. If
> you are that concerned you need to buy two identical cheap HBAs so you
> have a spare. But wait! You must have hardware write cache for md RAID
> as well. But if you do that, you're locked into that vendor's cards.
> And on, and on...
>
> I've never seen nor heard of a real SA in a business environment
> vacillate like this over a simple RAID/HBA acquisition, as if the
> company's entire 1st quarter net profit was being wrapped up in this HBA
> purchase. And I've never heard of an SA being concerned about cable
> tripping of all damn things taking down a server.
>
> Something in this whole thread just doesn't jibe...
>
>
The amount of money that his time has cost discussing this & thinking
about it is most likely already noticeably more than the cost of a
mid-range RAID card.
My approach (and i have my own small company):
- use HW RAID on the system disks (RAID5) (and have a spare controller
of same type ready)
- use MD RAID on big storage with cheap disks (and have spare disks
lying ready)
- have a nightly automated backup to different system, with versioning
and ability to recover state of half year ago
That other system is in different building.
As i do not upgrade the servers that often, this ensures:
- i do not need to spend a long time on getting the system back up, if a
system disk goes bye-bye
- no need to think long on how grub/lilo was supposed to be working for
multiple disks
- no need to remember to re-install bootloader on all related disks (so
i safe-guard against my own mistakes. Takes some money, yes, but i am
willing to pay that part of insurance quite willingly. I am aware i make
mistakes, especially when time pressure is high)
- backup in place for the usual stupid mistaken deletes.
yes, i keep spare controllers. Do i need them? not really... so far i
have had only 1 raid card die on me... in 10+ years i am using them.
I've had many disks go to bit hell, and some mobo. Not raid cards though.
My main issue with this discussion is that it assumes:
- no time pressure when the shit hits the fan
- the system maintainer does not make mistakes
Both of them fail in real-life, especially with the small businesses
where this discussion is relevant for cost reasons. Thus my stated
feeling of "penny wise, pound foolish". Murphy being what it is, things
usually fail when you need to have your attention on something else.
That means there is great opportunity at such a time to make mistakes.
Thus the setup of such systems needs to take the human aspect into
account. As far as i can see the setup he is defining is simply too
complex for the situation.
I've had things fail on me when i needed to leave in 2 hours, as i had a
flight to catch. I also needed that server to be running...
Cheers,
Rudy
Re: HBA Adaptor advice
On 22/05/11 17:04, Stan Hoeppner wrote:
> WD's Green drives have a 5400 rpm 'variable' spindle speed. The Seagate
> 2.5" SAS drive has a 7.2k spindle speed.
Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
period. I've got 15 of them here and I have no evidence of any form of
spindle speed variation. They say the drives have spindle speed :
"intellipower" which is marketspeak for slow enough to save a few watts,
but fast enough to do the job.
> It's difficult to align partitions properly on the Green drives due to
> native 4K sectors translated by drive firmware to 512B sectors. The
> Seagate SAS drive has native 512B sectors.
Actually it's not difficult at all. You just make sure all your
partitions start on an even multiple of 8 sectors. No magic in it. Just
the same as all my SSD partitions start on 512k boundaries.
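
The "multiple of 8 sectors" rule above follows from 4096 / 512 = 8. A minimal check might look like the sketch below; the function name is hypothetical, and the 512-byte logical and 4096-byte physical sector sizes are assumed from the description of the Green drives earlier in the thread.

```python
LOGICAL_SECTOR = 512     # bytes per sector as presented to the host
PHYSICAL_SECTOR = 4096   # bytes per sector on the platter (assumed 4K drive)


def is_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if a partition starting at `start_sector` begins on a physical sector boundary."""
    return (start_sector * logical) % physical == 0


print(is_aligned(63))    # False: the traditional DOS start sector
print(is_aligned(64))    # True: a multiple of 8 sectors
print(is_aligned(2048))  # True: a 1 MiB boundary
```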
> The Green drives have aggressive power saving firmware not suitable for
> business use as the heads are auto parked every 8 seconds or so. IIRC
> the drive goes into sleep mode after a short period of inactivity on the
> host interface. In short, these drives are designed optimally for the
> "is not running" case rather than the "running" case. Hence the name
> "Green". How do you save power? Turn off the drive. And that's
> exactly what these drives are designed to do.
You can turn off the aggressive head parking with a little DOS utility,
and they don't go to sleep at all unless you tell them to. They will
happily keep spinning just the same as any other disk.
I'm running them in a couple of large(ish) RAID arrays. I'm not saying
it's a good idea, it's just been my experience with ultra-cheap drives
that if you burn in the drives to weed out the early failures, and you
keep them running 24/7 in a nice environment they tend to last long
enough to do the job. I tend to replace my drives at around ~30,000
hours, so these have a long way to go yet.
On the other hand, I have my company data on Seagate Cheetah SAS drives
in RAID-10, but I back up to the large WD Green arrays.
Re: HBA Adaptor advice
On 5/22/2011 5:09 AM, Brad Campbell wrote:
> On 22/05/11 17:04, Stan Hoeppner wrote:
>
>> WD's Green drives have a 5400 rpm 'variable' spindle speed. The Seagate
>> 2.5" SAS drive has a 7.2k spindle speed.
>
> Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
> period. I've got 15 of them here and I have no evidence of any form of
> spindle speed variation. They say the drives have spindle speed :
> "intellipower" which is marketspeak for slow enough to save a few watts,
> but fast enough to do the job.
From: http://www.anandtech.com/show/2385/2
The Western Digital drive's IntelliPower algorithm, which varies the
rotational speed between 5400RPM and 7200RPM, dictates the Western
Digital's rotational speed.
>> It's difficult to align partitions properly on the Green drives due to
>> native 4K sectors translated by drive firmware to 512B sectors. The
>> Seagate SAS drive has native 512B sectors.
>
> Actually it's not difficult at all. You just make sure all your
> partitions start on an even multiple of 8 sectors. No magic in it. Just
> the same as all my SSD partitions start on 512k boundaries.
IIRC from discussions here, mdadm has alignment issues with hybrid
sector size drives when assembling raw disks. Not everyone assembles
their md devices from partitions. Many assemble raw devices.
>> The Green drives have aggressive power saving firmware not suitable for
>> business use as the heads are auto parked every 8 seconds or so. IIRC
>> the drive goes into sleep mode after a short period of inactivity on the
>> host interface. In short, these drives are designed optimally for the
>> "is not running" case rather than the "running" case. Hence the name
>> "Green". How do you save power? Turn off the drive. And that's
>> exactly what these drives are designed to do.
>
> You can turn off the aggressive head parking with a little DOS utility,
> and they don't go to sleep at all unless you tell them to. They will
> happily keep spinning just the same as any other disk.
You must boot your server with MS-DOS or FreeDOS and run wdidle3 once
for each Green drive in the system. But, IIRC, if the drives are
connected via SAS expander or SATA PMP, this will not work. A direct
connection to the HBA is required.
Once one accounts for all the necessary labor and configuration
contortions one must put himself through to make a Green drive into a
'regular' drive, it is often far more cost effective to buy 'regular'
drives to begin with. This saves on labor $$ which is usually greater,
from a total life cycle perspective, than the drive acquisition savings.
The drives you end up with are already designed and tuned for the
application. Reiterating Rudy's earlier point, using the Green drives
in arrays is "penny wise, pound foolish".
Google WD20EARS and you'll find a 100:1 or more post ratio of problems
vs praise for this drive. This is the original 2TB model which has
shipped in much greater numbers into the marketplace than all other
Green drives. Heck, simply search the archives of this list.
> I'm running them in a couple of large(ish) RAID arrays. I'm not saying
> it's a good idea, it's just been my experience with ultra-cheap drives
> that if you burn in the drives to weed out the early failures, and you
> keep them running 24/7 in a nice environment they tend to last long
> enough to do the job. I tend to replace my drives at around ~30,000
> hours, so these have a long way to go yet.
You're one out of 100. Congratulations. :)
> On the other hand, I have my company data on Seagate Cheetah SAS drives
> in RAID-10, but I back up to the large WD Green arrays.
And that backup array may fail you when you need it most: during a
restore. Search the XFS archives for the horrific tale at University of
California Santa Cruz. The SA lost ~7TB of doctoral student research
data due to multiple WD20EARS drives in his primary storage arrays *and*
his D2D backup array dying in quick succession. IIRC multiple grad
students were forced to attend another semester to redo their
experiments and field work to recreate the lost data, so they could then
submit their theses.
How much did this incident cost the university and the Ph. D. students
in real money and lost time? I'm sure some actuaries might be able to
tell you, and the real cost is likely hundreds of thousands of times the
cost savings of using these crap drives, especially when you figure in
the lost salaries for 6 months of these Ph. D. students. Depending on
their field this could be over $100k per student. If 10 such students
were affected that's potentially $1 million in lost earnings alone.
Spending an additional $10-20K on proper disk drives would have saved an
enormous amount in this case, and not just purely money. If you were
one of the students who was told you had to repeat a semester because a
computer lost all of your research data, how would you digest and cope
with that? I'd bet at least one, if not more, lawsuits/settlements will
result from this.
Given that things like this can, and DO happen when banking on cheap
consumer drives in a production environment, why would anyone ever take
such a chance?
--
Stan
Re: HBA Adaptor advice
On Sun, May 22, 2011 at 3:25 PM, Stan Hoeppner <stan [at] hardwarefreak.com> wrote:
>
> On 5/22/2011 5:09 AM, Brad Campbell wrote:
> > On 22/05/11 17:04, Stan Hoeppner wrote:
> >> It's difficult to align partitions properly on the Green drives due to
> >> native 4K sectors translated by drive firmware to 512B sectors. The
> >> Seagate SAS drive has native 512B sectors.
> >
> > Actually it's not difficult at all. You just make sure all your
> > partitions start on an even multiple of 8 sectors. No magic in it. Just
> > the same as all my SSD partitions start on 512k boundaries.
>
> IIRC from discussions here, mdadm has alignment issues with hybrid
> sector size drives when assembling raw disks. Not everyone assembles
> their md devices from partitions. Many assemble raw devices.
Case in point: I have 4 of these 2TB Green drives in a RAID5 array. I
assembled them from the raw devices (no partition table) without any
special precautions. Am I in trouble? The array seems to be working
fine...
Tobias
--
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: HBA Adaptor advice
On 05/22/2011 10:57 PM, Tobias McNulty wrote:
> Case in point: I have 4 of these 2TB Green drives in a RAID5 array. I
> assembled them from the raw devices (no partition table) without any
> special precautions. Am I in trouble? The array seems to be working
> fine...
No, you aren't. If you don't create a partition table in the first
place, there's no possibility for partition boundaries to be mis-aligned
in regard to the physical sector or erase block size of the underlying
blockdevice. You could probably still get it wrong if you chose (if
that's even possible, I don't know for sure off-hand) a very weird
non-power-of-two chunk size that happens to interfere with the sector
size of your disks in a bad way, but since md's default chunk sizes are
rather large powers of two, you'd have to put some effort into screwing
up (if that is at all possible, as I mentioned before) ;)
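To put rough numbers on the chunk-size point, here is a minimal sketch. It assumes 4096-byte physical sectors, and the chunk sizes are only example values (whether md would even accept the odd one is not checked here); the function name is made up for illustration.

```python
PHYSICAL_SECTOR = 4096  # bytes; assumed for 4K-sector drives

def chunk_is_sector_multiple(chunk_kib, physical=PHYSICAL_SECTOR):
    """True if a chunk of chunk_kib KiB is a whole number of physical sectors."""
    return (chunk_kib * 1024) % physical == 0

# Two power-of-two chunk sizes and one deliberately odd, hypothetical one.
for chunk_kib in (64, 512, 6):
    print(chunk_kib, chunk_is_sector_multiple(chunk_kib))
```

Any power-of-two chunk of 4 KiB or larger passes this check, which is why it takes some effort to get it wrong.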
--
with best regards:
- Johannes Truschnigg ( johannes [at] truschnigg.info )
www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes [at] truschnigg.info
Please do not bother me with HTML-eMail or attachments. Thank you.
Re: HBA Adaptor advice
On 23/05/11 03:25, Stan Hoeppner wrote:
> On 5/22/2011 5:09 AM, Brad Campbell wrote:
>> On 22/05/11 17:04, Stan Hoeppner wrote:
>>
>>> WD's Green drives have a 5400 rpm 'variable' spindle speed. The Seagate
>>> 2.5" SAS drive has a 7.2k spindle speed.
>>
>> Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
>> period. I've got 15 of them here and I have no evidence of any form of
>> spindle speed variation. They say the drives have spindle speed :
>> "intellipower" which is marketspeak for slow enough to save a few watts,
>> but fast enough to do the job.
>
> From: http://www.anandtech.com/show/2385/2
>
> The Western Digital drive's IntelliPower algorithm, which varies the
> rotational speed between 5400RPM and 7200RPM, dictates the Western
> Digital's rotational speed.
"In 2007 Western Digital announced the WD GP drive touting rotational
speed "between 7200 and 5400 rpm", which, if potentially misleading, is
technically correct; the drive spins at 5405 rpm, and the Green Power
spin speed is not variable.[citation needed]"
http://en.wikipedia.org/wiki/Western_Digital
They're not variable. Or to put it another way, if they _can_ vary the
spindle speed none of mine ever do.
Can you imagine the potential vibration nightmare as 10 drives vary
their spindle speed up and down? Not to mention the extra load on the
+12V rail and the delays while waiting for the platters to reach servo lock?
> IIRC from discussions here, mdadm has alignment issues with hybrid
> sector size drives when assembling raw disks. Not everyone assembles
> their md devices from partitions. Many assemble raw devices.
Which means the data starts at sector 0. That's an even multiple of 8.
Job done. (Mine are all assembled raw also).
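The arithmetic behind that can be sketched in a few lines. This assumes 512-byte logical sectors on top of 4096-byte physical sectors, as on the Green drives discussed here; the function name and the sample start sectors are purely illustrative. For a real partition, the start sector can be read from /sys/block/<disk>/<partition>/start.

```python
LOGICAL_SECTOR = 512      # bytes per sector as presented to the OS (assumed)
PHYSICAL_SECTOR = 4096    # bytes per sector on the platter (assumed)

def is_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if data starting at start_sector begins on a physical sector boundary."""
    return (start_sector * logical) % physical == 0

print(is_aligned(0))     # start of a raw device
print(is_aligned(63))    # old DOS-style first partition
print(is_aligned(2048))  # partition starting at 1 MiB
```

Sector 0 and sector 2048 are both multiples of 8, so they pass; sector 63 is not, so it fails.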
> You must boot your server with MS-DOS or FreeDOS and run wdidle3 once
> for each Green drive in the system. But, IIRC, if the drives are
> connected via SAS expander or SATA PMP, this will not work. A direct
> connection to the HBA is required.
Indeed. In my workshop I have an old machine with 3 SATA hotswap bays
that allowed me to do 3 at once, booting off a USB key into DOS.
> Once one accounts for all the necessary labor and configuration
> contortions one must put himself through to make a Green drive into a
> 'regular' drive, it is often far more cost effective to buy 'regular'
> drives to begin with. This saves on labor $$ which is usually greater,
> from a total life cycle perspective, than the drive acquisition savings.
> The drives you end up with are already designed and tuned for the
> application. Reiterating Rudy's earlier point, using the Green drives
> in arrays is "penny wise, pound foolish".
>
I agree with you. If I were doing it again I'd spend some extra $$$ on
better drives, but I've already outlaid the cash and have a working array.
> Google WD20EARS and you'll find a 100:1 or more post ratio of problems
> vs praise for this drive. This is the original 2TB model which has
> shipped in much greater numbers into the marketplace than all other
> Green drives. Heck, simply search the archives of this list.
Indeed, but the same follows for almost any drive. People are quick to
voice their discontent but not so quick to praise something that does
what it says on the tin.
> And that backup array may fail you when you need it most: during a
> restore. Search the XFS archives for the horrific tale at University of
> California Santa Cruz. The SA lost ~7TB of doctoral student research
> data due to multiple WD20EARS drives in his primary storage arrays *and*
> his D2D backup array dying in quick succession. IIRC multiple grad
> students were forced to attend another semester to redo their
> experiments and field work to recreate the lost data, so they could then
> submit their theses.
>
Perhaps. Mine get a SMART short test every morning, a LONG every Sunday
and a complete array scrub every other Sunday. My critical backups are
also replicated to a WD World Edition Mybook that lives in another building.
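A schedule like that could be driven by something along these lines. This is only a sketch, not the actual setup: the disk and array names are placeholders, "every other Sunday" is assumed to mean Sundays in even ISO weeks, and on Sundays the long test is assumed to replace the short one rather than run alongside it. The function only builds the commands (smartctl self-tests and an md "check" written to sync_action); it does not run them.

```python
from datetime import date

def maintenance_for(day, disks, array="md0"):
    """Return the commands (as argument lists) due on the given day."""
    commands = []
    is_sunday = day.weekday() == 6          # Monday is 0, Sunday is 6
    test = "long" if is_sunday else "short"
    for disk in disks:
        commands.append(["smartctl", "-t", test, f"/dev/{disk}"])
    # Assumption: scrub on Sundays that fall in even-numbered ISO weeks.
    if is_sunday and day.isocalendar()[1] % 2 == 0:
        commands.append(
            ["sh", "-c", f"echo check > /sys/block/{array}/md/sync_action"]
        )
    return commands

# Hypothetical two-disk array; print what would be run on a given day.
for cmd in maintenance_for(date(2011, 5, 22), ["sda", "sdb"]):
    print(" ".join(cmd))
```

Run from cron each morning, the returned lists could be passed to subprocess.run() as root.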
I've had quite a few large arrays over the years, all comprised of the
cheapest available storage at the time. I've had drives fail, but aside
from a Sil3124 controller induced array failure I've never lost data
because of a cheap hard disk and I've saved many, many, many $$$ on drives.
I'm not arguing the penny wise, pound foolish sentiment. I'm just
stating my personal experience has been otherwise with drives.
Re: HBA Adaptor advice
On 23/05/11 03:25, Stan Hoeppner wrote:
>
> And that backup array may fail you when you need it most: during a
> restore. Search the XFS archives for the horrific tale at University of
> California Santa Cruz. The SA lost ~7TB of doctoral student research
> data due to multiple WD20EARS drives in his primary storage arrays *and*
> his D2D backup array dying in quick succession. IIRC multiple grad
> students were forced to attend another semester to redo their
> experiments and field work to recreate the lost data, so they could then
> submit their theses.
So I "googled" that thread, and after I picked my way past all the top rating hits which appear to
be you telling people to google that thread I found the real problem.
He used WD commodity drives on a "hardware" RAID enclosure that needed TLER. The RAID-5 kicked out 4
drives in a short period of time, so he power cycled it and re-initialised the array and it came up
fine, but blank (as it would as he re-initialised it).
Sorry Stan, that's not a failure of the drives. He lost the data due to limitations in his RAID
configuration and bad management.
Re: HBA Adaptor advice
On 23/05/11 07:44, Brad Campbell wrote:
> He used WD commodity drives on a "hardware" RAID enclosure that needed TLER. The RAID-5 kicked out 4
> drives in a short period of time, so he power cycled it and re-initialised the array and it came up
> fine, but blank (as it would as he re-initialised it).
>
Just to clarify that as it was somewhat muddled. The initial failure was on an unspecified array
with unspecified drives and resulted in a blank array. The backup failure was TLER related using WD
GP drives on a hardware array and was left unresolved.
That's still not concrete evidence of those drives failing; it's just using the wrong tool for the
job.
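For anyone unfamiliar with the TLER issue, it comes down to a timing comparison: if a drive spends longer retrying a bad sector than the controller is willing to wait, the controller drops the drive even though it has not died. A rough sketch follows; the timings are illustrative assumptions, not measurements from that incident, and the helper names are made up. The smartctl "scterc" option is real, but not every drive accepts the setting.

```python
def likely_kicked(drive_recovery_s, controller_timeout_s):
    """True if the drive may still be retrying when the controller gives up."""
    return drive_recovery_s > controller_timeout_s

def scterc_command(device, seconds=7.0):
    """Build a smartctl call asking a drive to cap its error recovery time."""
    deci = int(round(seconds * 10))   # smartctl takes tenths of a second
    return ["smartctl", "-l", f"scterc,{deci},{deci}", device]

# Assumed numbers: a desktop drive retrying for a minute vs. a controller
# that waits 8 seconds, then the same controller with recovery capped at 7 s.
print(likely_kicked(60, 8))
print(likely_kicked(7, 8))
print(" ".join(scterc_command("/dev/sda")))
```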