Hacker News
The Helium Factor and Hard Drive Failure Rates (backblaze.com)
147 points by ingve on May 3, 2018 | 61 comments


You'd certainly expect the long-term failure rate for helium-filled drives to be lower. By displacing (moist) air, you've completely stopped any metal corrosion. Helium is also highly conductive to heat, so you will even out any hot spots.

Does anyone know what pressure of helium the drives are filled to?


My thought when first hearing about them was that, having an extra point of failure (Helium leakage), they would have higher long-term failure rates.


If that leakage only brings the probability of failure back to previous rates, then you've still lowered the failure rate overall.


No, that's not what would happen. Helium drives enable higher data density, because helium decreases the amount of drag, which in turn reduces turbulence, platter vibration, and heat. This enables more platters per drive and more accurate head positioning.

Helium leakage would make the drive substantially less reliable than air drives due to the higher data density.


But they're not comparing denser helium-filled drives to less-dense air-filled drives. They're comparing 12TB helium drives to 12TB air drives—equal density. (I have no idea how they're managing to make non-helium drives of that density, but apparently they can.)

Of course, for a drive density that can only be achieved using helium, a helium leak will kill the drive.

But right now, it seems that these drives might not actually need the helium, since equivalent drives exist without it.


FWIW a quick search suggests the 8TB drives have the same RPM too, so it's not like the helium-drives are running faster just because they can. (Helium-filled HUH728080ALE600 versus air-filled ST8000DM002.)


Technically, you could choose a 3.5 inch, 6 Gb/s, 4 TB model that contains He and then compare to the non-He version.


I don't know for a fact about the pressure, but I assume it's at atmospheric pressure, to make sealing and such less of an issue than making the HDD a pressure vessel.


Helium gas being conductive? Usually fluid cooling works by convection, isn't conduction only for solids?


Fluid dynamicist here.

"Convective" heat transfer includes heat transfer by both advection (the bulk fluid movement) and conduction. You might think convection is just advection, and I will admit the terminology is used inconsistently.


Erm, I'm a lapsed physics graduate.

Thanks for the correction, I was also watching PBS Space Time on Red Dwarfs, and how small stars cool the core by convection. Larger stars use radiation in the core and so lose access to a majority of the mass of the star which could have been used for fuel.


Is the thermal conductivity of helium actually relevant to hard drives? Like what kind of hot spots and transfer rates are we talking about?


FWIW utility generators are cooled with hydrogen to reduce windage losses and promote cooling. Hydrogen is probably lower viscosity and higher thermal conductivity than helium. It's also very hard to contain due to the small molecular size and has the widest flammability range of any gas with which I'm familiar, making it an interesting gas to work with. ;)


Isn't there also a potential problem with hydrogen embrittlement (depending on what materials are exposed) that you avoid with helium?


I don't know, but it seems possible. The only references I recall about hydrogen embrittlement were with steam boilers at high temperature and pressure. I suppose like most engineering solutions it comes down to the various tradeoffs between cost and effectiveness for hydrogen vs. helium.


The utility generators probably don't worry too much about leaks, since they can just refill with new, abundant hydrogen, too.

It would be amusing to have a hydrogen-cooled hard drive for which you'd have to regularly refill the water tank so that it could generate more hydrogen for itself. :)


Yeah, and it would have a little Bunsen burner to safely get rid of waste hydrogen!


Yes, it's the hydrogen that leaks out that's the problem. I was told that the way to find hydrogen leaks in plant piping was to walk the line holding a corn broom over the line. When it bursts into flame you found the leak. In an enclosed space it can form an explosive mix with air if it doesn't ignite before it accumulates.


He has a thermal conductivity coefficient k (W/m·K) = 0.151 and air = 0.026. Since everything else in the heat flux equation is linear, you'd expect approximately 5x the rate of heat movement per degree that the HD is heated beyond the surrounding air.
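As a rough sanity check on that ratio, here is a minimal sketch using Fourier's law for one-dimensional conduction, q = k * ΔT / d. Only the two conductivities come from the comment above; the temperature difference and gap distance are made-up placeholders, not values from any drive datasheet.

```python
# Thermal conductivities quoted in the comment above, in W/(m*K)
K_HELIUM = 0.151
K_AIR = 0.026


def heat_flux(k, delta_t, gap):
    """Conductive heat flux (W/m^2) across a gas gap, per Fourier's law."""
    return k * delta_t / gap


# Hypothetical values, chosen only for illustration
delta_t = 10.0  # temperature difference across the gap, K
gap = 1e-3      # gap thickness, m

q_he = heat_flux(K_HELIUM, delta_t, gap)
q_air = heat_flux(K_AIR, delta_t, gap)
ratio = q_he / q_air

print(f"helium: {q_he:.0f} W/m^2, air: {q_air:.0f} W/m^2, ratio: {ratio:.1f}")
```

Because the flux is linear in k, the placeholders cancel out of the ratio, which comes out a bit under 6 and is in line with the "approximately 5x" estimate.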


Yeah, I can read a table of figures too, but what does that mean in the context of a hard drive? What percentage of the heat generated by the write heads (or whatever) is conducted away via air?


That is a function of the design implemented. Since you can read a table of figures, you can probably also plug them into the heat flux equation with your own guesses at air gap distances and surface area. The relevant info is that 5 times the energy per degree of difference is conducted, which is why any hot spots from friction are smoothed out.


If you model drive failure as a Poisson Process (each drive flips a coin each day with probability of failure x), then the measurements imply air and helium drives both have about x=2.8e-5. But drive failure rate may not be constant over time, especially if the failure is wear/aging related. The author should consider fitting the data to a https://en.m.wikipedia.org/wiki/Weibull_distribution with k>1.

In particular, k=2 is a model where failure rate increases linearly over time. To estimate the intrinsic drive failure rate for this model, we must look at the raw failure times (there is not enough info in the blog's totals table) and compute the root-mean-square of time between failures. Then divide this by the total number of drives. Do this for the two classes (air vs helium) and see which is better.
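To make the two models concrete, here is a small sketch, not the blog's method. It converts the constant daily failure probability x into an annualized rate, and evaluates the Weibull hazard (instantaneous failure rate) for a shape k. The scale parameter `lam` and the ages used are hypothetical, chosen only to show the shape of the curve.

```python
import math


def annualized_failure_rate(daily_p):
    """Probability of failing within 365 days at a constant daily failure probability."""
    return 1.0 - (1.0 - daily_p) ** 365


def weibull_hazard(t, k, lam):
    """Instantaneous failure rate of a Weibull(k, lam) lifetime at age t."""
    return (k / lam) * (t / lam) ** (k - 1)


afr = annualized_failure_rate(2.8e-5)
print(f"AFR for x=2.8e-5: {afr:.2%}")

lam = 1000.0  # hypothetical scale parameter, in days
for age in (100.0, 200.0, 400.0):
    print(age, weibull_hazard(age, 1, lam), weibull_hazard(age, 2, lam))
```

With k=1 the hazard is flat, which is the constant-rate coin-flip model; with k=2 it doubles whenever the drive's age doubles.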


> If you model drive failure as a Poisson Process (each drive flips a coin each day with probability of failure x)

IIRC the Poisson distribution applies when the rate is roughly constant per interval of time, which is not exactly the same as a probability of x per interval of time. (Failures are independent in the latter, not in the former).


Please see Poisson Process, which is different than Poisson distribution, http://www.randomservices.org/random/poisson/index.html .. failures are indeed independent.


Given the amount of engineering required for drives which are able to contain helium, one would expect that the required tolerances and materials would be much better. There may be reliability advantages for Helium, but I suspect greater care and QA during manufacturing will account for most of the improvement.

Also, it would be very bad PR for a new type of drive if they performed significantly worse.


I might have expected the first-order effect to be that helium drives run cooler, and therefore last longer. Most failure mode rates have an eᵏᵀ term, doubling every 10-15 °C. But Google's large study didn't show much of that at reasonable temperatures (below 45 °C): https://static.googleusercontent.com/media/research.google.c...
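For a sense of scale, the "doubling every 10-15 °C" rule of thumb can be sketched as below. This is only the rule of thumb from the comment above, not a fitted reliability model, and the function name and the 20 °C example are made up for illustration.

```python
def acceleration_factor(delta_t_c, doubling_interval_c=10.0):
    """Failure-rate multiplier for a temperature rise, assuming the rate doubles every interval."""
    return 2.0 ** (delta_t_c / doubling_interval_c)


# Hypothetical example: a drive running 20 degrees C hotter
for interval in (10.0, 15.0):
    print(interval, round(acceleration_factor(20.0, interval), 2))
```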


I absolutely expect helium drives to run cooler simply because helium is a much better conductor of heat compared to air. As a result all heat generated in the drive cavity will be more effectively conducted to the containment vessel and leave the material inside consistently cooler.


Unfortunately air is an excellent insulator. I looked it up and helium has thermal conductivity and heat capacity about 5 times higher than air.

So significantly better but we have to remember that solids have something like a thousand times better thermal properties than gases and it is the solid bits we are mostly concerned with. So it really depends on where the heat is originating in the drive. I suspect the big advantage is the lower viscosity of helium which prevents heat generation in the drive cavity in the first place.


That's a two way street though. If the ambient is higher it will carry the heat from the outside into places in the drive where it can do real damage.


For an excessively hot exterior, sure, but I was thinking back to an experience at NetApp when thermals inside the drive were the issue. When NetApp first started using SATA drives (rather than Fibre Channel) in their NearStore appliance, we discovered that the SATA drives were negatively affected by writing too much. Specifically, if the write duty cycle was too high, the r/w head got too hot, and the heat caused it to change shape slightly, and that shape change caused it to fly higher. This led to something that was dubbed 'high fly writes', which splattered into adjacent tracks and could corrupt data. One fix was to keep track of the write duty cycle and to let the drive 'rest' for a bit to keep the overall duty cycle within something the drive could handle.

I don't think that sort of failure can happen on a helium drive.


You're right about that: inside a helium-filled drive the temperature gradients will be flatter.

Heat transfer in computing hardware is a super interesting subject.


The drive produces heat, so it will always be hotter than ambient (proportional to the heat generated times the thermal resistance).


Not necessarily. The drive produces heat, but something around it (other drives, CPUs) may be producing a lot more heat than the drive, raising the temperature of the drive casing beyond where it would be if it were just the drive in isolation.

Case design for lots of drives + a beefy CPU is very tricky if you want to really balance the airflow so that there are no large local variations in temperature.


Pretty interesting. I like how the author talks about how it may be too early to get useful data yet since their non-helium drives have been online for much longer.

It'd be interesting to look at the numbers again in two years and see if the guess is correct.


Yev from Backblaze here -> we're going to keep an eye on them and follow up on the stats so that we can eventually get there!


If the AFR numbers follow some bathtub-shaped curve, would it make sense to use that curve to "normalize" the failure rates according to drive days?

I always wonder if the rates we see for each drive are really comparable, since they all have different ages, and whether that's representative of the average lifetime AFR.

That might be an interesting column to add to the reports!


This might be interesting -> https://hackernoon.com/applying-medical-statistics-to-the-ba... it's more "time based" - a medical statistical model applied to our drive stats!


In the article it states that the helium can typically reduce drag by 20%. How much of an effect does that have on the cost of powering the drive? I assume it would at least be less than a 20% improvement overall.


Yev from Backblaze here -> You'd have to look at the entire cabinet to get a better sense of the power savings. Post author Andy explains a bit more here -> http://disq.us/p/1s8psup.


Maybe this is a stupid question, but why don't manufacturers evacuate the drives (i.e. create a vacuum) instead of filling them with Helium?


It's much easier physically to build something which can hold e.g. +0.1 atmosphere of helium than something which can resist imploding against -1.0 atmosphere. Practically speaking, pumping down to vacuum is a slow process, whereas you can flush with an inert gas. Finally, hard-drive heads actually float on a lamina of gas; this is one of the reasons that normal open-circuit hard drives have a maximum operating altitude.


Wait, if it's +0.1 atm helium (for example), what happens if it develops a slight leak, a leak from which helium molecules can escape but air molecules cannot? Once the pressure reaches equilibrium, there would be no reason for additional helium to escape, and because air can't enter... then what?


That isn't how it works though--look up partial pressures of gases. Helium will continue to escape until its partial pressure is at equilibrium with the external environment, i.e. near to zero, and the external gases will enter until they are at equilibrium with the internal environment. At that point the partial pressure of each gas will be equal on both sides.
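A toy model of that behaviour, under loudly stated assumptions: each gas leaks independently at a rate proportional to the difference in its own partial pressure (a first-order model, not a description of real drive seals). The leak rates and step counts are hypothetical. The 5e-6 atm figure reflects helium being roughly 5 ppm of the atmosphere.

```python
def simulate_leak(p_in, p_out, rates, dt=1.0, steps=10000):
    """Step each gas toward its own outside partial pressure (explicit Euler)."""
    p = dict(p_in)
    for _ in range(steps):
        for gas, k in rates.items():
            p[gas] += k * (p_out[gas] - p[gas]) * dt
    return p


inside = {"helium": 1.1, "air": 0.0}    # atm, the +0.1 atm example above
outside = {"helium": 5e-6, "air": 1.0}  # atm

# Hypothetical leak that passes helium but not air
he_only = simulate_leak(inside, outside, {"helium": 1e-3, "air": 0.0})
# Hypothetical leak that passes both, with air moving more slowly
both = simulate_leak(inside, outside, {"helium": 1e-3, "air": 1e-4})

print("helium-only leak:", he_only)
print("leak for both gases:", both)
```

In the helium-only case the helium still drains toward its tiny outside partial pressure even though nothing replaces it, so under this model the total pressure inside keeps falling rather than settling at 1 atm.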


What happens to these helium hard drives if you repeatedly cycle the pressure, like by taking the hard drive on an airplane?


Since the partial pressure of helium at ground level and flight level is essentially the same (≈0) there should be very little change, providing the seals are intact. Since they are very securely sealed to contain the helium, changes to the external pressure, unless extreme, should not have any significant effect.


He states that they see one drive that is between 94-99%, but that may be a sensor issue. You'd lose the power savings.


Nit: those are not percentages indicating how much helium is left, just some undocumented SMART attribute "raw values"; there is no direct way to find out their meaning. (This is what the article text implies too.)

The raw attribute values could be, for example, 100 = good, 99 = moderate, 98 = degraded, or a log scale similar to decibels, etc. Or it could be a bitfield that has no meaning as a decimal number.


would that still be a problem if you pulled a vacuum outside the drive as well? like put the whole thing in a chamber?


Duskwuff has it right: the air or other gas is a critical component of the system. The spinning platters pull the gas along with them, and the head assembly is designed so that its aerodynamics control the "flying height" [0], the distance between the head and the platter. In a vacuum, there would be no easy way to keep the head from coming into contact with the platter, which is what causes head crashes. (In the era of sealed drives, head crashes are quite rare, but they used to be much more common. In a head crash, the head rips the magnetic coating off the platter, destroying it.)

[0] https://en.wikipedia.org/wiki/Flying_height


You are exactly right. WD explains the same in https://blog.westerndigital.com/rise-helium-drives/ :

«Without air, the heads will crash into the disk»


The head flying height is controlled by air flow (or helium flow, I suppose) under the head. With no atmosphere, there's no way to regulate that gap reliably.


That is definitely a question that a manufacturer should answer :-/ No idea!


have there been any drives designed with an array of fixed read/write heads, i.e. one per track, or a head that can access multiple tracks on a platter simultaneously?


There have been a few. Here’s an older article https://www.tomshardware.com/news/seagate-hdd-harddrive,8279...


Just like in combustion engines, it seems like the best things are held to the last generations of hardware. I expect both to lose market share down to nil during the 2020s. BEVs replacing internal combustion engines and solid state storage replacing HDDs even in cost per storage unit.


"Perfection is attained on the point of collapse"

(I thought I had read that somewhere, but the closest I find now is C.N. Parkinson's formulation, which I did read years ago ...)


ever tried pulling a vacuum on a normal drive? does that work?


Air is required for the heads to function. While the disk's internal environment is separate from the outside air to keep it clean, air exchange is permitted between the outside and inside of the drive to allow the drive to adjust to changes in air pressure caused by thermal expansion. A special "breather" filter is installed to prevent foreign matter from contaminating the drive.


The arm of the drive is a tuned mass damper attached to a voice coil that utilizes ground effect to maintain a precise distance between the head and the platter. Spinning disks wouldn't work in a vacuum using current technology, because there'd be no ground effect to lift the arm off the platter.


I did some everesting and our organiser had tried regular laptops, and they always packed up through disk crashes. I guess that would be at about 16,000 to 20,000 feet, or about half sea level pressure. That was in pre-SSD days.
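The "about half sea level pressure" estimate can be checked with the standard-atmosphere barometric formula. This is a sketch that assumes the International Standard Atmosphere constants for the troposphere (sea-level temperature 288.15 K, lapse rate 6.5 K/km); real mountain weather will differ.

```python
def pressure_ratio(altitude_m):
    """Pressure relative to sea level under the standard-atmosphere troposphere model."""
    t0 = 288.15      # sea-level temperature, K
    lapse = 0.0065   # temperature lapse rate, K/m
    g = 9.80665      # gravitational acceleration, m/s^2
    m = 0.0289644    # molar mass of dry air, kg/mol
    r = 8.31446      # universal gas constant, J/(mol*K)
    return (1.0 - lapse * altitude_m / t0) ** (g * m / (r * lapse))


FEET_TO_METERS = 0.3048

for feet in (16000, 18000, 20000):
    ratio = pressure_ratio(feet * FEET_TO_METERS)
    print(f"{feet} ft: {ratio:.2f} of sea level pressure")
```

Under these assumptions the range quoted above brackets the half-pressure point.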


Helium also has higher viscosity than air and because of that the flying head in the drive is more stable than in air.



