> > The power shelf distributes DC power up and down the rack via a bus bar. This eliminates the 70 total AC power supplies found in an equivalent legacy server rack within 32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies
This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff. In general, the cost savings advertised by cloud infrastructure should be evaluated more holistically.
>This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff.
I'll happily take a single high quality power supply (which may have internal redundancy FWIW) over 70 much more cheaply made power supplies that stress other parts of my datacenter via sheer inefficiency, and also cost more in aggregate. Nobody drives down the highway with 10 spare tires for their SUV.
A DC busbar can propagate a short circuit across the rack, and DC circuit protection is harder than AC. So of course each server now needs its own current limiter, or a cheap fuse.
But I’m not debating the merits of this engineering tradeoff - which seems fine, and pretty widely adopted - just its advertisement. The healthcare industry understands the importance of assessing clinical endpoints (like mortality) rather than surrogate measures (like lab results). Whenever we replace “legacy” with “cloud”, it’d be nice to estimate the change in TCO.
Let's say your high-quality supply's yearly failure rate is 100 times lower than the cheap ones'.
The probability of at least one failure among the 70 is 1-(1-r)^70. This is quite high even without considering the higher quality of the one supply. The probability of all 70 going down is r^70, which is absurdly low.
Let's say r = 0.05, i.e. one in 20 supplies failing per year. Then:
1-(1-r)^70 ≈ 97%
r^70 < 1E-91
The high-quality supply has r = 0.0005, which falls between those two extremes. If your code can handle node failure, many cheaper supplies appear to be more robust.
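If you want to sanity-check those numbers, here's a quick back-of-the-envelope script; the rates are the assumed values above, nothing measured:

```python
# Back-of-the-envelope check of the failure probabilities above.
r_cheap = 0.05           # assumed yearly failure rate of one cheap supply
r_good = r_cheap / 100   # the "100x better" single supply
n = 70

p_any_fail = 1 - (1 - r_cheap) ** n   # at least one of the 70 fails
p_all_fail = r_cheap ** n             # all 70 fail at once

print(f"P(at least one of {n} cheap supplies fails) = {p_any_fail:.1%}")   # ~97.2%
print(f"P(all {n} cheap supplies fail)              = {p_all_fail:.1e}")   # ~8.5e-92
print(f"P(the single good supply fails)             = {r_good:.2%}")       # 0.05%
```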
Yeah, but the failure rate of an analog piece of copper is pretty low; it'll keep being copper unless you do stupid things. And you'll have multiple power supplies providing power on the same piece of copper.
The big piece of copper is fed by redundant rectifiers. Each power shelf has six independent rectifiers which are 5+1 redundant if the rack is fully loaded with compute sleds, or 3+3 redundant if the rack is half-populated. Customers who want more redundancy can also have a second power shelf with six more rectifiers.
The bus bar itself is an SPoF, but it's also just dumb copper. That doesn't mean that nothing can go wrong, but it's pretty far into the tail of the failure distribution.
The power shelf that keeps the busbar fed will have multiple rectifiers, often with at least N+1 redundancy so that you can have a rectifier fail and swap it without the rack itself failing. Similar things apply to the battery shelves.
It's also plausible to have multiple power supplies feeding the same bus bar in parallel (if they're designed to support this) e.g. one at each end of a row.
This is how our rack works (Oxide employee). In each power shelf, there are 6 power supplies and only 5 need to be functional to run at full load. If you want even more redundancy, you can use both power shelves with independent power feeds to each so even if you lose a feed, the rack still has 5+1 redundant power supplies.
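For a rough feel for what 5+1 buys you, here's a toy estimate. It assumes independent rectifier failures and an illustrative per-rectifier failure rate, so treat it as a sketch rather than an availability claim for the actual product:

```python
from math import comb

r = 0.05          # illustrative yearly failure probability per rectifier (assumed)
n, spare = 6, 1   # 5+1: six rectifiers, the shelf survives any single failure

def p_shelf_down(n, spare, r):
    # Probability that more than `spare` rectifiers fail, i.e. the shelf
    # can no longer carry a fully loaded rack.
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(spare + 1, n + 1))

one_shelf = p_shelf_down(n, spare, r)
print(f"one 5+1 shelf down:           {one_shelf:.2%}")       # ~3.3%
print(f"two independent shelves down: {one_shelf ** 2:.4%}")  # ~0.11%
```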
The whole thing with eliminating 70 discrete 1U server size AC-to-DC power supplies is nothing new. It's the same general concept as the power distribution unit in the center of an open compute platform rack design from 10+ years ago.
Everyone who's doing serious datacenter stuff at scale knows that one of the absolute least efficient, labor intensive and cabling intensive/annoying ways of powering stuff is to have something like a 42U cabinet with 36 servers in it, each of them with dual power supplies, with power leads going to a pair of 208V 30A vertical PDUs in the rear of the cabinet. It gets ugly fast in terms of efficiency.
The single point of failure isn't really a problem as long as the software is architected to be tolerant of the disappearance of an entire node (mapping to a single motherboard that is a single or dual cpu socket config with a ton of DDR4 on it).
This isn't even remotely close. Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.
In the event that all 32 servers had redundant AC power feeds, you could just install a pair of redundant DC power feeds.
It's highly dependent on the individual server model and quite often how you spec it too. Most 1U Dell machines I worked with in the past only had a single slot for a PSU, whereas the beefier 2U (and above) machines generally came with 2 PSUs.
Rack servers have two PSUs because enterprise buyers are gullible and will buy anything. Generally what happens in the case of a single PSU failure is that the other PSU also fails, or it asserts PROCHOT, which means that instead of a cleanly hard-down server you have a slow server derping along at 400MHz, which is worse in every possible way.
As you noted, Apple's fsync() behavior is defensible if PLP is assumed. Committing through the PLP cache isn't how these drives are meant to operate - hence the poor behavior of F_FULLFSYNC.
But this isn't specific to Macs and iDevices. Some non-PLP drives also struggle with sync writes on FreeBSD [1]. Most enterprises running RDBMS mandate PLP for both performance and reliability. I understand why this is frustrating for porting Linux, but Apple is allowed to make strong assumptions about how their hardware interoperates.
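For anyone who hasn't run into this: on macOS a plain fsync() doesn't force the drive to flush its volatile cache; you have to ask for F_FULLFSYNC explicitly. A minimal sketch (the filename is just an example):

```python
import fcntl
import os

fd = os.open("data.log", os.O_WRONLY | os.O_CREAT, 0o644)  # example file
os.write(fd, b"commit record\n")

os.fsync(fd)  # on macOS: hands data to the drive, but it may still sit in the drive's cache

if hasattr(fcntl, "F_FULLFSYNC"):       # only defined on Darwin builds of Python
    fcntl.fcntl(fd, fcntl.F_FULLFSYNC)  # full flush to stable media; much slower
os.close(fd)
```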
The first problem is rendering within the guest. If you only have one GPU, then GVT-g [1] virtualizes it with just a bit of overhead. But it's Intel only.
The second problem is getting those pixels onto your screen in the host. SPICE is not as fast as Looking Glass [2], which sets up a shared memory buffer between the host and guest. This has acceptable performance even for modern games.
The OP doesn't seem to utilize these techniques, so I don't think it can plausibly claim to have the fastest configuration - at least not yet.
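To make the Looking Glass point concrete: the trick is just a big shared-memory region that the guest writes frames into and the host reads directly. Here's a toy Python sketch of that idea (not Looking Glass's actual IVSHMEM protocol; the buffer name and frame size are made up):

```python
from multiprocessing import shared_memory

# Toy version of the shared-framebuffer idea: a "guest" writer and a "host"
# reader share one memory region, so frames never cross a display protocol.
W, H, BPP = 1920, 1080, 4
shm = shared_memory.SharedMemory(create=True, size=W * H * BPP, name="toy_framebuffer")

frame = bytes([255]) * (W * H * BPP)   # "guest" renders an all-white frame
shm.buf[: len(frame)] = frame          # guest writes it into shared memory

first_pixel = bytes(shm.buf[:BPP])     # "host" reads pixels straight out, no copy over a socket

shm.close()
shm.unlink()
```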
> recruited mathematicians to analyze it, and published the results, as well as one in-house proof and one independent proof showing the cryptographic integrity of the system.
Apple employs cryptographers, but they are not necessarily acting in your interest. Case in point: their use of private set intersection to preserve the privacy... of law enforcement, not users. Their less technical summary:
> Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child safety organizations. Apple further transforms this database into an unreadable set of hashes that is securely stored on users’ devices.
> Before an image is stored in iCloud Photos, an on-device matching process is performed for that image against the known CSAM hashes. This matching process is powered by a cryptographic technology called private set intersection...
The matching is performed on device, so the user’s privacy isn’t at stake. But, thanks to PSI and the hash preprocessing, the user doesn’t know what law enforcement is looking for.
Well, it’d be kind of dumb to make the mistake of building a system to stop child pornography only to have it become the biggest distributor of CP photos in history
Those images are hashed, not transmitted in original format. On top of that, PSI prevents you from learning those hashes, or how many there are. So you can’t tell if the database contains the hash of, say, tank-man.jpg.
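As a toy illustration of the "unreadable set of hashes" part only (this is NOT Apple's actual NeuralHash/PSI construction, and it glosses over how on-device matching still happens): if the provider blinds each hash with a key only it knows, possession of the blinded set tells you nothing you can test against.

```python
import hashlib
import hmac
import os

server_key = os.urandom(32)  # known only to the server / database provider

def blind(image_hash: bytes) -> bytes:
    # Server-side transform that makes the stored set "unreadable" to the device owner.
    return hmac.new(server_key, image_hash, hashlib.sha256).digest()

on_device_db = {blind(hashlib.sha256(b"some known image").digest())}

# The device owner has the blinded set and a candidate image, but no key:
candidate = hashlib.sha256(b"tank-man.jpg contents").digest()
# Without server_key there is no way to compute blind(candidate) locally,
# so you cannot check whether the candidate is in on_device_db.
```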
I understand why this shielding is necessary for the system to work. My point is the crypto is being used to protect law enforcement, not the user.
And my point is that the only way to provide visibility into what is being looked for, without distributing the material itself, would be to implement some type of ZKP.
Compiling and installing large amounts of system software, a la `emerge world` or `make buildworld`, is great exposure to many system components. `make menuconfig` introduces one to various features of the Linux kernel, and yes, even a humble `./configure` illustrates how the software in question depends on libraries and hardware. I wouldn't casually dismiss the educational value of these experiences, nor the curiosity of those partaking. They're certainly more expository than the digests displayed in a `docker pull`.
I hoped WebAuthn would allow such general-purpose use of security devices. Unfortunately, in the spec: "To save bandwidth and processing requirements on the authenticator, the client hashes the client data and sends only the result to the authenticator. The authenticator signs over the combination of the hash of the serialized client data, and its own authenticator data." [1]
So, in the foreseeable future on the web, the devices are useful just for authentication.
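For reference, what the authenticator ends up signing during an assertion is the fixed structure described in that spec text: its own authenticator data concatenated with the SHA-256 of the serialized client data. A rough sketch with made-up field values:

```python
import hashlib
import json

# Illustrative values only; a real clientDataJSON is produced by the browser.
client_data = json.dumps({
    "type": "webauthn.get",
    "challenge": "c29tZS1yYW5kb20tY2hhbGxlbmdl",
    "origin": "https://example.com",
}).encode()

client_data_hash = hashlib.sha256(client_data).digest()  # only this hash reaches the device
authenticator_data = b"\x00" * 37                        # placeholder: rpIdHash + flags + signCount

to_be_signed = authenticator_data + client_data_hash
# signature = private_key.sign(to_be_signed)
# The key only ever signs this structured payload, never an arbitrary
# caller-chosen message, which is why the API doesn't double as a
# general message-signing facility.
```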
You understand that in addition to simplifying the protocol, confining it to authentication helps, somewhat, keep WebAuthn from becoming essentially a hardware cookie.
I think that’s mitigated by the physical button one has to press every time something is signed.
I’m not sure if the restriction to authentication has substantially simplified the WebAuthn API. The restriction is caused by a speed optimization, not design simplification. If the actual payload was sent to the authenticator, rather than just its hash due to bandwidth limitations, then it seems like the API could be used for signing messages, not just authentication. I do agree that the user interfaces surrounding the APIs will be simpler due to the focus on authentication.
It’s a frightening testament to the power of addiction that even some of the smartest, most logical people in the world cannot control some of their own extremely illogical decisions.
I think you're being pretty hyperbolic in your reaction here. A lot of people like to indulge in their morbid curiosity, and they're free to do so. If that means aimless speculation about the cause of someone's death as a means of rationalizing it then so be it. It's not really your place to be the arbiter of someone's death and others' reactions to it.
A friend of mine died when he was only 24 due to a sudden brain aneurysm in his sleep, and in that same year I had two other friends commit suicide (in fairness one was questionable, but it was by a train). I grappled with that for well over a year, to the point that it caused me a great deal of anxiety and insomnia that someone apparently healthy could die that young. It fundamentally crushed a lot of my own personal sense of control and security. One of the most difficult things for me to process was the utter lack of information - how did he die? Did he have chronic headaches and didn't check? Is there a way I could consider him accountable for it? Something to understand the situation would give me greater closure than the reality that people simply die, sometimes quite suddenly, for no predictable reason at all.
This is an extreme example for me to give, but the message is this: I understand that you're upset by this individual dying and the way others reacted to it (take a look at a comment I made here when John Nash died in a car accident in 2015: https://news.ycombinator.com/item?id=9597349). But what you're doing is policing other people's emotions, and they have a right to react to the death by speculating about its causes. That is a natural and very common response that you cannot artificially remove.
Yeah, I agree. I'm not thinking clearly right now. My wisdom tooth appears to be coming in. Or something. The right half of my face is swollen and this pain is ~excruciating. Hey, at least my face looks the way my comments sounded, right?
Thanks for putting this into perspective. My comments in this post are pretty embarrassing. I wish I could delete them. But mistakes are too easy to run from. Apologies regardless.
It was quite selfish to make a ruckus like that, so I think I was the disrespectful one. The focus should be on Vladimir and his life.
Just for the sake of it, I may say that, first, I understand your concern, but whatever takes someone away might be a source of shame, disrespect, or lack of care in the eyes of the parent poster. Or maybe even the opposite.
I disagree thoroughly. I've been to both Harvard and CMU, and assisted with classes at the latter. CMU SCS undergrads work much harder than Harvard undergrads, and their work is more honest.
Actually, CMU students are perhaps a bit too isolated. This leads to some academic stratification, because the smart kids hang out with one another. There is less support and camaraderie. Lots of promising students struggle at SCS and drop out. The attrition rate, not cheating, is actually the primary academic concern.
Harvard is at the other extreme. The stratification isn't academic, it's social (via final clubs and such). Everyone collaborates. There are two common practices I find especially distasteful. Harvard has a very long, class-free study period right before exams. Also, Harvard provides students with Adderall at no cost and with essentially no questions asked. A lot of students blow off the assignments, then cram with loads of amphetamines. They're smart, so they succeed, but they don't really learn anything.
His description of Bitcoin is technically incorrect. As described in the original Bitcoin paper, workers are not necessarily "miners" and may be funded by transaction fees.
Bitcoin isn't susceptible to hoarding because, unlike gold, it has no intrinsic value. An alternative explanation for the rise in value: the Bitcoin ecosystem is under heavy development. The value of Bitcoins is possibly soaring in anticipation of this infrastructure, i.e. its utility as a currency.
I wouldn't rely on Krugman's blog for serious analysis.
The monetary value of gold is in excess of its intrinsic value — exactly like Bitcoin. It's only a question of degree. And "shiny things" as a motivation for mining explains gold and Bitcoins.