Your efforts are noticeable and appreciated! It was always a mystery to me why the Windows console was stuck in time from the NT days, and your work sends a signal to all those doing sysadmin and programming work that Microsoft is supporting these use cases in Windows.
I found this talk really interesting, as SuperH chips were used in so many applications in the '90s. One significant application I'm familiar with was Roland synthesizers. It's very interesting that this group decided to implement this for their own application.
But I'm curious why it wouldn't be easier to order SH CPUs from Renesas for modern applications, although I understand the implementation wouldn't be open all the way down. Does anyone have thoughts on this?
In other industries, such as petroleum refining, pipelines and nuclear power, there are structured methodologies for determining root causes. Some of these take into account equipment failure, and modelling is often done on equipment life cycles to determine replacement and inspection schedules in oil refineries.
These industries also employ strict management of change processes so that an ad-hoc decision or improvisation doesn't cause an incident.
What puzzles me is why you don't see these kinds of practices applied in data center operations.
Are data centers really more complex than, say, a nuclear plant?
What puzzles me is why you don't see these kinds of practices applied in data center operations.
Simply, people are not willing to pay the price. All of that extra planning requires more man-hours. It also requires people skilled at working in such environments, and they come at significant cost. People want to use their Blackberries for hundreds of dollars up front and tens of dollars each month, not thousands of dollars monthly.
It can be done, but the market has determined that it does not want it to be done. It would rather accept some downtime and other problems in order to access the technology at a lower cost.
When I wrote my comment I wasn't thinking of lots of up-front planning. I was thinking along simpler lines, like root cause analysis using a human factors or equipment taxonomy (much more effective than 5 Whys), and simple logging of incidents for later analysis.
I think some of these kinds of processes can be adopted with small investments in training and change.
Also, a lot of these kinds of failures seem to stem from changes at the networking layer, which should be more planned and tested given its place in the stack (we're not talking about crazy app behaviour).
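To make the "simple logging of incidents for later analysis" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the category names are a made-up stand-in for a real human factors or equipment taxonomy, and the sample incidents are invented.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical taxonomy; a real one would come from whatever
# root cause analysis method the organization adopts.
CATEGORIES = {"human_factors", "equipment", "change_management", "unknown"}


@dataclass(frozen=True)
class Incident:
    layer: str     # e.g. "network", "storage", "app"
    category: str  # must be one of CATEGORIES
    summary: str

    def __post_init__(self):
        # Reject free-form categories so later analysis stays comparable.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def tally(incidents):
    """Count incidents per (layer, category) pair for later review."""
    return Counter((i.layer, i.category) for i in incidents)


# Invented sample data, only to show the shape of the log.
log = [
    Incident("network", "change_management", "untested router config pushed"),
    Incident("network", "change_management", "firmware upgrade outside window"),
    Incident("storage", "equipment", "disk controller failure"),
]

counts = tally(log)
for (layer, category), n in counts.most_common():
    print(f"{layer:8} {category:18} {n}")
```

Even a tally this crude would show whether incidents cluster around changes at the networking layer, which is the kind of pattern that is hard to see one post-mortem at a time.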
You'd better believe that companies like Amazon, RIM, and all the others have very strict change management procedures. Changes to critical systems are generally planned weeks - sometimes months - in advance, and even simple changes often require multiple levels of approval. Even then, sometimes something gets missed, a component acts slightly differently than expected, or somebody makes a mistake.
All of the mistakes the article discussed were network related, and yes, when you're dealing with networks like Amazon's or RIM's, where there are hundreds of thousands or even millions of devices connected, they are definitely more complicated than a nuclear plant.
Arguably so, in that app-level data has a richer set of failure modes than even highly-complex physical materials and flows, but I don't think that detracts at all from your main point.
The linked article used the word "unknowable" to describe the failure modes in a complex datacenter. They're definitely "unknown," but it's way too early to just give up on figuring many more of them out.
If a CTO says "it'll cost $50B to reach less than 5% unknown failure modes," that's one thing. But if a CTO says "yeah, the failure modes of our critical infrastructure are unknowable," they'd better start looking for a new gig.
My three-year-old daughter uses the same technique, and she has no idea who Toyota is. Thankfully the root cause usually ends up being "BECAUSE I SAID SO".
Yes, data centers are more complex than nuclear power plants. Nuclear power plants generally operate in well-known ways and with few unknown interactions.
For example:
The workload on a nuclear power plant system does not spike around Christmas.
The workload on a nuclear power plant system does not change when there is a video spreading virally.
The nuclear power plant control system does not have to support a diverse ecosystem of clients, some of them running hostile code.
The nuclear power plant control system does not need to be upgraded very often.
Modern data centres and the infrastructure that runs on them operate in an environment of constant flux, and this makes them vastly more complicated than a nuclear power plant system.
Nuclear power plants have a vast number of components and a few very catastrophic failure modes, but in essence they're very simple. The components don't change (save for the well-known ways in which metal degrades under neutron bombardment), and lots of money has been invested to know exactly how these components act under various stresses. The engineers who operate them work on the same system for years and years, becoming experts on how their very well-defined system operates. They know its exact failure modes intimately, and they have an SOP manual and various checklists to quickly pinpoint and eliminate problems.
Nuclear power plants are also developed and operated under a very clear, single-minded mandate: prevent, at any cost, the plant from entering a catastrophic failure mode.
I think the conceptual switch in licensing model, from CPU cores to vRAM entitlement, is fair. But the vRAM allocations per license are not right. It puts sysadmins in a real bind, having to report bad news to management.
They need to do the right thing and adjust the vRAM entitlements. The sad thing is that people are so locked in to VMware infrastructure that VMware will likely make money in the short term, at the expense of pissing off customers. Oracle plays this game too.
Their marketing materials say this is much easier, but it doesn't seem easier to me. Right now, we have a three-node, six-CPU Enterprise Plus cluster. That is six licenses, period.
Now I have to track vRAM usage and decide whether to purchase EP licenses for all available RAM or just go with usage plus growth for the year... or something. I'm not sure how that's easier.
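To show why the tracking gets harder, here is a rough sketch of the arithmetic in Python. It assumes my reading of the new model is right: one license per CPU as before, each license contributes a fixed vRAM entitlement to a pool, and extra licenses are needed once allocated vRAM exceeds the pool. The 48 GB entitlement and the vRAM totals below are placeholder numbers for illustration, not our real figures or a quoted price list.

```python
import math


def licenses_needed(cpus, allocated_vram_gb, entitlement_gb_per_license):
    """Licenses required under a pooled per-license vRAM entitlement.

    Assumption: at least one license per CPU, plus enough licenses
    that the pooled entitlement covers all allocated vRAM.
    """
    by_cpu = cpus
    by_vram = math.ceil(allocated_vram_gb / entitlement_gb_per_license)
    return max(by_cpu, by_vram)


ENTITLEMENT_GB = 48  # placeholder value, check the actual license terms

# Six CPUs, vRAM allocation fits inside the pool: still six licenses.
print(licenses_needed(6, 256, ENTITLEMENT_GB))

# Same six CPUs, but more vRAM allocated: the count is now driven by RAM.
print(licenses_needed(6, 400, ENTITLEMENT_GB))
```

Under the old model the answer was simply the CPU count. Under this one the answer moves every time someone allocates more memory to a VM, which is the part that does not feel easier.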
A counterargument to this is Carlos Ghosn's experience with Nissan. He said that one of Nissan's problems was engineers making cars for engineers, cars that people did not want to buy.
In reality, creating, marketing and being successful with a product requires diverse perspectives.
I think engineers beating up on MBAs is no different from MBAs beating up on "propellerheads." It's counterproductive.
And just because you have an MBA, should you then be typecast by engineers? Why does having an MBA reduce someone's potential to be a passionate product person, or to have productive relationships with the makers?
The problem is that the old business school courses taught a lot of flawed stuff, including stuff from Jack Welch like cutting costs by treating engineers, etc., as disposable.
That's a good point, and poor cost accounting practices too.
I have a friend (an engineer) who is doing his MBA right now. At his school he said there is a big focus on ethics and ethical behaviour. Based on what I've witnessed in business, I wish some of that was maintained once people entered the work world!
One of the reasons I ditched my Torch was that I would often get "java.lang.NullPointerException" dialogs when I received SMS messages. Pretty bad from a user experience standpoint.
Also lots of other quirkiness in the OS drove me batty.
BTW, the RIM WebKit browser implementation is excellent; it's just the horsepower of the phone letting things down. And I miss the physical keyboard!
BES is yet another piece of software to manage if you are a mail administrator, and it is a pretty complex beast at that. You can have just as secure a sync solution with straight iPhone/Exchange by using Exchange ActiveSync with a VPN client, and you don't have the overhead of managing a BES install.
I don't think you are understanding my point. Nobody gives a damn what the mail administrator thinks. The auditors see "BES" and move to the next item; they see your custom-made solution and go "WTF is that? Not in the manual == Security Risk!!!11"
This reminds me of when Microsoft was hyping touch user interface features in the run-up to the Windows 7 launch (without anything tangible in the field). Then Apple released multi-touch gesture pads on their laptops and furthered the reach of iOS.