> Well, you can buy your own hardware and set it up with OpenStack and use it as a private cloud. Companies like Canonical or Redhat make a lot of money by providing software (mostly open source) to support exactly that use case.
Sure you can, but then who will diagnose and fix your hardware/OS interaction problems when you have parts from five vendors in the mix?
If you haven't lived through this, the answer is: nobody. Everyone points fingers at the other 4 and ignore your calls.
Back in the day you could buy a fully integrated system (from CPU to hardware to OS) from Sun or SGI or HP and you had a single company to answer all the calls, so it was much better. Today you can't really get this level of integration and support anymore.
(Actually, you probably can from IBM, which is why they're still around. But I have no experience in the IBM universe.)
This is why Oxide is so exciting to me. I hope I can be in a company that becomes a customer at some point.
>Sure you can, but then who will diagnose and fix your hardware/OS interaction problems when you have parts from five vendors in the mix?
Dell is a single vendor that will diagnose and fix all of your hardware issues.
With Oxide you're locked into what looks like a Solaris derivative OS running on the metal and you're only allowed to provision VMs which is a huge disadvantage.
I run a fleet of over 30,000 nodes in three continents and the majority is Flatcar Linux running on bare metal. Also have a decent amount of RHEL running for specific apps. We can pick and choose our bare metal OS which is something you cannot do with Oxide. That's a tough pill to swallow.
> Dell is a single vendor that will diagnose and fix all of your hardware issues.
I've been a Dell customer at a previous company. I know for a fact that's not true.
I had a support ticket for a weird firmware bug open for two years, they could never figure it out. I left that job but for all I know the case is still open many years later.
Dell doesn't know how to fix things like that because they don't design and engineer the systems they sell. Dell is a reseller who puts components together from a bunch of vendors and it mostly works but when it doesn't, there's nobody on staff who can fix it.
I've been a Dell customer for decades at this rate and I know for a fact it's true.
I've had support tickets open for all kinda of weird firmware, hardware, etc. bugs and they've been well resolved, even if it meant Dell just replaced the part with something comparable (NIC swap).
>Dell doesn't know how to fix things like that because they don't design and engineer the systems they sell.
Of course they do. That's like saying Oxide doesn't know how to fix stuff because they don't design the CPU, NVMe, DIMMs, etc. Oxide is still going to vendors for these things.
Ironically, it was Dell's total inability to resolve a pathological rash of uncorrectable memory errors very much is part of the origin story of Oxide: this issue was very important to my employer (who was a galactic Dell customer) and as the issue endured and Dell escalated internally, it became increasingly clear that there was in fact no one at Dell who could help us -- Dell did not understand how their own systems work.
At Oxide, we have been deliberate at every step, designing from first principles whenever possible. (We -- unlike essentially everyone else -- did not simply iterate from a reference design.)
To make this concrete with respect to the CPU in particular, we have done our own lowest-level platform enablement software[0] -- we have no BIOS. No one -- not the hyperscalers, not the ODMs and certainly not Dell -- has done this, and even AMD didn't think we could pull it off. Why did we do it this way? Because all along our lodestar was that problem that Dell was useless to us on -- that we wanted to understand these systems from first principles, because we have felt that that is essential to deliver the product that we ourselves wanted to by.
There are plenty of valid criticisms of Oxide -- but that we don't understand our system simply isn't one of them.
As a side question, what's the name of your custom firmware that is the replacement of the AGESA bootloader? I tried searching on the oxide github page but couldn't find anything that seemed to fit that description.
(The AGESA bootloader -- or ABL -- is in the AMD PSP.) In terms of our replacement for AGESA: the PSP boots to our first instruction, which is the pico host bootloader, phbl[0]. phbl then loads the actual operating system[1], which performs platform enablement as part of booting. (This is pretty involved, but to give you a flavor, see, e.g. initialization of the DXIO engine.[2])
Thanks, are the important oxide branches of illumos-gate repo (and any other cloned repos) defined anywhere? I definitely wouldn't have found that branch without you mentioning it here.
Interesting enough I also ran into something somewhat related with Dell that they were not able to resolve so they ended up working in a replacement from another vendor.
Nonetheless, it is quite interesting what you've built, but as the end user I'm not quote convinced that it matters. Sure you can claim it reduces attack vectors and such but we'll still see Dells and IBMs in the most restricted and highest security postured sites in the world. Think DoD and such. Core/libreboot with RoT will get me through compliance the same.
The software management plane y'all built is the headlining feature IMHO, not so much what happens behind the scenes that the vast majority of the time will not have a fatal catastrophic upstream effect.
>There are plenty of valid criticisms of Oxide -- but that we don't understand our system simply isn't one of them.
That's not what I said. There's a line in the sand that you must cross when it comes to understanding the true nature of the componentry that you're using. At the end of the day, your AMD CPUs may be lying to you, to all of us, but we just don't know it yet.
> Off by a few orders of magnitude. Dell on-site SLA with pre-purchased spares was about 6 hours.
You're talking about replacement parts. Yes Dell is good about that.
The discussion above is asking them to diagnose and fix a problem with the interaction of various hardware components (all of which come from third parties).
But they _are_ writing the firmware that runs most of them and need to understand those devices at a deep level in order to do that, unlike Dell. Dell slaps together hardware and firmware from other vendors with some high level software of their own on top. They don't do the low level firmware and thus don't understand the low level intricacies of their own systems.
No they're not unless I'm mistaken. They're not writing the firmware that runs on the NVMe drives, nor the NICs (they're not even writing the drivers for some of the NICs), etc.
There's a line in the sand that you must cross when it comes to understanding the true nature of the componentry that you're using. At the end of the day, your AMD CPUs may be lying to you, to all of us, but we just don't know it yet.
I'm not speaking hypothetically. If you hit a "zero-day" bug that Dell has never seen it's going to take time. And somehow every large customer finds bugs that Dell certification didn't.
> And somehow every large customer finds bugs that Dell certification didn't.
It's a law of computer engineering.
In the Apollo 11 decent sequence the Rendezvous Radar experienced a hardware bug[0] not uncovered during simulation. They found it later, but until then, the solution was adding a "turn off Rendezvous Radar" checklist item.
[0] The Rendezvous Radar would stop the CPU, shuttle some data into areas it could be read, and woke the CPU back up to process it. The bug caused it to supuriously do this dance just to tell it "no new data", which then caused other systems to overload.
It's ironic coming from a company who's CTO has harped about containers on bare metal for years. Maybe a large swath only need to deploy VMs, but the future will most definitely involve bare metal for many use cases, and oddly Oxide doesn't support that currently.
See the pattern? Dell only care about the big guys.
Set aside the childish tone ...
> Dell is a single vendor that will diagnose and fix all of your hardware issues.
There are two anecdotes here disagreeing with you, and frankly that's enough to say what you said above isn't true, not universally so. I doubt Odixe is targeting big deployment like yours, but more like theirs. Whether they will succeed is another matter, but they do have a valid sales pitch and the expertise to pull it off.
Sure you can, but then who will diagnose and fix your hardware/OS interaction problems when you have parts from five vendors in the mix?
If you haven't lived through this, the answer is: nobody. Everyone points fingers at the other 4 and ignore your calls.
Back in the day you could buy a fully integrated system (from CPU to hardware to OS) from Sun or SGI or HP and you had a single company to answer all the calls, so it was much better. Today you can't really get this level of integration and support anymore.
(Actually, you probably can from IBM, which is why they're still around. But I have no experience in the IBM universe.)
This is why Oxide is so exciting to me. I hope I can be in a company that becomes a customer at some point.