I'm a big fan of Dissimilar Redundancies (but didn't know that was the term until today) for building system software.
Build for various Linux distros and some of the BSDs, and weird compile errors and edge cases will pop up. Often I've found that these expose undefined behaviour or incorrect assumptions that you'd never notice if you were building for a single platform.
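For a concrete flavour (a toy of my own, not from any real codebase): memcpy with overlapping buffers is undefined behaviour, and different libcs implement memcpy differently, so code like this can appear to work on one platform and quietly corrupt data on another.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[16] = "abcdefgh";

    /* Undefined behaviour: src and dst overlap. One libc's memcpy may
     * happen to do what you expect; another's (copying in a different
     * direction, or in vectorized chunks) will silently mangle the data. */
    memcpy(buf + 2, buf, 8);

    printf("%s\n", buf);  /* output varies by platform/libc */

    /* The portable fix is memmove, which is defined for overlapping buffers. */
    return 0;
}
```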
What I would like to see is the fault data, plus a graph of the number of in-sync FCMs over time and how well it correlated with predictions.
In other words: how over-engineered is it?
> Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors.
Who sits down and determines that 8 is the correct number? Why not 4? Or 2? Or 16 or 32?
They probably set an acceptable total loss rate for the mission and worked backwards to determine how many replicas of each system they need to achieve that while minimizing total cost/weight.
So the answer is "some engineers sat down after talking to management".
This is correct.
Given a list of estimates of failure probabilities, finding the right mix of redundancy becomes a very tractable problem, maybe even freshman-level.
Getting the probabilities can be very difficult, though, especially for failure modes that have never occurred before.
That is what you hire an army of engineers for.
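For a flavour of the freshman-level version (all numbers invented, and assuming identical replicas that fail independently, which real fault-tree analyses emphatically do not):

```c
#include <math.h>
#include <stdio.h>

/* Smallest n with p_fail^n <= budget, i.e. n = ceil(ln budget / ln p_fail).
 * Assumes replicas fail independently; common-mode failures break that
 * assumption, which is exactly what dissimilar redundancy attacks. */
static int replicas_needed(double p_fail, double budget) {
    return (int)ceil(log(budget) / log(p_fail));
}

int main(void) {
    /* Hypothetical numbers: each unit fails with probability 1e-2 over
     * the mission, acceptable loss-of-function probability is 1e-7. */
    printf("replicas needed: %d\n", replicas_needed(1e-2, 1e-7)); /* -> 4 */
    return 0;
}
```

The hard part, as noted above, is the inputs, not the arithmetic.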
Interesting. In safety components we use lockstep microcontrollers, which do something similar on a much smaller scale.
https://en.wikipedia.org/wiki/Lockstep_(computing)
Example: https://www.st.com/resource/en/datasheet/spc574k72e5.pdf
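For anyone unfamiliar, a software-level caricature of what a self-checking pair does (real lockstep cores compare in hardware every cycle, often with one core's clock staggered; everything below is illustrative, not any vendor's API):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t (*step_fn)(uint32_t);

/* Run the same step "twice" and refuse to emit output on disagreement.
 * In a real lockstep MCU the two executions happen on separate cores and
 * the comparison is done by checker hardware, not by code like this. */
static uint32_t checked_step(step_fn step, uint32_t input) {
    uint32_t a = step(input);  /* core A */
    uint32_t b = step(input);  /* core B */
    if (a != b) {
        /* Mismatch implies a hardware fault: fail safe, not silent. */
        fprintf(stderr, "lockstep mismatch: entering safe state\n");
        exit(EXIT_FAILURE);
    }
    return a;
}

static uint32_t example_step(uint32_t x) { return x * 2654435761u; }

int main(void) {
    printf("output: %u\n", checked_step(example_step, 42));
    return 0;
}
```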
Lockstep processors were used here, as well.
> each FCM consists of a self-checking pair of processors.
For the Airbus they used different CPUs because CPUs have bugs too...
Not just CPUs: they run a wholly different (but also simpler) fallback program in case the main computers fail. I think they were more worried about programming errors, but this should avoid any shared failure mode between the main computers (be it programming or hardware).
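A minimal sketch of that arrangement, with invented names and thresholds (real systems select via voting planes and hardware monitors, not an if-statement):

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the full-featured primary control law. */
static double primary_law(double error) { return 1.8 * error + 0.2; }

/* Deliberately simpler backup, ideally written by a separate team
 * against the same spec so it doesn't share the primary's bugs. */
static double backup_law(double error) { return error; }

static double command(double error, bool primary_healthy) {
    if (!primary_healthy)
        return backup_law(error);
    double out = primary_law(error);
    /* A primary that produces garbage also forces the fallback. */
    if (!isfinite(out) || fabs(out) > 100.0)
        return backup_law(error);
    return out;
}

int main(void) {
    printf("primary healthy: %.2f\n", command(1.0, true));
    printf("primary failed:  %.2f\n", command(1.0, false));
    return 0;
}
```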