This system sounds like one of the many pieces of science equipment whose costs are >98% in one-off R&D engineering & mission ops, and <2% in marginal cost of construction/launch.
Imagine a hundred of these exploring Mars semi-autonomously, maybe with LoRa mesh networking, for not a whole lot more money than it cost to send one.
So 100% of the Snapdragons on Mars are no longer sitting idle and are tasked with doing useful work. Why can't we do the same for Earth?
You got a helicopter that needs communicating with? What would you do with them? I hope it's not mining coinz.
This stuff - https://www.xda-developers.com/samsung-promised-make-old-pho...
Those CPUs are not radiation-hardened.
As TFA says, they run the algorithm multiple times and check that the results match, to guard against transient errors caused by radiation.
The permanent errors caused by radiation must be identified by periodic self-tests. When the permanent damage is in a redundant structure, e.g. the permanently damaged memory bits mentioned in TFA, the system must stop using the damaged part.
Eventually radiation will destroy something that is essential, but until then the Snapdragon CPU should be usable.
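To make the "run it several times and compare" idea concrete, here is a minimal Python sketch. It is not NASA's flight code; `run_with_agreement` and `make_flaky` are hypothetical names invented for illustration, and the bit flip is simulated rather than caused by real radiation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_agreement(compute: Callable[[], T], runs: int = 2,
                       max_attempts: int = 3) -> T:
    """Run `compute` `runs` times and accept the result only if all runs agree.

    A mismatch is treated as a transient fault and the whole batch is retried,
    up to `max_attempts` times.
    """
    for _ in range(max_attempts):
        results = [compute() for _ in range(runs)]
        if all(r == results[0] for r in results):
            return results[0]
    # Repeated disagreement hints at a permanent fault, not a transient one.
    raise RuntimeError("results never agreed; possible permanent fault")


def make_flaky(fault_on_call: int) -> Callable[[], int]:
    """Build a computation whose result has one bit flipped on a chosen call."""
    calls = {"n": 0}

    def compute() -> int:
        calls["n"] += 1
        value = sum(range(100))
        if calls["n"] == fault_on_call:
            value ^= 1 << 3  # simulated single-event upset
        return value

    return compute


if __name__ == "__main__":
    # The second call is corrupted, so the first batch disagrees and is retried.
    print(run_with_agreement(make_flaky(fault_on_call=2)))
```

The cost is obvious from the sketch: every result is computed at least twice, which is the performance penalty discussed further down the thread.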
Yeah, that's kind of awesome, isn't it?
Flying a helicopter on Mars was inspiring and useful for scouting, etc. But maybe the best thing coming out of it is undeniable proof that off-the-shelf hardware without radiation hardening is perfectly viable on Mars if you can just reboot it fast enough.
Off-the-shelf hardware is usable, but instead of smartphone CPUs one must use the so-called AE (automotive-enhanced) variants of the same ARM cores that have been used in smartphones.
The automotive variants allow the use of multiple redundant cores, which check each other for errors. This would allow much better performance than NASA gets today from a Snapdragon, where each computation has to be run multiple times and the results then verified to match.
There are off-the-shelf redundant CPUs of this kind, designed for use in cars and other vehicles, i.e. for a goal much closer to what NASA needs than smartphone CPUs.
The design of the electronics for the Mars helicopter was a very low-effort project, because too many people were skeptical about its chances of success. In other circumstances, it could have been done much better.
It looks like the FPGA that monitors/controls the redundant/lockstep CPUs might be radiation tolerant. From [0]:
"..the critical FPGA which is always on for the duration of the mission, the radiation tolerant ProASIC3 is chosen with the military temperature grade (-55 C to 125 C) and -1 speed grade to mitigate the degradation in the propagation delay caused by the total dose radiation. The single-event upset (SEU) is mitigated with triple module redundancy (TMR) in the FPGA design.
...
The FPGA device is a military-grade version of MicroSemi’s ProASIC3L, which uses the same silicon as the radiation-tolerant device from the same family."[0]
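For anyone unfamiliar with TMR: the usual idea is to keep three copies of each value and take a bitwise majority, so a single upset copy is outvoted. A rough Python sketch of the voter follows. The real thing is logic inside the FPGA fabric, not software, and `majority_vote` is just an illustrative name.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit is set if at least two inputs have it set."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    original = 0b10110010
    upset = original ^ 0b00001000  # one copy suffers a single bit flip
    # The two intact copies outvote the corrupted one.
    print(majority_vote(original, upset, original) == original)
```

Note that this only masks an upset in one copy at a time. If two copies are hit in the same bit position before the error is corrected, the vote goes the wrong way.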
The specs from [1] say there is also a specific radiation-tolerant variant.
So it looks like the CPUs themselves have dual lock-stepped cores, and the CPU checks for errors each cycle. If there's an error it flags the FPGA, which switches to the other CPU.
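A toy Python model of that arrangement, in case it helps. Everything here is hypothetical: `LockstepCpu` and `Supervisor` are made-up names, the supervisor only stands in for the FPGA, and real lockstep hardware compares the cores every cycle in silicon rather than per function call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Step = Callable[[int], int]


@dataclass
class LockstepCpu:
    """Two cores that run the same step; a comparator checks their outputs."""
    name: str
    core_a: Step
    core_b: Step

    def execute(self, x: int) -> Tuple[bool, int]:
        out_a, out_b = self.core_a(x), self.core_b(x)
        return out_a == out_b, out_a  # (ok flag, result)


class Supervisor:
    """Stand-in for the monitoring FPGA: on an error flag, switch to the other CPU."""

    def __init__(self, cpus: List[LockstepCpu]) -> None:
        self.cpus = cpus
        self.active = 0

    def step(self, x: int) -> int:
        for _ in range(len(self.cpus)):
            ok, out = self.cpus[self.active].execute(x)
            if ok:
                return out
            # Mismatch between the two cores: fail over to the next CPU.
            self.active = (self.active + 1) % len(self.cpus)
        raise RuntimeError("all CPUs flagged an error")


if __name__ == "__main__":
    healthy: Step = lambda x: x * 2
    faulty: Step = lambda x: x * 2 + 1  # simulated upset in one core
    sup = Supervisor([LockstepCpu("A", healthy, faulty),
                      LockstepCpu("B", healthy, healthy)])
    print(sup.step(21), sup.cpus[sup.active].name)
```

Compared with re-running each computation in software, the comparison happens as part of execution, so a healthy CPU pays no extra time per result.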
[0] https://rotorcraft.arc.nasa.gov/Publications/files/Balaram_A...
[1] https://ww1.microchip.com/downloads/aemDocuments/documents/F...