But what was the checksum? Like the actual, specific value?
The Factorio devs found[1] that some devices do mishandle checksums: they compute the checksum just fine, but they do something stupid with certain values, so checksums of 0x0000 or 0xFFFF (the two values from the FFF) cause packet loss.
In any protocol where a repeated packet differs even slightly from the original (a different request ID, timestamp, sequence number, etc.), that difference will (probably) be enough to jiggle the checksum to a new value, and the protocol will keep going with only a minor blip that probably goes unnoticed.
Only if the packet is deterministic do you actually hit the problem.
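To make the "jiggle" concrete, here is a minimal sketch. The packet layout (`make_packet`, a 16-bit sequence number followed by the body) is hypothetical, and the checksum covers only that payload; a real UDP checksum also covers the pseudo-header and the UDP header.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # zero-pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def make_packet(seq: int, body: bytes) -> bytes:
    # Hypothetical layout: 16-bit sequence number, then the body.
    return struct.pack("!H", seq) + body


body = b"same request every time"

# Retries that bump a sequence number each land on a different checksum.
retries_with_seq = [internet_checksum(make_packet(seq, body)) for seq in range(3)]

# Byte-identical retries hit the same checksum every single time.
retries_fixed = [internet_checksum(make_packet(0, body)) for _ in range(3)]

print([hex(c) for c in retries_with_seq])
print([hex(c) for c in retries_fixed])
```

So if a byte-identical packet happens to land on a bad checksum value, every retry lands on it too.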
> calculating the UDP checksum is not exactly rocket science.
I've seen things that trivial get messed up. "Just read the standard" is a high bar sometimes. (Though the above is probably "I dual-purposed a u16 without realizing it didn't have any available niches for that…")
[1]: https://www.factorio.com/blog/post/fff-176
Breaking the oddity with 0x0000 and 0xFFFF down further: it stems from this special behavior specified in RFC 1122 (https://www.rfc-editor.org/rfc/rfc1122#page-29:~:text=is%20v...):
> Unlike the TCP checksum, the UDP checksum is optional; the value zero is transmitted in the checksum field of a UDP header to indicate the absence of a checksum. If the transmitter really calculates a UDP checksum of zero, it must transmit the checksum as all 1's (65535). No special action is required at the receiver, since zero and 65535 are equivalent in 1's complement arithmetic.
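0x0000 and 0xFFFF are the same number (zero) in 1's complement, and the spec reserves 0x0000 on the wire to mean "no checksum". So an implementation that does the calculation or comparison with 2's complement logic gets every other value right and goes wrong only on these 2 specific values.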
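Here is a rough sketch of how that could go wrong. The `naive_receiver_accepts` function is a hypothetical bug, not what any particular device is known to do; the FFF did not pin down the root cause. `data` stands in for everything the checksum covers (pseudo-header, header with the checksum field zeroed, payload).

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # zero-pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def udp_checksum_field(data: bytes) -> int:
    """Value the sender puts in the UDP checksum field."""
    checksum = ~ones_complement_sum(data) & 0xFFFF
    # 0x0000 on the wire means "no checksum", so a computed zero is sent as all 1's.
    return 0xFFFF if checksum == 0 else checksum


def rfc_receiver_accepts(data: bytes, field: int) -> bool:
    """Verify the 1's complement way: data plus checksum must sum to all 1's."""
    total = ones_complement_sum(data) + field
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def naive_receiver_accepts(data: bytes, field: int) -> bool:
    """Hypothetical bug: recompute, then compare as plain (2's complement) integers."""
    return (~ones_complement_sum(data) & 0xFFFF) == field


ordinary = b"\x12\x34\x56\x78"  # computed checksum is 0x9753
unlucky = b"\xff\xff"           # computed checksum is 0x0000, sent as 0xFFFF

for data in (ordinary, unlucky):
    field = udp_checksum_field(data)
    print(hex(field), rfc_receiver_accepts(data, field), naive_receiver_accepts(data, field))
```

Both receivers accept the ordinary packet. Only the naive one rejects the unlucky packet, because it sees 0x0000 != 0xFFFF even though the two are equal in 1's complement.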
Interesting... I've heard that enabling tx/rx offloading is actually beneficial. Turns out that's not always the case...
It'd be interesting to see what wrong checksum it actually calculates...