This turned out not to be some mysterious “vector‐divide is non-deterministic on AMD vs. Intel” or a floating-point rounding issue at all, but simply a bug in the .NET JIT’s code‐generation for vector intrinsics when you pass a 16-bit field directly into a 32-bit vector overload.
What’s happening under the covers is:
- `c.A` is declared as a `ushort`.
- The `operator*(Vector128<uint>, uint)` overload forces the JIT to load your `ushort` `c.A` into a 32-bit register.
- Instead of zero-extending (`MOVZX r32, [addr]`), the JIT emits a sign-extend (`MOVSX r32, [addr]`), so `0x8000` becomes `0xFFFF8000` (–32768) instead of +32768.

If you write the same thing with a literal:
v = v * 32768 / 0xFFFFu;
then the JIT encodes the immediate correctly and you get the expected result. Likewise if you explicitly cast to a 32-bit unsigned first:
v = v * (uint)c.A / 0xFFFFu;
or if you use `Vector128.Create((uint)c.A)` so that the JIT never has to widen a 16-bit local, the bug goes away.
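The sign-extension half of the bug is easy to demonstrate in plain scalar C#. This sketch is independent of the vector code above; it just shows the two possible widenings of the same `0x8000` bit pattern, the zero-extend the JIT should emit versus the sign-extend it actually emits:

```csharp
using System;

// Scalar illustration only: 0x8000 stored in a ushort must be
// zero-extended when widened to 32 bits. Sign-extending the same
// 16-bit pattern (what the buggy MOVSX does) flips it negative.
ushort a = 0x8000;

uint zeroExtended = a;              // MOVZX semantics: 0x00008000
uint signExtended = (uint)(short)a; // MOVSX semantics: 0xFFFF8000

Console.WriteLine(zeroExtended); // 32768
Console.WriteLine(signExtended); // 4294934528
```

Once the value has been mis-widened to `0xFFFF8000`, every lane of the subsequent vector multiply is scaled by a huge unsigned number rather than 32768, which is exactly the corruption described above.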
We have in fact already filed this as dotnet/runtime#83387 and it will be fixed in an upcoming release. In short:
widen the field explicitly (`(uint)c.A`) or use the literal.