No, nothing in the Intel/AMD instruction-set causes a loss of precision here.
What you have run into is a pure JIT-compiler bug: when the JIT loads a
ushort (or byte) value from a field and then has to widen it to 32 bits
for a call to Vector128.Create, it sometimes emits a sign-extending
conversion instead of the required zero-extending one.
Consequently the value 0x8000 (32768u) is turned into 0xFFFF_8000
(-32768), and every calculation that depends on it becomes wrong.
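The effect of the two conversions can be reproduced with plain C# casts (a minimal illustration of the numeric corruption, not the JIT's actual code path):

```csharp
ushort a = 0x8000;            // 32768 as an unsigned 16-bit value

// Zero extension — what the JIT is required to emit:
uint zeroExtended = a;                          // 0x0000_8000 = 32768

// Sign extension — what the buggy JIT effectively emitted:
uint signExtended = unchecked((uint)(short)a);  // 0xFFFF_8000 = 4294934528

Console.WriteLine(zeroExtended); // 32768
Console.WriteLine(signExtended); // 4294934528
```

The second value is exactly the 4294934528u mentioned below: a sign bit smeared across the upper 16 bits, then reinterpreted as unsigned.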
Why the behaviour looks “machine dependent”
The CPU executes exactly the instructions it is given; the difference is only in the code the JIT decided to emit.
How to see the faulty instruction
With the affected JIT versions you will find something like this (x64, Release):
movsx eax, word ptr [rbp-08h] ; sign-extend 16-bit to 32-bit <-- WRONG
mov dword ptr [rsp+10h], eax
call System.Runtime.Intrinsics.Vector128`1[UInt32].Create
Instead of movsx (sign eXtend) the JIT should have emitted
movzx eax, word ptr [rbp-08h] ; zero-extend <-- CORRECT
Once the upper 16 bits are filled with 1s the value becomes -32768 (0xFFFF_8000); reinterpreted as unsigned it is the very large number 4294934528u, and all subsequent arithmetic overflows.
Status / fix
• Issue you opened: https://github.com/dotnet/runtime/issues/83387
– confirmed as a JIT bug.
• A fix has already been merged into the main branch (the code path that
handles small-type widening now emits movzx), therefore .NET 8 previews
and recent nightly builds no longer show the problem.
• A back-port to the next .NET 6/7 servicing releases is planned.
Work-arounds for current servicing builds
Force a zero-extending widening yourself, e.g.
uint alpha = c.A; // widening an unsigned small type to uint is defined as zero-extension
v = v * alpha / Vector128.Create(0xFFFFu);
Or mask the value:
v = v * (c.A & 0xFFFFu) / Vector128.Create(0xFFFFu);
Or store the colour in a struct that already uses uint for the
channels, avoiding any small-integer widening in the hot loop.
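Put together, a hot loop using the explicit widening looks roughly like this (a sketch only; `MyColor`, `Scale`, and the 0xFFFF normalisation are placeholders for your own types and math, and the operators require .NET 7 or later):

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical colour struct with a 16-bit alpha channel.
struct MyColor
{
    public ushort A;
}

static class AlphaScaling
{
    static Vector128<uint> Scale(Vector128<uint> v, in MyColor c)
    {
        // Widen explicitly to uint first; this widening is always
        // zero-extending, so the buggy movsx code path is never taken.
        uint alpha = c.A;
        return v * alpha / Vector128.Create(0xFFFFu);
    }
}
```

The key point is simply that the small-integer field never reaches `Vector128.Create` directly; the widening happens in your own code, where its semantics are fixed by the language rather than by the JIT's conversion logic.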
Bottom line
The wrong results you observed are
• not caused by integer division,
• not caused by SIMD instructions, and
• not dependent on your CPU model.
They are entirely due to a JIT conversion bug that has been fixed in the latest runtime. Updating to a build that contains the fix (or applying one of the work-arounds) restores correct, bit-exact results.