Stack Exchange
11 votes

Is integer vectorization accuracy / precision of integer division CPU-dependent?

Posted on 3/14/2023
UQ ID:387
Technology
Stack Overflow
Original

I tried to vectorize the premultiplication of 64-bit colors with 16-bit integer ARGB channels.

I quickly realized that, due to the lack of accelerated integer division support, I need to convert my values to float and use some SSE2/SSE4.1 intrinsics explicitly for the best performance. Still, I wanted to keep the non-hardware-specific generic version as a fallback solution (I know it's currently slower than some vanilla operations, but it would provide future compatibility for possible improvements).
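
For reference, this is roughly the accelerated path I have in mind. It is only a sketch assuming SSE2 is available; PremultiplySse is just a placeholder name (not my actual code), and the float rounding may not exactly match integer division for every input:

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Sketch: premultiply via float because there is no SIMD integer division.
// Assumes Sse2.IsSupported; truncation may differ from exact integer division for some inputs.
static (ushort A, ushort R, ushort G, ushort B) PremultiplySse((ushort A, ushort R, ushort G, ushort B) c)
{
    Vector128<float> rgb = Sse2.ConvertToVector128Single(
        Vector128.Create((int)c.R, c.G, c.B, 0));                             // CVTDQ2PS
    Vector128<float> scaled = Sse.Divide(
        Sse.Multiply(rgb, Vector128.Create((float)c.A)),                      // channel * A
        Vector128.Create(65535f));                                            // / 0xFFFF
    Vector128<int> res = Sse2.ConvertToVector128Int32WithTruncation(scaled);  // CVTTPS2DQ
    return (c.A, (ushort)res.GetElement(0), (ushort)res.GetElement(1), (ushort)res.GetElement(2));
}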

However, the results of the generic fallback are incorrect on my machine.

A very minimal repro:

// Test color with 50% alpha
(ushort A, ushort R, ushort G, ushort B) c = (0x8000, 0xFFFF, 0xFFFF, 0xFFFF);

// Minimal version of the fallback logic if HW intrinsics cannot be used:
Vector128<uint> v = Vector128.Create(c.R, c.G, c.B, 0u);
v = v * c.A / Vector128.Create(0xFFFFu);
var cPre = (c.A, (ushort)v[0], (ushort)v[1], (ushort)v[2]);

// Original color:
Console.WriteLine(c); // prints (32768, 65535, 65535, 65535)

// Expected premultiplied color:   (32768, 32768, 32768, 32768)
Console.WriteLine(cPre); // prints (32768, 32769, 32769, 32769)

I tried to determine what instructions are emitted causing the inaccuracy but I was really surprised to see that in SharpLab the results are correct. On the other hand, the issue is reproducible in .NET Fiddle.

Is this something that's expected on some platforms, or should I report it in the runtime repo as a bug?


Update

Never mind, this is clearly a bug. Using other values causes totally wrong results:

using System;
using System.Numerics;
using System.Runtime.Intrinsics;

(ushort A, ushort R, ushort G, ushort B) c = (32768, 65535, 32768, 16384);

Vector128<uint> v1 = Vector128.Create(c.R, c.G, c.B, 0u);
v1 = v1 * c.A / Vector128.Create(0xFFFFu);

// prints <32769, 49152, 57344, 0> instead of <32768, 16384, 8192, 0>
Console.WriteLine(v1);

// Also for the older Vector<T>
Span<uint> span = stackalloc uint[Vector<uint>.Count];
span[0] = c.R;
span[1] = c.G;
span[2] = c.B;
Vector<uint> v2 = new Vector<uint>(span) * c.A / new Vector<uint>(0xFFFF);

// prints <32769, 49152, 57344, 0, 0, 0, 0, 0> on my machine
Console.WriteLine(v2);

In the end I realized that the issue was in the multiplication: if I replace * c.A with the constant expression * 32768, then the result is correct. For some reason the ushort value is not correctly extracted/masked(?) out of the packed field. Even Vector128.Create is affected:

(ushort A, ushort R, ushort G, ushort B) c = (32768, 65535, 32768, 16384);

Console.WriteLine(Vector128.Create((int)c.A)); // -32768
Console.WriteLine(Vector128.Create((int)32768)); // 32768
Console.WriteLine(Vector128.Create((int)c.A, (int)c.A, (int)c.A, (int)c.A)); // 32768
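
For what it's worth, copying the field into a local variable first seems to avoid the problem on my machine. This is only a workaround sketch based on my assumption about where the widening goes wrong, not a verified fix:

// Workaround sketch: widen the packed ushort field into a local uint first,
// so the broadcast/multiply cannot pick up a sign-extended value.
uint a = c.A;                                        // explicit zero-extension
Vector128<uint> ok = Vector128.Create((uint)c.R, c.G, c.B, 0u) * a
                     / Vector128.Create(0xFFFFu);
Console.WriteLine(ok);                               // expected: <32768, 16384, 8192, 0>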

Update 2

In the end I filed an issue in the runtime repo.

Keywords & Tags
c#
vectorization
precision
simd
auto-vectorization
Model Answers (9)
Submitted answers from various AI models
Unsolved

K2-Think

Pending UQ Validation
Needs Human Verification
MBZUAI-IFM
Qwen2.5-32B
Sep 18, 2025, 01:50 PM

The discrepancy in the results when using a ushort variable versus a constant in integer vectorization operations is not CPU-dependent but stems from a JIT compiler bug. Here's the breakdown:

Key Issue Analysis:

The problem arises during the conversion of the ushort variable c.A (with value 0x8000, or 32768) to a uint scalar used in vector multiplication. Normally, converting a `ush...


GPT-5

UQ Validated
Needs Human Verification
OpenAI
Sep 2, 2025, 06:56 AM

Short answer: it’s not CPU-dependent. What you’re seeing is a JIT bug in certain .NET/runtime versions, not a fundamental precision/accuracy issue with integer vectorization or division.

What’s going wrong

  • Your alpha is a ushort (0x8000 = 32768). When it’s widened to 32-bit and then broadcast into a Vector128<int>/Vector128<uint>, the JIT sometimes sign-extends the 16-bit value instead of zero-...

o4 mini

UQ Validated
Needs Human Verification
OpenAI
Aug 1, 2025, 12:00 AM

This turned out not to be some mysterious “vector‐divide is non-deterministic on AMD vs. Intel” or a floating-point rounding issue at all, but simply a bug in the .NET JIT’s code‐generation for vector intrinsics when you pass a 16-bit field directly into a 32-bit vector overload.

What’s happening under the covers is:

  1. You write
    v = v * c.A / Vector128.Create(0xFFFFu);
    with...

Claude Opus 4

Failed UQ Validation
Needs Human Verification
Anthropic
Aug 1, 2025, 12:00 AM

This is an excellent question that reveals a subtle but important bug in .NET's vectorization implementation. Let me break down what's happening here:

The Core Issue

The problem you've discovered is not about integer division precision or CPU dependency - it's actually a bug in how .NET handles ushort values when creating vectors, specifically when those values come from tuple fields.

W...


Gemini 2.5 Pro

Failed UQ Validation
Needs Human Verification
Google
Aug 1, 2025, 12:00 AM

Excellent question. This is a fascinating intersection of JIT (Just-In-Time) compiler behavior, SIMD implementation details, and subtle type promotion rules.

Your analysis is spot on. This is indeed a JIT bug, and you were right to file an issue for it. The behavior is "CPU-dependent" only in the sense that the JIT compiler's output can vary based on the CPU architecture, operating system, and sp...


DeepSeek R1

Failed UQ Validation
Needs Human Verification
DeepSeek
deepseek-ai/DeepSeek-R1
Aug 1, 2025, 12:00 AM
<think> We are given a question about the accuracy of integer vectorization in C#, specifically regarding integer division and whether it is CPU-dependent. The user encountered an issue where vectorized operations on ushort values converted to uint in Vector128 and Vector<uint> produced incorrect results in some environments (like .NET Fiddle) but correct results in others (like SharpLab). The use...

Claude 3.7 Sonnet

UQ Validated
Needs Human Verification
Anthropic
Aug 1, 2025, 12:00 AM

Understanding Integer Vectorization Precision Issues in .NET

This question demonstrates a nuanced issue with integer vectorization in .NET, specifically related to how SIMD operations handle certain integer values.

The Issue

The problem occurs when performing vector operations (multiplication and division) on 16-bit integers, particularly when working with boundary values like 32768 (0x800...


o3 Pro

UQ Validated
Needs Human Verification
OpenAI
Aug 1, 2025, 12:00 AM

No, nothing in the Intel/AMD instruction-set causes a loss of precision here.
What you have run into is a pure JIT-compiler error: when the JIT loads a ushort (or byte) value from a field and then has to widen it to 32 bits for a call to Vector128.Create, it sometimes uses a sign-extending conversion instead of the required zero-extending conversion. Consequently the value 0x8000 ...


o3

Failed UQ Validation
Needs Human Verification
OpenAI
Aug 1, 2025, 12:00 AM

The CPU does exactly what the code it is given tells it to do; the wrong numbers appear because the JIT compiler that produced that code makes a mistake while building the vector, not because of any inaccuracy of the processor or of the integer‐division algorithm.

What goes wrong

c.A is a ushort whose value is 0x8000 = 32768.
When you write

Vector128<uint> v = ...