o3 Answer - UQ Project

Is the Wilcoxon two-sample test maximally powered to detect proportional odds alternatives?

Science

•stats•9 months ago

o3

OpenAI

8/1/2025, 12:00:00 AM

UQ Validated

Failed Human Verification

Model Answer

Short answer
Yes, provided the null and the alternative are formulated inside the semiparametric proportional-odds (PO) model. There the Wilcoxon–Mann–Whitney (WMW) statistic is the efficient score statistic, and hence gives the asymptotically most powerful test, for a two-sample comparison. No other regular test, rank-based or not, can have larger first-order (Pitman) power against contiguous PO alternatives. Outside that model, however, the WMW test is not uniformly most powerful: once the PO assumption is violated, another test can be more powerful.

Why the Wilcoxon statistic appears

Proportional-odds model
For a continuous response Y and a 0/1 group indicator Z
logit{P(Y ≤ y | Z=z)} = α(y) + βz, (1)
with an unrestricted baseline function α(·) and scalar parameter β.
Efficient score for β
Let F₀(y)=P(Y≤y |Z=0). At β=0 the efficient score (the score for β after projection onto the orthogonal complement of the nuisance tangent space) can be written, after some algebra (Tsiatis, 1981; Andersen, 1984) and up to a sign and an additive multiple of (z–π), neither of which changes the resulting test, as

φ(y,z,F₀) = (z–π) ∫_{-∞}^{y} dF₀(u) = (z–π) F₀(y), π=P(Z=1). (2)
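
The "some algebra" can be sketched as follows. This is a derivation under stated assumptions, not a quotation from the cited papers: α(·) is taken to be differentiable, expit denotes the inverse logit, and ℓ* is a label introduced here for the projected score.

```latex
\begin{aligned}
F(y\mid z) &= \operatorname{expit}\{\alpha(y)+\beta z\}, \qquad
f(y\mid z) = \alpha'(y)\,F(y\mid z)\,\{1-F(y\mid z)\},\\[4pt]
\left.\frac{\partial}{\partial\beta}\log f(y\mid z)\right|_{\beta=0}
  &= z\{1-F_0(y)\} - z\,F_0(y) \;=\; z\{1-2F_0(y)\},\\[4pt]
E\bigl[\,Z\{1-2F_0(Y)\}\mid Y=y\,\bigr] &= \pi\{1-2F_0(y)\}
  \qquad\text{(at } \beta=0,\ Y \text{ and } Z \text{ are independent)},\\[4pt]
\ell^{*}(y,z) &= (z-\pi)\{1-2F_0(y)\}
  \;=\; (z-\pi) \;-\; 2\,(z-\pi)\,F_0(y).
\end{aligned}
```

Relative to (2), ℓ* differs by the factor –2 and by the term (z–π), which sums to zero over the sample once π is estimated by the sample proportion. Since F₀(Y) is uniform on (0,1) under the null, the variance of ℓ* is π(1–π)/3.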

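Replacing the unknown F₀ by the empirical distribution function of the pooled sample of size N (so that F₀(Y_i) becomes R_i/N), replacing π by the sample proportion π̂ = n₁/N, and summing over the observations gives, after multiplying by N,

  S = Σ_{i} (Z_i–π̂) R_i                     (3)

where R_i is the mid-rank of Y_i in the combined sample. Because Σ_{i} R_i = N(N+1)/2, S = W – n₁(N+1)/2, where W is the sum of the ranks in the group Z=1. So (3) is exactly the centred Wilcoxon rank-sum statistic, and dividing it by its null standard deviation gives the usual standardized test.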
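
The link between (3) and the rank sum can be checked numerically. The sketch below is only an illustration: it assumes π is taken to be the sample proportion n₁/N, and the function names and the toy data are hypothetical, not part of the original answer.

```python
def midranks(values):
    """Return 1-based ranks of `values`, averaging the ranks within ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def score_statistic(y, z):
    """S = sum_i (Z_i - pi_hat) * R_i, with pi_hat = n1 / N (assumption)."""
    pi_hat = sum(z) / len(y)
    return sum((zi - pi_hat) * ri for zi, ri in zip(z, midranks(y)))


def centred_rank_sum(y, z):
    """W - n1 (N + 1) / 2, where W is the rank sum of the group with Z = 1."""
    n, n1 = len(y), sum(z)
    w = sum(ri for zi, ri in zip(z, midranks(y)) if zi == 1)
    return w - n1 * (n + 1) / 2


# Hypothetical toy data with one tie (the two 2.2 values share mid-rank 2.5).
y = [1.2, 3.4, 2.2, 5.0, 4.1, 2.2]
z = [0, 1, 0, 1, 1, 0]

print(midranks(y))                                  # [1.0, 4.0, 2.5, 6.0, 5.0, 2.5]
print(score_statistic(y, z), centred_rank_sum(y, z))  # 4.5 4.5
```

The two quantities agree because Σ R_i = N(N+1)/2 whether or not ties are present, so subtracting π̂ times that total is the same as subtracting the null mean of W.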

Semiparametric efficiency
The test based on (3) is the efficient score test for H₀: β=0 in model (1):
• it attains the semiparametric information bound;
• its non-centrality parameter under local alternatives β_n = n^{-1/2}δ is the largest possible among regular tests;
• therefore no other regular test can have higher Pitman power in the PO model
(van der Vaart, 1998, Chap. 25).
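
The local-power claim can be illustrated with a small Monte Carlo sketch. Everything below is an assumption made for illustration: a logistic baseline, equal group sizes, a one-sided 5% normal-approximation test, N written for the total sample size, and an efficient information of π(1–π)/3 at β=0 (the variance of (z–π)(1–2F₀(Y))). The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def wmw_z(y0, y1):
    """Standardized rank score, positive when group 1 tends to be smaller."""
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    pooled = sorted([(v, 0) for v in y0] + [(v, 1) for v in y1])
    # Continuous data: ties have probability zero, so plain ranks suffice.
    w = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 1)
    centred = n1 * (n + 1) / 2 - w
    return centred / math.sqrt(n0 * n1 * (n + 1) / 12)


def logistic_draw(rng, loc=0.0):
    """Draw from a logistic distribution by inverting its c.d.f."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc + math.log(u / (1.0 - u))


def simulated_power(beta, n_per_group, reps, rng, alpha=0.05):
    """Rejection rate of the one-sided test under model (1) with odds parameter beta."""
    crit = NormalDist().inv_cdf(1 - alpha)
    hits = 0
    for _ in range(reps):
        y0 = [logistic_draw(rng) for _ in range(n_per_group)]
        # In model (1), beta > 0 raises P(Y <= y | Z=1): a location shift of -beta.
        y1 = [logistic_draw(rng, loc=-beta) for _ in range(n_per_group)]
        hits += wmw_z(y0, y1) > crit
    return hits / reps


def local_power(delta, pi=0.5, alpha=0.05):
    """Asymptotic power Phi(delta * sqrt(I) - z_{1-alpha}), with I = pi (1 - pi) / 3."""
    nd = NormalDist()
    return nd.cdf(delta * math.sqrt(pi * (1 - pi) / 3) - nd.inv_cdf(1 - alpha))


rng = random.Random(0)
n_per_group, delta = 100, 6.0
beta = delta / math.sqrt(2 * n_per_group)  # beta_N = delta / sqrt(N)
sim = simulated_power(beta, n_per_group, 1000, rng)
theory = local_power(delta)
print(sim, theory)  # the two values should be close, up to Monte Carlo error
```

The comparison is only approximate: the formula is a first-order limit, while the simulation uses a finite N and a fixed β.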

Connection with the “logistic location shift” statement

If Y|Z=0 has a logistic c.d.f. F₀(x)=1/[1+e^{-(x−μ)}], a simple location shift by Δ yields

 F₁(x)=1/[1+e^{-(x−μ−Δ)}],

and

 logit F₁(x)= (x−μ−Δ) = logit F₀(x) −Δ,

so the two c.d.f.’s are in proportional odds with constant β=−Δ.
Thus the well-known fact that the WMW test is the locally most powerful rank test for logistic location shifts is a special case of the general PO result: the logistic is the baseline distribution for which a PO alternative is exactly a location shift.
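
The constant-log-odds property of the logistic shift is easy to verify numerically. The following sketch uses hypothetical values of μ and Δ and hypothetical helper names; it only checks that logit F₁(x) − logit F₀(x) equals −Δ at several points x.

```python
import math


def logistic_cdf(x, loc=0.0):
    """C.d.f. of a logistic distribution with location `loc` and unit scale."""
    return 1.0 / (1.0 + math.exp(-(x - loc)))


def logit(p):
    return math.log(p / (1.0 - p))


mu, delta = 0.3, 0.8  # hypothetical location and shift
xs = [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]

# Difference in log odds between the shifted and baseline c.d.f. at each x.
diffs = [logit(logistic_cdf(x, mu + delta)) - logit(logistic_cdf(x, mu)) for x in xs]
print(diffs)  # every entry equals -delta up to floating-point rounding
```

For a non-logistic baseline the same difference would vary with x under a pure location shift, which is why the equivalence between shift and PO alternatives is specific to the logistic family.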

How far does the optimality go?

• Local / first-order only. The efficiency statement is asymptotic and local (β→0). For fixed, non-small β another statistic might have higher power even when (1) is true.

• Model dependent. If model (1) is wrong (e.g. hazards are proportional instead of odds, or the two c.d.f.’s cross) there is in general no uniformly most powerful rank test; a log-rank score, a normal-score test, etc., can dominate WMW for some data-generating mechanisms.

• Among all tests, not just rank tests. Because (3) is the efficient score statistic, using the actual data values cannot improve on it to first order: any regular test with the same local power must be asymptotically equivalent to the efficient score test in the semiparametric model.

So the most general correct statement is

“In the two-sample problem, under the semiparametric proportional-odds model (1) with an unspecified baseline distribution, the Wilcoxon rank-sum test is the efficient score test for the parameter β and is therefore asymptotically the most powerful test against proportional-odds alternatives. It is not guaranteed to be optimal outside that model.”

References

Andersen, P. K. (1984) Score tests in generalized linear models. J. R. Statist. Soc. B 46, 194–202.
Tsiatis, A. A. (1981) A large-sample study of hypotheses testing for competing-risks models. JASA 76, 72–83.
Lehmann, E. L. & D’Abrera, H. J. M. (2006) Nonparametrics: Statistical Methods Based on Ranks, 2nd ed., §9.4.
van der Vaart, A. W. (1998) Asymptotic Statistics, §25.

Additional Information:

version: o3-2025-04-16

UQ Validation

Status: UQ Validated

Validated: 9 months ago

Human Verification

Status: Failed Human Verification

Verified: 9 months ago
