DeepSeek R1 Answer - UQ Project

<think> We are given an N x k Gaussian matrix M (entries i.i.d. standard normal) with N >> k, and a diagonal matrix D = diag(λ₁, ..., λₙ). We are interested in the QR decomposition of DM, specifically the distribution of the Q factor (which is N x k with orthonormal columns).

In the case where k = N and D = I, Q is known to be uniformly distributed with respect to the Haar measure on the orthogonal group O(N). However, we are in the case k < N and with a general diagonal matrix D.

Let's first consider the simplest case: k=1. Then M is an N-dimensional random vector with i.i.d. standard normal entries, and DM is the vector (λ₁M₁, λ₂M₂, ..., λₙMₙ). The QR decomposition of a vector is just normalizing it to have unit norm. So Q = DM / ||DM||.

We are to find the distribution of Q. Note that Q is a random vector on the unit sphere in R^N.

The vector DM has independent components: the i-th component is λ_i * M_i, where M_i ~ N(0,1). Therefore, the components of DM are independent normal random variables with mean 0 and variances λ_i².

We are then normalizing this vector to have unit norm. The distribution of a normalized vector of independent normals with different variances is known as the scaled multivariate normal distribution on the sphere.

In the case where all λ_i are equal (say λ_i = λ for all i), then DM is just λ times a vector of i.i.d. standard normals, and then Q = (λM) / ||λM|| = M / ||M||, which is uniformly distributed on the unit sphere (by the spherical symmetry of the standard normal distribution). This is the result from the linked post.

But when the λ_i are different, the distribution is not uniform. We can write:

Q = (X₁, X₂, ..., Xₙ) / ||X||, where X_i ~ N(0, λ_i²) independently.

But note that ||X|| = sqrt(∑_{i=1}^N X_i²) and each X_i has a normal distribution with variance λ_i².

The joint density of X = (X₁, ..., Xₙ) is:

f_X(x) = ∏_{i=1}^N [1/(√(2π) λ_i) exp(-x_i²/(2λ_i²))]

We are interested in the distribution of Q = X / ||X||. This is a direction in R^N. The distribution of the direction of a multivariate normal vector with independent components and different variances is known to be the angular central Gaussian distribution. However, note that the angular central Gaussian distribution is typically defined for a centered multivariate normal with a general covariance matrix (not necessarily diagonal). In our case, the covariance matrix of X is Σ = diag(λ₁², λ₂², ..., λₙ²).

The angular central Gaussian distribution (ACG) is defined on the unit sphere. The density of the ACG distribution with parameter Σ (a positive definite matrix) at a point u on the unit sphere is:

f(u; Σ) = |Σ|^{-1/2} (u^T Σ^{-1} u)^{-N/2} / c

where c is a normalizing constant. However, note that the standard ACG is defined for a multivariate normal vector with covariance matrix Σ, and then normalized. In our case, we have independent components, so Σ is diagonal.

But let us derive the density for our specific case.

We can use the fact that the direction vector Q = X / ||X|| is independent of the radius R = ||X||. This is a standard result for spherically symmetric distributions? Actually, the normal distribution with independent components and different variances is not spherically symmetric. However, the direction and the radius are still independent? Let us check.

The joint density of X is:

f_X(x) = (2π)^{-N/2} (∏{i=1}^N λ_i^{-1}) exp(-∑{i=1}^N x_i²/(2λ_i²))

Now, we change to polar coordinates. However, note that the standard polar coordinates are for the case of spherical symmetry. Here, we have anisotropy.

We can write the density in terms of the direction and the radius. Let r = ||x|| and u = x / r, with u on the unit sphere. Then x = r u.

The Jacobian of the transformation from x to (r, u) is r^{N-1} (with du being the surface measure on the sphere). Then the joint density of (r, u) is:

f_{r,u}(r, u) = f_X(r u) * r^{N-1}

= (2π)^{-N/2} (∏{i=1}^N λ_i^{-1}) exp(-∑{i=1}^N (r u_i)^2/(2λ_i²)) * r^{N-1}

= (2π)^{-N/2} (∏{i=1}^N λ_i^{-1}) exp(-r² ∑{i=1}^N u_i²/(2λ_i²)) * r^{N-1}

Note that ∑_{i=1}^N u_i² = 1 because u is on the unit sphere. So we have:

f_{r,u}(r, u) = (2π)^{-N/2} (∏{i=1}^N λ_i^{-1}) exp(-r² * (∑{i=1}^N u_i²/(2λ_i²))) * r^{N-1}

We can write this as:

f_{r,u}(r, u) = [ (2π)^{-N/2} (∏_{i=1}^N λ_i^{-1}) r^{N-1} exp(-r² * h(u)/2) ]

where h(u) = ∑_{i=1}^N u_i² / λ_i².

To get the marginal density of u, we integrate over r from 0 to ∞:

f_u(u) = ∫0^∞ f{r,u}(r, u) dr

= (2π)^{-N/2} (∏_{i=1}^N λ_i^{-1}) ∫_0^∞ r^{N-1} exp(-r² * h(u)/2) dr

Make a change of variable: let t = r², then dt = 2r dr, so r dr = dt/2. But note that we have r^{N-1} dr = r^{N-2} * (r dr) = (t^{(N-2)/2}) * (dt/2) = (1/2) t^{(N-2)/2} dt.

Alternatively, we can use the gamma function. The integral is:

∫_0^∞ r^{N-1} exp(-a r²) dr, where a = h(u)/2.

We know that:

∫_0^∞ r^{b-1} exp(-a r²) dr = (1/2) a^{-b/2} Γ(b/2)

Set b = N, then:

∫_0^∞ r^{N-1} exp(-a r²) dr = (1/2) a^{-N/2} Γ(N/2)

So in our case, a = h(u)/2, so:

∫_0^∞ r^{N-1} exp(-r² * h(u)/2) dr = (1/2) [h(u)/2]^{-N/2} Γ(N/2) = (1/2) (2^{N/2}) h(u)^{-N/2} Γ(N/2) = 2^{N/2 - 1} Γ(N/2) h(u)^{-N/2}

Therefore:

f_u(u) = (2π)^{-N/2} (∏_{i=1}^N λ_i^{-1}) * 2^{N/2 - 1} Γ(N/2) h(u)^{-N/2}

Simplify the constants:

= (2^{-N/2} π^{-N/2}) (∏_{i=1}^N λ_i^{-1}) * 2^{N/2 - 1} Γ(N/2) h(u)^{-N/2}

= π^{-N/2} (∏_{i=1}^N λ_i^{-1}) * 2^{-1} Γ(N/2) h(u)^{-N/2}

= [Γ(N/2) / (2 π^{N/2})] * (∏_{i=1}^N λ_i^{-1}) * h(u)^{-N/2}

Now, h(u) = ∑{i=1}^N u_i² / λ_i², so h(u)^{-N/2} = [∑{i=1}^N u_i² / λ_i²]^{-N/2}

We can write this as:

f_u(u) = C * (∏{i=1}^N λ_i^{-1}) * [∑{i=1}^N u_i² / λ_i²]^{-N/2}

where C = Γ(N/2) / (2 π^{N/2})

But note that the density must integrate to 1 over the unit sphere. The constant C might be adjusted by the product of λ_i, but actually, we can write it in a more standard form.

Alternatively, we can express the density in terms of the matrix Σ = diag(λ₁², λ₂², ..., λₙ²). Then:

h(u) = u^T Σ^{-1} u

because (Σ^{-1})_{ii} = 1/λ_i², and off-diagonal 0.

So:

f_u(u) = C * (∏_{i=1}^N λ_i^{-1}) * [u^T Σ^{-1} u]^{-N/2}

But note that |Σ| = ∏{i=1}^N λ_i², so |Σ|^{-1/2} = (∏{i=1}^N λ_i²)^{-1/2} = ∏_{i=1}^N λ_i^{-1}. Therefore:

f_u(u) = C * |Σ|^{-1/2} * [u^T Σ^{-1} u]^{-N/2}

This is exactly the form of the angular central Gaussian distribution. The normalizing constant for the ACG distribution is known to be:

f(u; Σ) = |Σ|^{-1/2} (u^T Σ^{-1} u)^{-N/2} / S

where S is the normalizing constant that makes the integral over the sphere equal to 1. The constant S is the surface area of the sphere times a factor, but we have already derived the expression.

In fact, the standard form of the ACG density is:

f(u; Σ) = [Γ(N/2) / (2 π^{N/2})] |Σ|^{-1/2} (u^T Σ^{-1} u)^{-N/2}

But wait, we have exactly that form with C = Γ(N/2)/(2 π^{N/2}). However, we must check if this integrates to 1.

Alternatively, the standard ACG density is often written as:

f(u; Σ) = \frac{ |\Sigma|^{-1/2} }{ (u^T \Sigma^{-1} u)^{N/2} } \cdot \frac{ \Gamma(N/2) }{ 2 \pi^{N/2} }

and it is known that this integrates to 1 over the unit sphere. So our derivation matches.

Therefore, for k=1, the distribution of Q is the angular central Gaussian distribution on the unit sphere with parameter Σ = diag(λ₁², λ₂², ..., λₙ²).

Now, what about the general case k>1? We are looking for the QR decomposition of DM, where D is diagonal and M is N x k Gaussian.

The QR decomposition of DM is DM = QR, where Q is N x k with orthonormal columns and R is k x k upper triangular with positive diagonal entries.

In the case D = I, it is known that Q is distributed according to the Haar measure on the Stiefel manifold V_{k}(R^N) (the set of all N x k matrices with orthonormal columns). This is a uniform distribution.

But when D is not the identity, the distribution changes.

We can think of DM as a linear transformation of M: DM = D M. Since M has i.i.d. standard normal entries, the columns of M are independent and each is distributed as N(0, I_N). Then the columns of DM are independent and the j-th column is distributed as N(0, D D^T) = N(0, diag(λ₁², λ₂², ..., λₙ²)).

So DM is a random matrix whose columns are independent and the j-th column has distribution N(0, Σ) with Σ = diag(λ₁², λ₂², ..., λₙ²).

We are then performing QR decomposition on this matrix.

The distribution of Q from the QR decomposition of a random matrix with independent columns each being multivariate normal with the same covariance matrix Σ is known to be the same as the distribution of the Q factor from the QR decomposition of Σ^{1/2} M, where M is a standard Gaussian matrix (i.i.d. standard normal entries). This is because if we let X = Σ^{1/2} M, then the columns of X are independent N(0, Σ). Then the QR decomposition of X is the same as that of DM in our case (if we set Σ = D^2, since D is diagonal, D^2 = diag(λ₁², ..., λₙ²)).

But note: in our case, DM = D M, and since D is diagonal, we have that the covariance of each column is D * I * D^T = D^2 = diag(λ₁², ..., λₙ²). So indeed, we can write DM = Σ^{1/2} M, where Σ^{1/2} = D (because D is diagonal and nonnegative? Actually, we must be cautious: the λ_i are fixed real scalars. The problem does not specify that they are positive. But if they are negative, then D is not positive semi-definite? However, in the QR decomposition, the sign might be absorbed. But typically, we require the diagonal of R to be positive. So we must assume that the λ_i are nonzero? Actually, if λ_i=0, then the i-th row of DM is zero, which might cause problems. So we assume λ_i ≠ 0 and without loss of generality, we can take λ_i > 0 (because the sign can be absorbed in the Gaussian randomness). So we assume λ_i > 0.

Therefore, we can write DM = D M, and D is diagonal with positive entries. Then the QR decomposition of DM is the same as that of D M.

Now, we can factor D out. However, note that the QR decomposition is not linear. But we can relate the QR decomposition of D M to that of M.

Let M = Q0 R0 be the QR decomposition of M. Then D M = D Q0 R0. But D Q0 is not necessarily orthogonal. So we cannot directly use that.

Alternatively, we can consider the QR decomposition of D M. Since D is invertible (as λ_i>0), we can write:

D M = Q R

We want the distribution of Q.

Note that the columns of D M are independent, each being N(0, D^2). The QR decomposition is unique if we require positive diagonal entries in R.

There is a result that the Q matrix from the QR decomposition of a random matrix with i.i.d. columns from a centered normal distribution with covariance Σ is distributed as the Q from the QR decomposition of Σ^{1/2} M, where M is standard Gaussian. But we already have that.

To find the distribution of Q, we can use the fact that the Stiefel manifold is homogeneous and that the distribution of Q can be characterized by invariance properties. However, with a general Σ, the distribution is not uniform.

In fact, the distribution of Q in the QR decomposition of a random matrix with independent columns each having a normal distribution with covariance Σ is known as the matrix angular central Gaussian distribution. This is a generalization of the vector case.

We can derive the density of Q on the Stiefel manifold. The Stiefel manifold V_{k}(R^N) is the set of all N x k matrices with orthonormal columns. The dimension of this manifold is Nk - k(k+1)/2.

The probability element for Q can be found by considering the transformation from the Gaussian matrix to (Q, R). However, this is complicated.

A standard approach is to note that the QR decomposition is related to the Gram-Schmidt process. The first column of Q is the normalization of the first column of DM. The second column is the normalization of the component of the second column of DM that is orthogonal to the first, and so on.

But because of the different variances, the columns are not identically distributed.

In the case of independent columns with the same covariance Σ, the distribution of Q can be derived.

We recall that the QR decomposition of a matrix X (N x k) can be written as:

X = Q R

where R is upper triangular with positive diagonal entries.

The joint density of the elements of X (if X has a density) can be transformed to the joint density of (Q, R). It is known that the Jacobian of this transformation is proportional to ∏{i=1}^k r{ii}^{N-i} (see, for example, Muirhead, "Aspects of Multivariate Statistical Theory", or other references on multivariate statistics).

Specifically, if X has density f_X(X), then the density of (Q, R) is:

f_{Q,R}(Q, R) = f_X(Q R) * J

where J is the Jacobian of the transformation from X to (Q, R). The Jacobian is given by:

J = ∏{i=1}^k r{ii}^{N-i}

In our case, X = DM, and the columns of X are independent, each having a normal distribution with mean 0 and covariance Σ = D^2 (diagonal). So the density of X is:

f_X(X) = (2π)^{-N k / 2} |Σ|^{-k/2} exp(-(1/2) tr(X^T Σ^{-1} X))

because the columns are independent and identically distributed with covariance Σ, so the joint density is the product of the densities of the columns:

= ∏_{j=1}^k [ (2π)^{-N/2} |Σ|^{-1/2} exp(-(1/2) x_j^T Σ^{-1} x_j) ]

= (2π)^{-N k / 2} |Σ|^{-k/2} exp(-(1/2) ∑_{j=1}^k x_j^T Σ^{-1} x_j)

= (2π)^{-N k / 2} |Σ|^{-k/2} exp(-(1/2) tr(X^T Σ^{-1} X))

Now, we write X = Q R. Then:

f_X(QR) = (2π)^{-N k / 2} |Σ|^{-k/2} exp(-(1/2) tr( (QR)^T Σ^{-1} (QR) ))

= (2π)^{-N k / 2} |Σ|^{-k/2} exp(-(1/2) tr( R^T Q^T Σ^{-1} Q R ))

The density for (Q, R) is then:

f_{Q,R}(Q, R) = f_X(QR) * ∏{i=1}^k r{ii}^{N-i}

= (2π)^{-N k / 2} |Σ|^{-k/2} \exp\left(-\frac{1}{2} \text{tr}(R^T Q^T \Sigma^{-1} Q R)\right) \prod_{i=1}^k r_{ii}^{N-i}

To find the marginal density of Q, we need to integrate over R. Note that R is upper triangular with positive diagonal entries. The space of R is the set of k x k upper triangular matrices with r_{ii} > 0.

The integral over R is:

I(Q) = \int \exp\left(-\frac{1}{2} \text{tr}(R^T A R)\right) \prod_{i=1}^k r_{ii}^{N-i} dR

where A = Q^T Σ^{-1} Q.

Note that A is a k x k symmetric positive definite matrix because Σ is positive definite and Q has full column rank.

The trace is tr(R^T A R). Since R is upper triangular, this expression is a quadratic form in the elements of R.

The integral over R can be performed by iterating over the rows of R. This is a standard integral in multivariate analysis. It is known that:

\int \exp\left(-\frac{1}{2} \text{tr}(R^T A R)\right) \prod_{i=1}^k r_{ii}^{N-i} dR = c \cdot |A|^{-N/2}

for some constant c. This is because the integral is proportional to the normalizing constant of the Wishart distribution. Specifically, the integral over R is similar to the integral that appears in the density of the Cholesky factor of a Wishart matrix.

We can use the change of variables: let S = R^T R. Then S is a k x k positive definite matrix. The Jacobian of the transformation from R to S is known to be proportional to |S|^{(k-1)/2} / ∏{i=1}^k r{ii} (but we have to be careful). Actually, the standard result is that when R is upper triangular with positive diagonal, then the Jacobian of R -> R^T R is 2^k ∏{i=1}^k r{ii}^{k-i+1} (see Muirhead, Theorem 2.1.9). But we have dS = 2^k [∏{i=1}^k r{ii}^{k-i+1}] dR.

But in our case, we have an extra factor of ∏{i=1}^k r{ii}^{N-i}. So:

dR = dS / [ 2^k ∏{i=1}^k r{ii}^{k-i+1} ]

Then the integral becomes:

I(Q) = \int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) \prod_{i=1}^k r_{ii}^{N-i} \cdot \frac{dS}{2^k \prod_{i=1}^k r_{ii}^{k-i+1}}

But note that S = R^T R, so |S| = (∏{i=1}^k r{ii})^2, and also r_{ii} = \sqrt{s_{ii}} (in terms of the diagonal elements of S). However, the product ∏{i=1}^k r{ii}^{N-i} / ∏{i=1}^k r{ii}^{k-i+1} = ∏{i=1}^k r{ii}^{N - i - (k - i + 1)} = ∏{i=1}^k r{ii}^{N - k - 1}

But since r_{ii} = \sqrt{s_{ii}}, we have ∏{i=1}^k r{ii}^{N-k-1} = ∏{i=1}^k s{ii}^{(N-k-1)/2}

Therefore:

I(Q) = \int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} \cdot \frac{1}{2^k} \prod_{i=1}^k s_{ii}^{-(N-k-1)/2} \cdot s_{ii}^{(N-k-1)/2} dS

Wait, that seems to cancel. Actually, we have:

∏{i=1}^k r{ii}^{N-k-1} = [∏{i=1}^k r{ii}]^{N-k-1} = |S|^{(N-k-1)/2}

because |S| = (∏{i=1}^k r{ii})^2.

So:

I(Q) = \int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} \cdot \frac{1}{2^k} dS

But note that the integral is over the space of positive definite matrices, and the density of the Wishart distribution is proportional to |S|^{(n-k-1)/2} \exp(-(1/2) tr(Σ^{-1} S)) for n degrees of freedom. Specifically, the Wishart density is:

c \cdot |S|^{(n-k-1)/2} \exp(-\frac{1}{2} \text{tr}(\Sigma^{-1} S))

and it integrates to 1. Here, we have:

I(Q) = \frac{1}{2^k} \int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} dS

This is proportional to |A|^{-N/2} because the integral is the normalizing constant of the Wishart distribution with n = N degrees of freedom and scale matrix A^{-1}. Specifically, the Wishart distribution W_k(n, \Sigma) has density proportional to |S|^{(n-k-1)/2} \exp(-\frac{1}{2} \text{tr}(\Sigma^{-1} S)) and the normalizing constant is such that:

\int |S|^{(n-k-1)/2} \exp(-\frac{1}{2} \text{tr}(\Sigma^{-1} S)) dS = c \cdot |\Sigma|^{n/2}

where c is a constant (depending on n and k). Therefore, in our case:

I(Q) = \frac{1}{2^k} \cdot c \cdot |A|^{N/2}

But actually, we have:

\int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} dS = \int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} dS = \text{const} \cdot |A|^{-N/2}

because if we let \Sigma^{-1} = A, then the integral is proportional to |A|^{-N/2} (since n = N).

The normalizing constant for the Wishart distribution is:

\int |S|^{(n-k-1)/2} \exp(-\frac{1}{2} \text{tr}(\Sigma^{-1} S)) dS = 2^{n k / 2} \Gamma_k(n/2) |\Sigma|^{n/2}

where \Gamma_k is the multivariate gamma function. Therefore, we have:

\int \exp\left(-\frac{1}{2} \text{tr}(A S)\right) |S|^{(N-k-1)/2} dS = 2^{N k / 2} \Gamma_k(N/2) |A^{-1}|^{N/2} = 2^{N k / 2} \Gamma_k(N/2) |A|^{-N/2}

Therefore:

I(Q) = \frac{1}{2^k} \cdot 2^{N k / 2} \Gamma_k(N/2) |A|^{-N/2} = 2^{N k / 2 - k} \Gamma_k(N/2) |A|^{-N/2}

Now, |A| = |Q^T \Sigma^{-1} Q|

Therefore, the marginal density of Q is:

f_Q(Q) = \int f_{Q,R}(Q, R) dR = (2\pi)^{-N k / 2} |\Sigma|^{-k/2} I(Q)

= (2\pi)^{-N k / 2} |\Sigma|^{-k/2} \cdot 2^{N k / 2 - k} \Gamma_k(N/2) |Q^T \Sigma^{-1} Q|^{-N/2}

Simplify the powers of 2:

= (2\pi)^{-N k / 2} 2^{N k / 2 - k} \Gamma_k(N/2) |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}

= 2^{-k} (2\pi)^{-N k / 2} 2^{N k / 2} \Gamma_k(N/2) |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}

= 2^{-k} \pi^{-N k / 2} \Gamma_k(N/2) |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}

We can write this as:

f_Q(Q) = C \cdot |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}

where C = 2^{-k} \pi^{-N k / 2} \Gamma_k(N/2)

This is the density with respect to the Haar measure on the Stiefel manifold. The Haar measure is the invariant measure, and the density is proportional to |Q^T \Sigma^{-1} Q|^{-N/2}.

In the case k=1, we had:

f_u(u) = [Γ(N/2) / (2 π^{N/2})] |Σ|^{-1/2} (u^T Σ^{-1} u)^{-N/2}

Note that for k=1, |Σ|^{-k/2} = |Σ|^{-1/2}, and |Q^T \Sigma^{-1} Q| = u^T \Sigma^{-1} u (a scalar). Also, the constant C for k=1:

Γ_k(a) for k=1 is Γ(a), so:

C = 2^{-1} \pi^{-N/2} \Gamma(N/2)

Then:

f_Q(Q) = 2^{-1} \pi^{-N/2} \Gamma(N/2) |\Sigma|^{-1/2} (u^T \Sigma^{-1} u)^{-N/2}

which matches the earlier expression (since we had Γ(N/2)/(2 π^{N/2}) |Σ|^{-1/2} (u^T Σ^{-1} u)^{-N/2}).

Therefore, the general result is that the density of Q (with respect to the Haar measure on V_{k}(R^N)) is proportional to |Q^T \Sigma^{-1} Q|^{-N/2}, where \Sigma = D^2.

We can write it as:

f_Q(Q) \propto |Q^T \Sigma^{-1} Q|^{-N/2}

This distribution is known as the matrix angular central Gaussian distribution. It is a generalization of the vector case.

In summary, for the QR decomposition of DM, where M is N x k Gaussian and D is diagonal with positive entries, the Q factor has the matrix angular central Gaussian distribution with parameter \Sigma = D^2.

For the case of D^2 M, note that D^2 M = D (D M). So if we let D' = D^2, then we are in the same situation with \Sigma = (D')^2 = D^4. So the parameter becomes \Sigma = D^4.

Therefore, the distribution of Q for D^2 M is the matrix angular central Gaussian with parameter \Sigma = D^4.

To conclude:

For k=1, the distribution of Q is the angular central Gaussian distribution on the sphere with parameter \Sigma = D^2.
For general k, the distribution of Q is the matrix angular central Gaussian distribution on the Stiefel manifold with parameter \Sigma = D^2.

We have provided the densities in both cases.

Note: The matrix angular central Gaussian distribution is also known in the literature. For example, it is discussed in:

Chikuse, Y. (2003). "Statistics on Special Manifolds". Springer.
Tyler, D. E. (1987). "Statistical analysis for the angular central Gaussian distribution on the sphere". Biometrika.

But the above derivation is self-contained. </think> The QR decomposition of a matrix (DM), where (M) is an (N \times k) Gaussian matrix (entries i.i.d. standard normal) with (N \gg k), and (D = \text{diag}(\lambda_1, \ldots, \lambda_N)) is a diagonal matrix with (\lambda_i > 0), yields an (N \times k) matrix (Q) with orthonormal columns. The distribution of (Q) depends on the scaling matrix (D) and the dimensions (N) and (k). Below, I provide a comprehensive analysis for both the general case ((k \geq 1)) and the specific case (k = 1), including derivations and references.

Key Insight

The matrix (DM) has columns that are independent and identically distributed (i.i.d.) as (\mathcal{N}(0, \Sigma)), where (\Sigma = D^2 = \text{diag}(\lambda_1^2, \ldots, \lambda_N^2)). The QR decomposition (DM = QR) (with (R) upper triangular with positive diagonal) produces (Q), whose distribution is the matrix angular central Gaussian (MACG) distribution on the Stiefel manifold (V_{k}(\mathbb{R}^N)) (the space of (N \times k) matrices with orthonormal columns). This distribution generalizes the vector case ((k = 1)) and depends on (\Sigma).

1. Case (k = 1) (Vector Case)

When (k = 1), (M) is an (N \times 1) vector with i.i.d. (\mathcal{N}(0,1)) entries, and (DM) is the vector: [ DM = (\lambda_1 M_1, \ldots, \lambda_N M_N)^T. ] The QR decomposition reduces to normalization: (Q = \frac{DM}{|DM|}), where (|DM| = \sqrt{(DM)^T (DM)}). Thus, (Q) is a random vector on the unit sphere (\mathbb{S}^{N-1}).

Distribution of (Q)

The components of (DM) are independent, with ((DM)_i \sim \mathcal{N}(0, \lambda_i^2)).
The direction (Q = \frac{DM}{|DM|}) follows the angular central Gaussian (ACG) distribution with parameter (\Sigma = D^2 = \text{diag}(\lambda_1^2, \ldots, \lambda_N^2)).

The probability density function (p.d.f.) of (Q) with respect to the uniform measure on (\mathbb{S}^{N-1}) is: [ f_Q(u) = \frac{\Gamma\left(\frac{N}{2}\right)}{2 \pi^{N/2}} |\Sigma|^{-1/2} (u^T \Sigma^{-1} u)^{-N/2}, \quad u \in \mathbb{S}^{N-1}, ] where:

(|\Sigma| = \prod_{i=1}^N \lambda_i^2) is the determinant of (\Sigma),
(\Sigma^{-1} = \text{diag}(\lambda_1^{-2}, \ldots, \lambda_N^{-2})),
(\Gamma(\cdot)) is the gamma function.

Derivation

The joint density of (X = DM) is: [ f_X(x) = (2\pi)^{-N/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right). ] Transform to polar coordinates: (x = r u), where (r = |x| > 0) and (u \in \mathbb{S}^{N-1}). The Jacobian is (r^{N-1} dr du), leading to: [ f_{r,u}(r, u) = (2\pi)^{-N/2} |\Sigma|^{-1/2} r^{N-1} \exp\left(-\frac{r^2}{2} u^T \Sigma^{-1} u\right). ] Integrate over (r > 0) to get the marginal density of (u): [ f_u(u) = \int_0^\infty f_{r,u}(r, u) dr = \frac{\Gamma\left(\frac{N}{2}\right)}{2 \pi^{N/2}} |\Sigma|^{-1/2} (u^T \Sigma^{-1} u)^{-N/2}. ] This matches the ACG density. When (\lambda_1 = \cdots = \lambda_N), (\Sigma = \lambda^2 I_N), and (f_Q(u)) simplifies to the uniform distribution on (\mathbb{S}^{N-1}).

2. General Case (k \geq 1)

For (k \geq 1), (DM) is an (N \times k) matrix with i.i.d. columns (\sim \mathcal{N}(0, \Sigma)), where (\Sigma = D^2). The QR decomposition (DM = QR) yields (Q \in V_k(\mathbb{R}^N)). The distribution of (Q) is the matrix angular central Gaussian (MACG) distribution with parameter (\Sigma).

Distribution of (Q)

The p.d.f. of (Q) with respect to the Haar measure (uniform measure) on (V_k(\mathbb{R}^N)) is: [ f_Q(Q) = c \cdot |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}, \quad Q \in V_k(\mathbb{R}^N), ] where:

(c = 2^{-k} \pi^{-Nk/2} \Gamma_k\left(\frac{N}{2}\right)) is a normalizing constant,
(\Gamma_k(a) = \pi^{k(k-1)/4} \prod_{i=1}^k \Gamma\left(a - \frac{i-1}{2}\right)) is the multivariate gamma function,
(| \cdot |) denotes the determinant.

Derivation

The joint density of (X = DM) is: [ f_X(X) = (2\pi)^{-Nk/2} |\Sigma|^{-k/2} \exp\left(-\frac{1}{2} \operatorname{tr}(X^T \Sigma^{-1} X)\right). ] Using the QR decomposition (X = QR), the Jacobian for the transformation (X \mapsto (Q, R)) is (\prod_{i=1}^k r_{ii}^{N-i}), where (r_{ii} > 0) are the diagonal entries of (R). The joint density of ((Q, R)) is: [ f_{Q,R}(Q, R) = f_X(QR) \prod_{i=1}^k r_{ii}^{N-i}. ] Integrate over (R) (upper triangular with (r_{ii} > 0)) to marginalize to (Q). After substitution and using properties of the Wishart distribution, the integral simplifies to: [ f_Q(Q) \propto |\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}. ] The normalizing constant (c) ensures (\int_{V_k(\mathbb{R}^N)} f_Q(Q) dQ = 1). For (k = 1), this reduces to the ACG density.

Properties

Invariance: The MACG is invariant under orthogonal transformations: if (Q \sim \text{MACG}(\Sigma)), then (\Gamma Q \sim \text{MACG}(\Gamma \Sigma \Gamma^T)) for (\Gamma \in O(N)).
Special Cases:
- If (\Sigma = I_N), (f_Q(Q)) is constant, so (Q) is uniformly distributed on (V_k(\mathbb{R}^N)) (Haar measure).
- If (k = N), (V_N(\mathbb{R}^N) = O(N)), and (Q) is Haar-distributed when (\Sigma = I_N).

3. Case (D^2 M) and Other Scalings

For (D^2 M), the scaled matrix is (D^2 M = D' M) with (D' = D^2). Thus:

(\Sigma' = (D')^2 = D^4 = \text{diag}(\lambda_1^4, \ldots, \lambda_N^4)).
The (Q) from the QR decomposition of (D^2 M) follows (\text{MACG}(\Sigma')).

Similarly, for (D^p M), use (\Sigma = D^{2p}).

Summary

For (k = 1): (Q = \frac{DM}{|DM|}) follows the ACG distribution with parameter (\Sigma = D^2).
For (k \geq 1): (Q) from (DM = QR) follows the MACG distribution with parameter (\Sigma = D^2).
Density: The p.d.f. involves (|\Sigma|^{-k/2} |Q^T \Sigma^{-1} Q|^{-N/2}), normalized for the Stiefel manifold.
Generalization: For (D^p M), use (\Sigma = D^{2p}).

This result extends the Haar measure case ((\Sigma = I_N)) to arbitrary diagonal scalings. For further details, see:

Chikuse, Y. (2003). Statistics on Special Manifolds. Springer.
Tyler, D. E. (1987). Statistical analysis for the angular central Gaussian distribution on the sphere. Biometrika, 74(3), 579–589.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley.

: Assessing Language Models on Unsolved Questions

UQ: Assessing Language Models on Unsolved Questions

Model Answer

Key Insight

1. Case (k = 1) (Vector Case)

Distribution of (Q)

Derivation

2. General Case (k \geq 1)

Distribution of (Q)

Derivation

Properties

3. Case (D^2 M) and Other Scalings

Summary

Quick Feedback

Additional Information:

UQ Validation

Human Verification