Short answer
It is exactly 0: since E[Xi^d] = 0 for every i, we have E[Sn^d] = (1/√n) ∑i E[Xi^d] = 0, and you also assumed E[Y^d] = 0. Hence |E[Sn^d] − E[Y^d]| = 0 for all n and d.
As written, there is no generally useful upper bound. The two conditional expectations are random variables defined on different probability spaces. If you instead interpret them as conditional expectation functions mS(x) = E[Sn(i) | Sn(1:i−1) = x] and mY(x) = E[Y(i) | Y(1:i−1) = x], then mY(x) is linear in x (Gaussian regression), but mS(x) can be highly nonlinear and unbounded, even though Sn converges to Y in the sense controlled by Bentkus’ inequality. Consequently, no uniform pointwise bound on |mS(x) − mY(x)| follows from that CLT.
Details and examples
Item 1: E[Sn^d]=0 and E[Y^d]=0, so their difference is identically 0. Your attempted bound via densities and total variation is unnecessary, and in fact the last step is invalid: Bentkus’ inequality controls the supremum over convex sets, not total variation, so it does not bound ∫|pS−pY|.
Item 2: why no pointwise bound follows from Bentkus. Bentkus’ bound for convex sets is a weak (setwise) distance; it does not control pointwise differences of conditional expectation functions. One can make mS(x) arbitrarily nonlinear without affecting the third-moment quantity β or the covariance Σ, yet the convex-set distance to the matching Gaussian can still be O(β/√n).
A concrete illustration (d=2) makes this clear and also shows what can be bounded in a weak sense:
Let Zi be i.i.d. N(0,1) and define Xi=(Zi, Zi^2−1). Then E[Xi]=0 and Cov(Xi)=diag(1,2). Let Y∼N(0,diag(1,2)) so Y1 and Y2 are independent N(0,1) and N(0,2), respectively.
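The moment claims for this construction are easy to check numerically. A minimal Monte Carlo sketch (variable names are my own): the sample mean of Xi should be near (0, 0) and the sample covariance near diag(1, 2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of the moments of X_i = (Z_i, Z_i^2 - 1), Z_i ~ N(0,1):
# the mean should be (0, 0) and the covariance diag(1, 2).
Z = rng.standard_normal(1_000_000)
X = np.column_stack([Z, Z**2 - 1])
print(X.mean(axis=0))           # ~ [0, 0]
print(np.cov(X, rowvar=False))  # ~ [[1, 0], [0, 2]]
```

The off-diagonal entry is E[Zi(Zi² − 1)] = E[Zi³] = 0, which is why the covariance is diagonal.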
For Sn built from these Xi, one can compute the conditional expectation exactly. Writing ∑Zi² = (∑Zi)²/n + Q, where Q ∼ χ²_{n−1} is independent of ∑Zi (the Gaussian quadratic-form identity), gives E[Sn,2 | Sn,1] = (1/√n)(Sn,1² − 1).
For the Gaussian Y, E[Y2 | Y1] = 0 (independence).
Thus the pointwise difference |E[Sn,2 | Sn,1] − E[Y2 | Y1]| = (1/√n)|Sn,1^2−1| is unbounded as a function of the conditioning value (because Sn,1 can be arbitrarily large). So there is no finite uniform bound supx |mS(x)−mY(x)|.
Nevertheless, under the distribution of the conditioning variable, the discrepancy is small: E{ [E(Sn,2 | Sn,1)]^2 } = (1/n) Var(Sn,1^2 − 1) = 2/n, so the root-mean-square size is √(2/n). For general Zi with mean 0, var 1 and finite fourth moment μ4, a similar calculation shows that the L2 size of E[Sn,2 | Sn,1] is of order 1/√n, with a constant depending on μ4.
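Both the exact regression function and its √(2/n) root-mean-square size can be verified by simulation. A sketch (the bin width 0.05 and sample sizes are my choices): it estimates E[Sn,2 | Sn,1 ≈ x] by averaging S2 over narrow bins of S1 and compares with (x² − 1)/√n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 200_000

# Draw `reps` independent copies of S_n for the example X_i = (Z_i, Z_i^2 - 1).
Z = rng.standard_normal((reps, n))
S1 = Z.sum(axis=1) / np.sqrt(n)
S2 = (Z**2 - 1).sum(axis=1) / np.sqrt(n)

# Estimate the regression function E[S_{n,2} | S_{n,1} = x} by averaging S2
# over narrow bins of S1, and compare with the exact value (x^2 - 1)/sqrt(n).
for x in (0.0, 1.0, 2.0):
    in_bin = np.abs(S1 - x) < 0.05
    print(x, S2[in_bin].mean(), (x**2 - 1) / np.sqrt(n))

# Root-mean-square size of the regression function: exactly sqrt(2/n).
print(np.sqrt(np.mean(((S1**2 - 1) / np.sqrt(n)) ** 2)), np.sqrt(2 / n))
```

The binned means track (x² − 1)/√n, growing without bound in x, while the RMS size under the law of Sn,1 stays at √(2/n), which is exactly the pointwise-vs-integrated contrast described above.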
What you can bound (and what you cannot)
You cannot use Bentkus’ convex-set bound to control total variation, so you cannot justify bounds of the form ∫|pS − pY| or anything derived from them (your last inequality is not valid).
If you want bounds that compare conditional expectations “in a weak (integrated) sense,” you can use multivariate Berry–Esseen/Stein bounds for smooth test functions. For example, for any sufficiently smooth g: R^{i−1}→R with bounded derivatives up to order 3, one can bound | E[ Sn(i) g(Sn(1:i−1)) ] − E[ Y(i) g(Y(1:i−1)) ] | by C d^{1/4} β / √n times a norm of g (this is a standard smooth-metric multivariate Berry–Esseen/Stein bound; see e.g. Reinert & Röllin 2009; Chen & Shao 2004; Raic 2019). Using the Gaussian regression identity E[Y(i) | Y(1:i−1)] = a' Y(1:i−1) with a depending only on Σ, this yields integrated bounds for the difference between the regression functions of Sn and Y against test functions g.
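The integrated comparison can be seen directly on the d = 2 example above; g(x) = cos(x) is my choice of smooth test function. For the Gaussian, E[Y2 g(Y1)] = 0 by independence and E[Y2] = 0; for Sn, the tower property and the exact regression function give E[Sn,2 cos(Sn,1)] = E[(Sn,1² − 1) cos(Sn,1)]/√n = −e^{−1/2}/√n (using E[Z² cos Z] = 0 and E[cos Z] = e^{−1/2} for Z ∼ N(0,1)), which decays at the 1/√n rate:

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 100_000

# Weak (integrated) comparison against the smooth test function g(x) = cos(x).
# For the Gaussian limit, E[Y2 g(Y1)] = 0 (independence and E[Y2] = 0); for
# S_n the exact value of E[S_{n,2} g(S_{n,1})] is -exp(-1/2)/sqrt(n).
for n in (25, 100):
    Z = rng.standard_normal((reps, n))
    S1 = Z.sum(axis=1) / np.sqrt(n)
    S2 = (Z**2 - 1).sum(axis=1) / np.sqrt(n)
    print(n, np.mean(S2 * np.cos(S1)), -np.exp(-0.5) / np.sqrt(n))
```

So the discrepancy against this fixed smooth g is O(1/√n), even though the pointwise difference of regression functions is unbounded.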
Pointwise bounds on x ↦ E[Sn(i) | Sn(1:i−1)=x] − E[Y(i) | Y(1:i−1)=x] are impossible in general without strong structural assumptions (e.g., elliptical/log-concave laws, or explicit smoothness/decay of densities) because the conditional expectation for Sn can grow without bound in x even while Sn converges to Y in distribution.
Dimension growth
In the weak (smooth test function) bounds above, the dimension enters only through the explicit d^{1/4} factor (and through β), so the integrated comparisons degrade slowly as d grows. The pointwise obstruction, by contrast, is insensitive to dimension: it already appears at d = 2 in the example above.
Takeaways
Item 1: the difference of means is identically 0.
Item 2: as a pointwise quantity, there is no uniform bound in general. If you settle for integrated (weak) comparisons against smooth test functions, you get O(d^{1/4} β / √n)-type bounds. If you measure mean-square size under the law of the conditioning variables (as in the 2D example), the discrepancy is typically O(1/√n) with a constant depending on low-order moments, but no uniform bound over all conditioning values exists without extra assumptions.