This is an excellent question that connects information theory, combinatorics, and graph theory. The short answer is no, small mutual information does not imply the existence of a large product set in the support.
Let's break down why this is the case, following the insightful graph-theoretic rephrasing you provided.
1. Connecting Mutual Information to Graph Density
First, let's make the link between the information-theoretic and graph-theoretic formulations more precise.
Let . Let .
Let the joint distribution be . The support of this distribution is the set .
We can view this support set as the edge set of a bipartite graph .
The question asks for large sets such that . In graph terms, this is precisely a complete bipartite subgraph (also called a biclique), .
The condition "large" or "relatively dense" usually means sets of exponential size, e.g., and for some constant , or at least a size that is more than polynomial in .
Now, let's analyze the conditions on :
Uniform Marginals: and for all .
Small Mutual Information:.
Recall the formula for mutual information:
.
Since and are uniform over , we have .
So, the condition becomes:
.
The entropy is maximized when the distribution is as "spread out" or uniform as possible. If we assume is uniform on its support , i.e., for , then .
The condition then implies:
.
This means the support graph must be very dense, with a fraction of at least of all possible edges.
Furthermore, if is uniform on its support , the uniform marginal condition implies that all vertices have the same degree, . Similarly, all vertices in have the same degree. Thus, must be a regular bipartite graph.
So, the question can be simplified (under the assumption of a uniform distribution on the support) to:
Does every -regular bipartite graph on vertices () with edge density at least contain a biclique where and are "large" (e.g., exponential in )?
2. The Random Graph Counterexample
The answer to this rephrased question is no. The standard counterexample comes from the theory of random graphs. A random graph with high density is, in fact, one of the worst places to look for large, highly structured subgraphs like cliques or bicliques.
Consider a random bipartite graph where each of the possible edges is included independently with probability .
Density: The expected number of edges is , which matches our density requirement.
Largest Biclique: It's a classic result in extremal combinatorics that the size of the largest biclique in such a random graph is, with high probability, .
.
So, a typical graph with this density has no biclique larger than roughly . This is polynomially large in , but it is infinitesimally small compared to the size of the vertex sets (). It does not satisfy the "relatively dense" or "large" criteria in the context of sets in .
3. Constructing a Formal Counterexample
We can make this argument rigorous by constructing a specific distribution that satisfies the premises but whose support contains no large product set.
Choose a Graph: There exist -regular bipartite graphs on (with ) where that do not contain any biclique for for some constant . A randomly chosen graph from the uniform distribution over all such regular graphs will have this property with high probability. Let be one such graph.
Define the Distribution: Define the joint probability distribution to be uniform on the edge set of this graph :
Verify the Premises:
Support: The support of is .
Marginals: Since is -regular, . For any :
.
Similarly, . The marginals are uniform.
Mutual Information:.
The mutual information is exactly .
Check the Conclusion:
The question is whether the support contains a large product set . By our choice of the graph , its edge set does not contain any biclique (product set) larger than roughly . This size is not "large" or "relatively dense" in the way required.
Conclusion
While small mutual information implies that the joint distribution is "close" to the product of its marginals in a KL-divergence sense, this does not translate into strong structural properties of its support set, like the existence of large rectangular (product) substructures. The counterexample shows that a distribution can have very low mutual information while its support is "locally sparse" or "random-like," specifically avoiding large bicliques.
The Kővári–Sós–Turán theorem from extremal graph theory does guarantee that a graph with this density must contain some biclique, and it gives a lower bound on the size, which happens to be of a similar polynomial order (). So small bicliques are guaranteed, but large, exponentially-sized ones are not.
Quick Feedback
Was this answer helpful?
Additional Information:
version: June 2025 release
UQ Validation
Status: UQ Validated
Validated: 8 months ago
Human Verification
Status: Needs Human Verification
UQ Validation
Validation from UQ Validators
Model Reviews (0)
Reviews and evaluations from AI model experts
Loading model reviews...
Human Reviews (0)
Expert reviews and evaluations from human reviewers