Density dichotomy in random words




Density, Free words, Limit theory


Word $W$ is said to encounter word $V$ provided there is a homomorphism $\phi$ mapping letters to nonempty words so that $\phi(V)$ is a substring of $W$. For example, taking $\phi$ such that $\phi(h)=c$ and $\phi(u)=ien$, we see that ``science'' encounters ``huh'' since $cienc=\phi(huh)$. The density of $V$ in $W$, $\delta(V,W)$, is the proportion of substrings of $W$ that are homomorphic images of $V$. So the density of ``huh'' in ``science'' is $2/{8 \choose 2}$. A word is doubled if every letter that appears in the word appears at least twice.

The dichotomy: Let $V$ be a word over any alphabet, $\Sigma$ a finite alphabet with at least 2 letters, and $W_n \in \Sigma^n$ chosen uniformly at random. Word $V$ is doubled if and only if $\mathbb{E}(\delta(V,W_n)) \rightarrow 0$ as $n \rightarrow \infty$.

We further explore convergence for nondoubled words and concentration of the limit distribution for doubled words around its mean.


[1] D. R. Bean, A. Ehrenfeucht, and G. F. McNulty, Avoidable Patterns in Strings of Symbols, Pac. J. of Math. 85:2 (1979), 261–294.

[2] J. P. Bell and T. L. Goh, Exponential lower bounds for the number of words of uniform length avoiding a pattern, Information and Computation 205 (2007), 1295–1306.

[3] F. Blanchet-Sadri and B. Woodhouse, Strict bounds for pattern avoidance, Theoretical Computer Science 506 (2013).

[4] J. Cooper and D. Rorabaugh, Bounds on Zimin Word Avoidance, Congressus Numerantium 222 (2014), 87–95.

[5] J. Cooper and D. Rorabaugh, Asymptotic Density of Zimin Words, Discrete Mathematics & Theoretical Computer Science 18:3#3 (2016).

[6] J. D. Currie, Pattern avoidance: themes and variations, Theoretical Computer Science 339 (2005).

[7] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.

[8] L. Lovász, Large Networks and Graph Limits, American Mathematical Society, Providence, 2012.

[9] J. Tao, Pattern occurrence statistics and applications to the Ramsey theory of unavoidable patterns, arXiv:1406.0450.

[10] A. Thue, Über unendliche Zeichenreihen, Norske Vid. Skrifter I Mat.-Nat. Kl., vol. 7, Kristiania, 1906.

[11] A. I. Zimin, Blokirujushhie mnozhestva termov, Mat. Sb. 119 (1982), 363–375.

[12] A. I. Zimin, Blocking sets of terms, Math. USSR-Sb. 47 (1984), 353–364.