Density dichotomy in random words

Joshua Cooper, Danny Rorabaugh

Abstract


Word $W$ is said to encounter word $V$ provided there is a homomorphism $\phi$ mapping letters to nonempty words so that $\phi(V)$ is a substring of $W$. For example, taking $\phi$ such that $\phi(h)=c$ and $\phi(u)=ien$, we see that ``science'' encounters ``huh'' since $cienc=\phi(huh)$. The density of $V$ in $W$, $\delta(V,W)$, is the proportion of substrings of $W$ that are homomorphic images of $V$. So the density of ``huh'' in ``science'' is $2/{8 \choose 2}$. A word is doubled if every letter that appears in the word appears at least twice.

The dichotomy: Let $V$ be a word over any alphabet, $\Sigma$ a finite alphabet with at least 2 letters, and $W_n \in \Sigma^n$ chosen uniformly at random. Word $V$ is doubled if and only if $\mathbb{E}(\delta(V,W_n)) \rightarrow 0$ as $n \rightarrow \infty$.

We further explore convergence for nondoubled words and concentration of the limit distribution for doubled words around its mean.

Keywords


Density; Free words; Limit theory

Full Text:

PDF

References


D. R. Bean, A. Ehrenfeucht, and G. F. McNulty, Avoidable Patterns in Strings of Symbols, Pac. J. of Math. 85:2 (1979), 261–294.

J. P. Bell and T. L. Goh, Exponential lower bounds for the number of words of uniform length avoiding a pattern, Information and Computation 205 (2007), 1295–1306.

F. Blanchet-Sadri and B. Woodhouse, Strict bounds for pattern avoidance, Theoretical Computer Science 506 (2013).

J. Cooper and D. Rorabaugh, Bounds on Zimin Word Avoidance, Congressus Numerantium 222 (2014), 87–95.

J. Cooper and D. Rorabaugh, Asymptotic Density of Zimin Words, Discrete Mathematics & Theoretical Computer Science 18:3#3 (2016).

J. D. Currie, Pattern avoidance: themes and variations, Theoretical Computer Science 339 (2005).

M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.

L. Lovász, Large Networks and Graph Limits, American Mathematical Society, Providence, 2012.

J. Tao, Pattern occurrence statistics and applications to the Ramsey theory of unavoidable patterns, arXiv:1406.0450.

A. Thue, Über unendliche Zeichenreihen, Norske Vid. Skrifter I Mat.-Nat. Kl., vol. 7, Kristiania, 1906.

A. I. Zimin, Blokirujushhie mnozhestva termov, Mat. Sb. 119 (1982), 363–375.

A. I. Zimin, Blocking sets of terms, Math. USSR-Sb. 47 (1984), 353–364.




Contributions to Discrete Mathematics. ISSN: 1715-0868