\documentstyle[11pt,leqno]{article} \input amssym.def \input amssym.tex \setlength{\textwidth}{6.2in} \setlength{\textheight}{9in} \setlength{\oddsidemargin}{.2in} \setlength{\topmargin}{-0.25in} \setlength{\headheight}{0in} \newcommand{\ep}{\epsilon} \newcommand{\oA}{\bar{A}} \newcommand{\spp}{\sum_{p \in P_j}} \newcommand{\sqq}{\sum_{q \in Q_j}} \newcommand{\dd}{\ldots} \newcommand{\hsp}{\hspace*{\parindent}} \newcommand{\sL}{{\cal L}} \newcommand{\ZZ}{{\Bbb Z}} \newcommand{\RR}{{\Bbb R}} \newcommand{\CC}{{\Bbb C}} \newcommand{\eeq}{\end{equation}} \newcommand{\beql}[1]{\begin{equation}\label{#1}} \newcommand{\eqn}[1]{(\ref{#1})} \newcommand{\la}{\lambda} \newcommand{\af}{\alpha} \newcommand{\bsq}{{\vrule height .9ex width .8ex depth -.1ex }} \def\binom#1#2{{#1}\choose{#2}} \newtheorem{coro}{Corollary} \newtheorem{cj}{Conjecture} \newtheorem{lemma}{Lemma} \newtheorem{theorem}{Theorem} \makeatletter % put a period after section or subsection number in header \def\@sect#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth \def\@svsec{}\else \refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname.\hskip .75em }\fi \@tempskipa #5\relax \ifdim \@tempskipa>\z@ \begingroup #6\relax \@hangfrom{\hskip #3\relax\@svsec}{\interlinepenalty \@M #8\par}% \endgroup \csname #1mark\endcsname{#7}\addcontentsline {toc}{#1}{\ifnum #2>\c@secnumdepth \else \protect\numberline{\csname the#1\endcsname}\fi #7}\else \def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname {#7}\addcontentsline {toc}{#1}{\ifnum #2>\c@secnumdepth \else \protect\numberline{\csname the#1\endcsname}\fi #7}}\fi \@xsect{#5}} % put a period after theorem and theorem-like numbers \def\@begintheorem#1#2{\it \trivlist \item[\hskip \labelsep{\bf #1\ #2.}]} \def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus -.2ex}{2.3ex plus .2ex}{\normalsize\bf}} \makeatother \pagestyle{myheadings} \markright{\sc the electronic journal of combinatorics 4, no. 
2 (1997), \#R7\hfill} \thispagestyle{empty} \begin{document} \begin{center} {\Large{\bf Random walks on generating sets for finite groups}} \\ \vspace*{1\baselineskip} {\em F. R. K. Chung \footnote{Research supported in part by NSF Grant No. DMS 95-04834}} \\ University of Pennsylvania \\ Philadelphia, PA 19104 \\ \vspace*{1\baselineskip} {\em R. L. Graham} \\ AT\&T Research \\ Murray Hill, NJ 07974 \\ \vspace*{1\baselineskip} \vspace*{.5\baselineskip} {\small Submitted: August 31, 1996; Accepted: November 12, 1996}\\ \vspace*{.5\baselineskip} {\it Dedicated to Herb Wilf on the occasion of his sixty-fifth birthday} \vspace*{1.5\baselineskip} \end{center} \setlength{\baselineskip}{1.5\baselineskip} \begin{abstract} We analyze a certain random walk on $G^n$, the $n$-fold cartesian product of a finite group $G$, which is often used for generating random elements from $G$. In particular, we show that the mixing time of the walk is at most $c_r n^2 \log n$ where the constant $c_r$ depends only on the order $r$ of $G$. \end{abstract} \section{Introduction} \hsp One method often used in computational group theory for generating random elements from a given (non-trivial) finite group $G$ proceeds as follows (e.g., see \cite{CLMNO}). A fixed integer $n \ge 2$ is initially specified. Denote by $G^n$ the set $\{ (x_1, \dd, x_n) : x_i \in G, 1 \le i \le n \}$. If $\bar{x} = (x_1, \dd, x_n) \in G^n$, we denote by $\langle \bar{x} \rangle$ the subgroup of $G$ generated by $\{ x_i : 1 \le i \le n \}$. Let $G^\ast \subseteq G^n$ denote the set of all $\bar{x} \in G^n$ such that $\langle \bar{x} \rangle = G$. We execute a random walk on $G^\ast$ by taking the following general step. Suppose we are at a point $\bar{p} = (p_1, \dd, p_n) \in G^\ast$. Choose a random pair of indices $(i,j)$ with $i \neq j$. (Thus, each such pair is chosen with probability $\frac{1}{n(n-1)}$.)
We then move to one of $\bar{p}' = (p'_1, \dd, p'_n )$ where $$p'_k = \left\{ \begin{array}{ll} p_i p_j ~~ {\rm or} ~~ p_i p_j^{-1} ~~ & \mbox{if $k=i$, each with probability 1/2} \\ [+.1in] p_k & \mbox{if $k \neq i$} ~. \end{array} \right. $$ This rule determines the corresponding transition matrix $Q$ of the walk. We note that with this rule, we always have $\bar{p} ' \in G^\ast$. It is also easy to check that for $n \ge n_0 (G)$, this walk is irreducible and aperiodic (see Section~5 for more quantitative remarks), and has a stationary distribution $\pi$ which is uniform (since the walk graph on $G^\ast$ is a multigraph in which every vertex has degree $2n(n-1)$). Starting from some fixed initial distribution $f_0$ on $G^\ast$, we apply this procedure some number of times, say $t$, to reach a distribution $f_0 Q^t$ on $G^\ast$ which we hope will be close to ``random'' when $t$ is large. A crucial question which must be faced in this situation is just how rapidly this process mixes, i.e., how large must $t$ be so that $f_0 Q^t$ is close to uniform. In this note, we apply several rather general comparison theorems to give reasonably good bounds on the mixing time for $Q$. In particular, we show (see Theorem~\ref{th1}) that when $t \ge c (G) n^2 \log n$, where $c(G)$ is a constant depending only on $G$, then $Q^t$ is already quite close to uniform (where we usually will suppress $f_0$). This problem belongs to a general class of random walk problems suggested recently by David Aldous \cite{aldous}. In fact, he considers a more general walk in which only certain pairs of indices $(i,j)$ are allowed in forming $p'_k = p_i p_j $ or $p_i p_j ^{-1}$. These pairs can be described by a graph $H$ on the vertex set $\{ 1,2, \ldots, n \}$. The case studied in this note corresponds to taking $H$ to be a complete graph. We first learned of this problem from a preprint of Diaconis and Saloff-Coste \cite{DS1}, part of which has subsequently appeared \cite{ds3}.
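For concreteness, the walk $Q$ is easy to simulate. The following sketch (ours, included purely as an illustration and not part of the original analysis) runs the walk for the cyclic group $G = \ZZ_r$, where the condition $\langle \bar{p} \rangle = G$ reduces to a gcd condition, and checks along the way that the walk never leaves $G^\ast$:

```python
import math
import random

def product_replacement_step(p, r, rng):
    """One step of the walk Q on (Z_r)^n: choose an ordered pair of
    distinct indices (i, j) uniformly, then replace p_i by p_i + p_j
    or p_i - p_j (the group operation in Z_r), each with probability 1/2."""
    n = len(p)
    i, j = rng.sample(range(n), 2)
    q = list(p)
    q[i] = (p[i] + p[j]) % r if rng.random() < 0.5 else (p[i] - p[j]) % r
    return tuple(q)

def generates(p, r):
    """In Z_r the tuple p generates the whole group iff gcd(p_1,...,p_n, r) = 1."""
    g = r
    for x in p:
        g = math.gcd(g, x)
    return g == 1

rng = random.Random(1)
r, n = 6, 4
p = (1, 0, 0, 0)            # a generating tuple, so p lies in G*
for _ in range(1000):
    p = product_replacement_step(p, r, rng)
    assert generates(p, r)  # the subgroup <p> is invariant, so p stays in G*
```

The invariance asserted in the loop is exactly the remark above that $\bar{p}' \in G^\ast$ always: replacing $p_i$ by $p_i p_j^{\pm 1}$ does not change the generated subgroup.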
In \cite{DS1} they wrote `` $ \cdots $ for $G= {\ZZ}_p$ with $p=2,3,4,5,7,8,9$ we know that $n^2 \log n$ steps are enough whereas for $G=\ZZ_6$ or ${\ZZ}_{10}$ we only know that $n^4 \log n$ are enough. Even in the case of ${\ZZ}_6$ it does not seem easy to improve this.'' Our main contribution in this note is to show that by direct combinatorial constructions, a mixing time of $c(G) n^2 \log n$ can be obtained for {\it all} groups $G$ where $c(G)$ is a constant depending just on $G$. Subsequently \cite{ds4}, they have also obtained bounds of the form $c(G) n^2 \log n$ for all groups $G$ by a more sophisticated path construction argument than the one they had previously used in \cite{DS1}. \section{Background} \hsp A weighted graph $\Gamma = (V,E)$ consists of a vertex set $V$, and a weight function $w: V \times V \to \RR$ satisfying $w(u,v) = w(v,u) \ge 0$ for all $u,v \in V$. The edge set $E$ of $\Gamma$ is defined to be the set of all pairs $uv$ with $w(u,v) > 0$. A simple (unweighted) graph is just the special case in which all weights are 0 or 1. The {\em degree} $d_v$ of a vertex $v$ is defined by $$d_v : = \sum_u w(u,v) ~.$$ Further, we define the $|V| \times |V|$ matrix $L$ by $$L(u,v) = \left\{ \begin{array}{cl} d_v - w (v,v) & \mbox{if $u=v$}, \\ [+.05in] -w (u,v) & \mbox{if $uv \in E$, $u \neq v$}, \\ [+.05in] 0 & {\rm otherwise}~. \end{array} \right. $$ In particular, for a function $f: V \to \RR$, we have $$Lf(x) = \sum_{y \atop xy \in E} (f(x) - f(y) ) w(x,y) ~.$$ Let $T$ denote the diagonal matrix with the $(v,v)$ entry having the value $d_v$.
The {\em Laplacian} $\sL_\Gamma$ of $\Gamma$ is defined to be $$\sL = \sL_\Gamma = T^{-1/2} L T^{-1/2} ~.$$ In other words, $$\sL (u,v) = \left\{ \begin{array}{cl} 1 - \frac{w(v,v)}{d_v} & \mbox{if $u=v$}, \\ [+.05in] - \frac{w(u,v)}{\sqrt{d_u d_v}} & \mbox{if $uv \in E$, $u \neq v$}, \\ [+.05in] 0 & {\rm otherwise} ~. \end{array} \right. $$ Since $\sL$ is symmetric and non-negative definite, its eigenvalues are real and non-negative. We denote them by $$0 = \la_0 \le \la_1 \le \cdots \le \la_{n-1}$$ where $n = |V|$. It follows from standard variational characterizations of eigenvalues that \beql{eq1} \la_1 = \inf_f \sup_c \frac{\sum\limits_{uv \in E} (f(u) - f(v))^2 w(u,v)}{\sum\limits_x d_x (f(x) -c)^2} ~. \eeq For a connected graph $\Gamma$, the eigenvalues satisfy $$0 < \la_i \le 2$$ for $i \ge 1$. Various properties of the eigenvalues can be found in \cite{C}. Now, the usual random walk on an unweighted graph has transition probability $1/d_v$ of moving from a vertex $v$ to any one of its neighbors. The transition matrix $P$ then satisfies $$P(v,u) = \left\{ \begin{array}{cl} 1/d_v & \mbox{if $uv \in E$}, \\ [+.05in] 0 & {\rm otherwise} ~. \end{array} \right. $$ That is, $$fP(u) = \sum_{v \atop uv \in E} \frac{1}{d_v} f(v)$$ for any $f:V \to \RR$. It is easy to check that $$P= T^{-1/2} (I- \sL ) T^{1/2} = T^{-1} A$$ where $A$ is the adjacency matrix of the graph. In a random walk on a connected weighted graph $\Gamma$, the transition matrix $P$ satisfies $${\bf 1} TP = {\bf 1} T~.$$ Thus, the stationary distribution is just ${\bf 1} T / vol ( \Gamma )$, where $vol ( \Gamma ) = \sum\limits_x d_x$ and {\bf 1} is the all ones vector. Our problem is to estimate how rapidly $fP^k$ converges to its stationary distribution, as $k \to \infty$, starting from some initial distribution $f: V \to \RR$. First, consider convergence in the $L^2$ (or Euclidean) norm.
Suppose we write $$fT^{-1/2} = \sum_i a_i \phi_i$$ where $\phi_i$ denotes the eigenfunction associated with $\la_i$ and $\| \phi_i \| =1$. Since $\phi_0 = {\bf 1} \cdot T^{1/2} / \sqrt{vol ( \Gamma)}$, we have $$a_0 = \frac{\langle fT^{-1/2}, {\bf 1} \,T^{1/2} \rangle}{\| {\bf 1} \, T^{1/2} \|} = \frac{1}{\sqrt{vol (\Gamma )}} $$ since $\langle f, {\bf 1} \rangle =1$. We then have \begin{eqnarray*} \| fP^s - {\bf 1} T/ vol ( \Gamma )\| & = & \| fT^{-1/2} (I - \sL )^s T^{1/2} - a_0 \phi_0 T^{1/2} \| \\ & = & \Biggl\| \sum_{i \neq 0} (1- \la_i)^s a_i \phi_i T^{1/2} \Biggr\| \\ & \le & (1- \la)^s \| f \| \\ & \le & e^{-s \la} \| f \| \end{eqnarray*} where $$\la = \left\{ \begin{array}{cl} \la_1 & \mbox{if $1- \la_1 \ge \la_{n-1} -1$}, \\ [+.1in] 2- \la_{n-1} & {\rm otherwise} ~. \end{array} \right. $$ So, after $s \ge (1/ \la ) \log ( 1/ \ep )$ steps, the $L_2$ distance between $fP^s$ and its stationary distribution is at most $\ep \| f \|$. Although $\la$ occurs in the above bound, in fact only $\la_1$ is crucial, in the following sense. If it happens that $1- \la_1 < \la_{n-1} -1$, then we can consider a random walk on the modified graph $\Gamma '$ formed by adding a loop of weight $c d_v$ to each vertex $v$ where $c = (\la_1 + \la_{n-1})/2 -1$. The new graph has (Laplacian) eigenvalues $\la'_k = \frac{1}{1+c} \la_k$, $0 \le k \le n-1$, so that $1- \la'_1 \ge \la'_{n-1} -1$. Consequently (see \cite{C}), we only need to increase the number of steps of this ``lazy'' walk on $\Gamma '$ to $s \ge (1/ \la' ) \log (1/ \ep )$ to achieve the same $L_2$ bound of $\ep \| f \|$, where $\la'$ is $$\la' = \left\{ \begin{array}{cl} \la_1 & \mbox{if $1- \la_1 \ge \la_{n-1} -1$}, \\ [+.1in] \frac{ 2 \la_1}{\la_1+ \la_{n-1}} & {\rm otherwise} ~. \end{array} \right. $$ We note that we have $\la' \geq 2 \la_1 /(2+ \la_1) \geq 2 \la_1/3$. A stronger notion of convergence is measured by the $L_\infty$, or relative pointwise distance, which is defined as follows.
After $s$ steps, the relative pointwise distance of $P$ to its stationary distribution $\pi$ is given by $$\Delta (s) : = \max_{x,y} \frac{ | P^s (y,x) - \pi (x)|}{\pi (x)} ~.$$ Let $\delta_z$ denote the indicator function defined by $$\delta_z (x) = \left\{ \begin{array}{ll} 1 & \mbox{if $x =z$}, \\ [+.05in] 0 & {\rm otherwise}~. \end{array} \right. $$ Set $$T^{1/2} \delta_x = \sum_i \af_i \phi_i$$ and $$T^{-1/2} \delta_y = \sum_i \beta_i \phi_i~.$$ In particular, $$\alpha_0 = \frac{d_x}{\sqrt{vol ( \Gamma )}} ,~~~ \beta_0 = \frac{1}{\sqrt{vol ( \Gamma )}} ~. $$ Hence, \begin{eqnarray}\label{eq2} \Delta (t) & = & \max_{x,y} \frac{| \delta_y P^t \delta_x - \pi (x)|}{\pi (x)} \nonumber \\ & = & \max_{x,y} \frac{| \delta_y T^{-1/2} (I- \sL)^t T^{1/2} \delta_x - \pi (x)|}{\pi (x)} \nonumber \\ & \le & \max_{x,y} \sum_{i \neq 0} \frac{| (1- \la_i)^t \af_i \beta_i |}{d_x / vol ( \Gamma )} \\ & \le & (1- \la )^t \max_{x,y} \frac{\| T^{1/2} \delta_x \| \| T^{-1/2} \delta_y \|}{d_x / vol ( \Gamma )} \nonumber \\ & \le & (1- \la)^t \frac{vol ( \Gamma )}{\min\limits_{x,y} \sqrt{d_x d_y}} \nonumber \\ & \le & e^{-t \la} \frac{vol ( \Gamma )}{\min\limits_x d_x} ~. \nonumber \end{eqnarray} Thus, if we choose $t$ so that $$t \ge \frac{1}{\la} \log \frac{vol ( \Gamma )}{\ep \min\limits_x d_x}$$ then after $t$ steps, we have $\Delta (t) \le \ep$. We also remark that requiring $\Delta (t) \to 0$ is a rather strong condition. In particular, it implies that another common measure, the total variation distance $\Delta_{TV} (t)$ goes to zero just as rapidly, since \begin{eqnarray*} \Delta_{TV} (t) & = & \max_{A \subset V} \max_{y \in V} \left| \sum_{x \in A} ( P^t (y,x) - \pi (x) ) \right| \\ & \le & \max_{A \subset V \atop vol (A) \le \frac{1}{2} vol ( \Gamma )} \sum_{x \in A} \pi (x) \Delta (t) \\ & \le & \frac{1}{2} \Delta (t) ~.
\end{eqnarray*} We point out here that the factor $\frac{vol ( \Gamma )}{ \min\limits_x d_x}$ can often be further reduced by the use of so-called logarithmic Sobolev eigenvalue bounds (see \cite{DS2} and \cite{C} for surveys). In particular, Diaconis and Saloff-Coste have used these methods in their work on rapidly mixing Markov chains. We will follow their lead and apply some of these ideas in Section~4. \section{An eigenvalue comparison theorem} \hsp To estimate the rate at which $\Delta (t) \to 0$ as $t \to \infty$, we will need to lower bound $\la_1 ( \Gamma^\ast )$, the smallest non-zero Laplacian eigenvalue of the graph $\Gamma^\ast$ on $G^\ast$, defined by taking as edges all pairs $\bar{x} \bar{y} \in E^\ast$ where $\bar{x} \in G^\ast$ and $\bar{y}$ can be reached from $\bar{x}$ by taking one step of the process $Q$. Our comparison graph $\Gamma^n$ on $G^n$ will have all edges $\bar{x} \bar{y} \in E$ where $\bar{x}$ and $\bar{y}$ are any two elements of $G^n$ which differ in a single coordinate (so that $\Gamma^n$ is just the usual Cartesian product of $G$ with itself $n$ times). \begin{lemma} \label{le1} Suppose $\Gamma = (V,E)$ is a connected (simple) graph and $\Gamma ' = (V', E' )$ is a connected multigraph with smallest non-zero Laplacian eigenvalues $\la_1 = \la_1 ( \Gamma )$ and $\la'_1 = \la_1 ( \Gamma ')$, respectively.
Suppose $\phi : V \to V'$ is a surjective map such that: \begin{itemize} \item[(i)] If $d_x$ and $d'_{x'}$ denote the degrees of $x \in V$ and $x' \in V'$, respectively, then for all $x' \in V'$ we have $$\sum_{ x \in \phi^{-1} (x' )} d_x \ge a d'_{x'}~.$$ \item[(ii)] For each edge $e= xy \in E$ there is a path $P(e)$ between $\phi (x)$ and $\phi (y)$ in $E'$ such that: \begin{itemize} \item[(a)] The number of edges of $P(e)$ is at most $\ell$; \item[(b)] For each edge $e' \in E'$, we have $$| \{ xy \in E : e' \in P(e) \} | \le m ~.$$ \end{itemize} \end{itemize} Then we have \beql{eq3} \la'_1 \ge \frac{a}{\ell m} \la_1 ~. \eeq \end{lemma} \paragraph{Proof.} For $h: V \to \CC$, define $h^2 : E \to \CC$ by setting $h^2 (e) = (h (x) - h(y))^2$ for $e= xy \in E$ (with a similar definition for $h: V' \to \CC$ and $h^2 : E' \to \CC$). We start by letting $g: V' \to \CC$ be a function achieving equality in \eqn{eq1} (or rather, the version of \eqn{eq1} for $\la'_1$). Define $f: V \to \CC$ by setting $$f(x) = g( \phi (x)) ~~~\mbox{for}~~~x \in V ~.$$ Thus, \begin{eqnarray}\label{eq4} \la'_1 & = & \sup_c \frac{\sum\limits_{e' \in E'} g^2 (e')}{\sum\limits_{v' \in V'} (g(v')-c)^2 d'_{v'}} \nonumber \\ & \ge & \frac{\sum\limits_{e' \in E'} g^2 (e')}{\sum\limits_{v' \in V'} (g(v')-c)^2 d'_{v'}} ~~~\mbox{for all $c$} \\ & = & \frac{\sum\limits_{e' \in E'} g^2 (e')}{\sum\limits_{e \in E} f^2 (e)} \cdot \frac{\sum\limits_{e \in E} f^2 (e)}{\sum\limits_{v \in V} (f(v)-c)^2 d_v} \cdot \frac{\sum\limits_{v \in V} (f(v) -c)^2 d_v}{\sum\limits_{v' \in V'} (g(v') -c)^2 d'_{v'}} \nonumber \\ & = & I \times II \times III ~. \nonumber \end{eqnarray} First, we treat factor $I$. Using Cauchy-Schwarz, we have for all $e \in E$, $$f^2 (e) \le \ell \sum_{e' \in P(e)} g^2 (e')$$ by (a).
Hence by (b), $$m \sum_{e' \in E'} g^2 (e') \ge \sum_{e \in E} \sum_{e' \in P(e)} g^2 (e') \ge \frac{1}{\ell} \sum_{e \in E} f^2 (e)$$ i.e., \beql{eq5} \frac{\sum\limits_{e' \in E'} g^2 (e')}{\sum\limits_{e \in E} f^2 (e)} \ge \frac{1}{ \ell m} \eeq which gives a bound for factor $I$. To bound factor $III$, we have \begin{eqnarray}\label{eq6} \sum_{x \in V} (f(x) - c)^2 d_x & = & \sum_{x' \in V'} \sum_{x \in \phi^{-1} (x')} (f(x) - c)^2 d_x \nonumber \\ & = & \sum_{x' \in V'} (g(x') - c)^2 \sum_{x \in \phi^{-1} (x')} d_x \\ & \ge & a \sum_{x' \in V'} (g(x') -c)^2 d'_{x'} ~~~~~~\mbox{by (i)} ~. \nonumber \end{eqnarray} Finally, for factor $II$ we choose $c_0$ so that \beql{eq7} \sup_c \frac{\sum\limits_{e \in E} f^2 (e)}{\sum\limits_{v \in V} (f(v) - c)^2 d_v} = \frac{\sum\limits_{e \in E} f^2 (e)}{\sum\limits_{v \in V} (f(v) - c_0)^2 d_v} \ge \la_1 \eeq by \eqn{eq1}. Hence, by \eqn{eq4}, \eqn{eq5}, \eqn{eq6} and \eqn{eq7} we have $$\la'_1 \ge \frac{a}{\ell m} \la_1$$ which is just \eqn{eq3}.~~~~$\bsq$ \vspace*{+.1in} \noindent Note that in the case that $\Gamma$ and $\Gamma '$ are regular with degrees $k$ and $k'$, respectively, then (i) holds with $a= k/k'$, and \eqn{eq3} becomes $$ \la'_1 \ge \frac{k}{k' \ell m} \la_1~. \leqno{(3')} $$ \section{A comparison theorem for the log-Sobolev constant} \hsp Given a connected weighted graph $\Gamma = (V,E)$, the log-Sobolev constant $\af = \af ( \Gamma )$ is defined by \beql{eq8} \af = \inf_{f \neq {\rm constant}} \frac{\sum\limits_{e \in E} f^2 (e)}{\sum\limits_x f^2 (x) d_x \log \frac{f^2 (x)}{\sum\limits_y f^2 (y) \pi (y)}} \eeq where $f$ ranges over all non-constant functions $f: V \to \RR$ and $\pi$ is the stationary distribution of the nearest neighbor random walk on $\Gamma$. In a recent paper \cite{DS2}, Diaconis and Saloff-Coste show that \beql{eq9} \Delta_{TV} (t) \le e^{1-c} ~~{\rm if}~~ t \ge \frac{1}{2 \af} \log \log \frac{vol ( \Gamma )}{\min\limits_x d_x} + \frac{c}{\la_1} ~.
\eeq This is strengthened in \cite{C}, where the slightly stronger inequality is proved \beql{eq10} \Delta (t) \le e^{2-c} ~~{\rm if}~~ t \ge \frac{1}{2\af} \log \log \frac{vol ( \Gamma )}{\min\limits_x d_x} + \frac{c}{\la_1} \eeq and \beql{eq100} \Delta_{TV} (t) \le e^{1-c} ~~{\rm if}~~ t \ge \frac{1}{4\af} \log \log \frac{vol ( \Gamma )}{\min\limits_x d_x} + \frac{c}{\la_1} \eeq using the alternate (equivalent) definition: \beql{eq11} \af = \inf_{f \neq {\rm constant}} \frac{\sum\limits_{e \in E} f^2 (e)}{S(f)} \eeq where \beql{eq12} S(f) : = \inf_{c > 0} \sum_{x \in V} (f^2 (x) \log f^2 (x) - f^2 (x) - f^2 (x) \log c +c) d_x ~. \eeq While \eqn{eq10} is typically stronger than \eqn{eq2}, it depends on knowing (or estimating) the value of $\af$, which if anything is harder to estimate than $\la_1$ for general graphs. We can bypass this difficulty to some extent by the following (companion) comparison theorem for $\af$. Its statement (and proof) is in fact quite close to that of Lemma~\ref{le1}. \begin{lemma}\label{le2} Suppose $\Gamma = (V,E)$ is a connected (simple) graph and $\Gamma ' = (V', E')$ is a connected multigraph, with logarithmic Sobolev constants $\af = \af ( \Gamma )$ and $\af ' = \af ( \Gamma ' )$, respectively. Suppose $\phi : V \to V'$ is a surjective map such that (i) and (ii) of Lemma~\ref{le1} hold. Then \beql{eq13} \af ' \ge \frac{a}{\ell m} \af ~. \eeq \end{lemma} \paragraph{Proof:} Consider a function $g: V' \to \RR$ achieving the infimum in \eqn{eq11} (or rather, the version of \eqn{eq11} for $\af '$). Define $f: V \to \RR$ as in the proof of Lemma~\ref{le1}. Then we have \begin{eqnarray}\label{eq14} \af ' & = & \frac{\sum\limits_{e' \in E'} g^2 (e')}{S(g)} \nonumber \\ & = & \frac{\sum\limits_{e' \in E'} g^2 (e')}{\sum\limits_{e \in E} f^2 (e)} \cdot \frac{\sum\limits_{e \in E} f^2 (e)}{S(f)} \cdot \frac{S(f)}{S(g)} \\ & = & I' \times II ' \times III' ~.
\nonumber \end{eqnarray} Exactly as in the proof of Lemma~\ref{le1}, we obtain $$I' \ge \frac{1}{\ell m} , ~~~ II' \ge \af ~.$$ It remains to show $III' \ge a$ (which we do using a nice idea of Holley and Stroock; cf.~\cite{DS2}). First, define $$F ( \xi , \zeta ) : = \xi \log \xi - \xi \log \zeta - \xi + \zeta$$ for all $\xi$, $\zeta > 0$. Note that $F( \xi , \zeta ) \ge 0$ and for $\zeta > 0$, $F( \xi , \zeta )$ is convex in $\xi$. Thus, for some $c_0 > 0$, \begin{eqnarray*} S(f) & = & \sum_{x \in V} F( f^2 (x), c_0) d_x \\ & = & \sum_{x' \in V'} \left( \sum_{x \in \phi^{-1} (x')} d_x \right) F(g(x')^2 , c_0 ) \\ & \ge & a \sum_{x' \in V'} d'_{x'} F(g(x')^2 , c_0 ) ~~~\mbox{since $F \ge 0$, by (i)} \\ & \ge & a S(g) ~~~\mbox{since $S(g)$ is an infimum over $c$} ~. \end{eqnarray*} This implies $III' \ge a$ and \eqn{eq13} is proved.~~~$\bsq$ As in $(3')$, if $\Gamma$ and $\Gamma '$ are regular with degrees $k$ and $k'$, respectively, then $$ \af ' \ge \frac{k}{k' \ell m} \af~. \leqno{(13')} $$ \section{Defining the paths} \hsp In this section we describe the key path constructions for our proof. For our finite group $G$, we say that $B \subseteq G$ is a {\em minimal basis} for $G$ if $\langle B \rangle =G$ but for any proper subset $B' \subset B$, we have $\langle B' \rangle \neq G$. Define $$b(G) : = \max \{ |B| : ~ \mbox{$B$ is a minimal basis for $G$} \} ~.$$ Further, define $w(G)$ to be the least integer such that for any minimal basis $B$, and any $g \in G$, we can write $g$ as a product of at most $w(G)$ terms of the form $x^{\pm 1}$, $x \in B$. Finally, define $s(G)$ to be the cardinality of a {\em minimum} basis for $G$. We abbreviate $b(G)$, $w(G)$ and $s(G)$ by $b$, $w$ and $s$, respectively, and, as usual, we set $r: = |G|$. In particular, the following crude bounds always hold: \beql{eq15} s \le b \le \frac{\log r}{\log 2} = \log_2 r ,~~~w < r ~. \eeq Let $R$ denote $\lfloor \log_2 r \rfloor$. We will assume $n > 2 (s+R)$.
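The quantities $s(G)$, $b(G)$ and $w(G)$ can be computed by brute force for small groups. The sketch below (ours, for illustration only) does this for the cyclic group $G = \ZZ_6$ singled out by Diaconis and Saloff-Coste: a subset of $\ZZ_r$ generates the group iff its gcd together with $r$ is 1, and word lengths are found by breadth-first search in the Cayley graph.

```python
import math
from collections import deque
from itertools import combinations

def gcd_all(B, r):
    """gcd of r and all elements of B; B generates Z_r iff this is 1."""
    g = r
    for x in B:
        g = math.gcd(g, x)
    return g

def is_minimal_basis(B, r):
    """B generates Z_r but no proper subset of B does."""
    return gcd_all(B, r) == 1 and all(gcd_all(B - {x}, r) != 1 for x in B)

def word_length(B, r):
    """Max over g in Z_r of the least number of terms x^{+-1}, x in B,
    whose product (sum mod r) equals g: BFS in the Cayley graph."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for x in B:
            for u in ((v + x) % r, (v - x) % r):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
    return max(dist.values())

r = 6
bases = [set(B) for k in range(1, r + 1)
         for B in combinations(range(r), k)
         if is_minimal_basis(set(B), r)]
s = min(len(B) for B in bases)             # minimum basis size s(G)
b = max(len(B) for B in bases)             # largest minimal basis size b(G)
w = max(word_length(B, r) for B in bases)  # worst-case word length w(G)
```

For $\ZZ_6$ this yields $s = 1$ (e.g. $B = \{1\}$), $b = 2$ (e.g. $B = \{2,3\}$) and $w = 3$, consistent with the crude bounds $s \le b \le \log_2 r$ and $w < r$ of \eqn{eq15}.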
To apply Lemmas~\ref{le1} and \ref{le2}, we must define the map $\phi: \Gamma^n \to \Gamma^\ast$ and the paths $P(e)$, $e \in E^n$. Let $\{ g_1, \dd, g_s \}$ be a fixed minimum basis for $G$. For $\bar{x} = (x_1, \dd, x_n) \in \Gamma^n$, define $$\phi ( \bar{x} ) = \left\{ \begin{array}{cl} \bar{x} & \mbox{if $\langle \bar{x} \rangle = G$}, \\ [+.05in] (g_1 , \ldots , g_s , x_{s+1}, \ldots, x_n ) & \mbox{if $\langle \bar{x} \rangle \neq G$}~. \end{array} \right. $$ Next, for each edge $e= \bar{x} \bar{y} \in E^n$, we must define a path $P(e)$ between $\phi ( \bar{x} )$ and $\phi ( \bar{y} )$ in $\Gamma^\ast$. Suppose $\bar{x}$ and $\bar{y}$ just differ in the $i^{\rm th}$ component so that $$\bar{x} = (x_1, \ldots, x_i , \ldots, x_n ) ,~~\bar{y} = (y_1, \ldots , y_i , \ldots, y_n )$$ where $x_j = y_j$ for $j \neq i$, and $x_i \neq y_i$. There are three cases: \paragraph{(I)} $\mbox{\boldmath $\bar{x} \in G^\ast , ~ \bar{y} \in G^\ast$}$. Let $I$ denote the interval $$\left\{ \begin{array}{ll} \{ i+1, \ldots, i+s +2R \} & \mbox{if $i \le n-s - 2R$}, \\ [+.05in] \{ n-s-2R , \ldots, n \} \setminus \{i \} & \mbox{if $i > n-s-2R$} ~. \end{array} \right. $$ Choose a subset $J \subset I$ so that: \begin{eqnarray*} {\rm (i)} && |J| =s \\ {\rm (ii)} && \langle \{ x_k : k \in [n] \setminus J \} \rangle =G \\ {\rm (iii)} && \langle \{ y_k : k \in [n] \setminus J \} \rangle =G ~. \end{eqnarray*} That is, the values $x_j = y_j$, $j \in J$, are not needed in generating $G$ using the $x_k$ or the $y_k$. Write $J$ as $\{j_1, j_2, \ldots, j_s \}$. In this case $\phi ( \bar{x} ) = \bar{x}$, $\phi ( \bar{y} ) = \bar{y}$. To form $P(e)$: \begin{itemize} \item[(i)] Use a basis from the elements $x_k$, $k \not\in J$, to change $x_{j_1}$ to $g_1$, $x_{j_2}$ to $g_2 , \ldots , x_{j_s}$ to $g_s$. This takes at most $ws$ steps; \item[(ii)] Next, use $g_1, \ldots, g_s$ to change $x_i$ to $y_i$.
This takes at most $w$ steps; \item[(iii)] Finally, use a basis from the elements $y_k$, $k \not\in J$, to change $g_1$ back to $x_{j_1} = y_{j_1}, \ldots, g_s$ back to $x_{j_s} = y_{j_s}$. This takes at most $ws$ steps. Hence, for case (I), $P(e)$ has length at most $w(2s+1)$. \end{itemize} \paragraph{(II)} $\mbox{\boldmath $\bar{x} \not\in G^\ast , \bar{y} \in G^\ast$.}$ In this case, $\phi ( \bar{x} ) = (g_1, \ldots, g_s , x_{s+1} , \ldots , x_n )$, $\phi ( \bar{y} ) = (y_1, \ldots, y_n )$ where $x_j = y_j$ for $j \neq i$, and $x_i \neq y_i$. This time we locate a set $J$ of $s$ indices $j_1, \ldots, j_s$, with $i < j_1 < \cdots < j_s \le i+s+R$ so that $\langle \{ y_k : k \in [n] \setminus J \} \rangle =G$. If there is not enough room, i.e., $i > n-s-R$, then we locate $J$ to lie in $\{n-s -R, \ldots, n \} \setminus \{i\}$. In addition, if it happens that $i \le s$, then we take $J \subseteq \{ s+1, \ldots, 2s+R \}$. Now, to form $P(e)$: \begin{itemize} \item[(i)] Use $g_1, \ldots, g_s$ in $\phi ( \bar{x} )$ to change $x_{j_1}$ to $g_1$, $x_{j_2}$ to $g_2 , \ldots , x_{j_s}$ to $g_s$. \item[(ii)] Use the newly formed $g_1, \ldots, g_s$ (with indices in $J$) to change coordinate 1 from $g_1$ to $y_1$, coordinate 2 from $g_2$ to $y_2, \ldots$, coordinate $s$ from $g_s$ to $y_s$. Then change $x_i$ to $y_i$. \item[(iii)] Finally use a basis in $\{ y_k : k \in [n] \setminus J \}$ to change coordinates $j_1, \ldots, j_s$ to $y_{j_1} , \ldots , y_{j_s}$, respectively. In this case, the length of $P(e)$ is at most $w(3s+1)$. \end{itemize} \paragraph{(III)} $\mbox{\boldmath $\bar{x} \not\in G^\ast , \bar{y} \not\in G^\ast$}.$ Thus, $\phi (\bar{x} ) = (g_1, \ldots, g_s , x_{s+1} , \ldots, x_n )$, $\phi ( \bar{y} ) = (g_1, \ldots, g_s , y_{s+1} , \ldots , y_n )$ and of course, $x_j = y_j$, $j \neq i$, and $x_i \neq y_i$. If $i > s$ then we can change $x_i$ to $y_i$ in at most $w$ steps (and this forms $P(e)$).
If $i \le s$ then $\phi ( \bar{x} ) = \phi ( \bar{y} )$ and $P(e)$ does not have to be defined. The main point in the preceding slightly complicated construction is that it guarantees a rather small value of $m$. The reason is that the only coordinates $u$ which change in edges of $P(e)$ are either in $\{1,2, \dd, s \}$ or fairly close to $i$, e.g., $|i-u | \le 2(s+R)$. Furthermore, if a changing coordinate $u$ lies in $\{ 1, \ldots, s \}$ and $i > 2(s+R)$, so that we have some edge $e' = (z_1, \dd, z_s , \dd , z_u , \dd , z_n )$, $(z_1 , \dd, z_s , \dd, z'_u , \dd , z_n )$ in $\Gamma^\ast$, then we search $(z_1, \dd, z_n )$ for the first interval of length $2(s+R)$ which contains $g_1, \dd, g_s$, say $\{ w_0 +1 , \dd, w_0 + 2 (s+R) \}$. By our construction, such an interval must exist. Furthermore, it is not hard to see that in this case $|i-w_0 | < 4(s+R)$ (and this is somewhat generous). Consequently, the original point $\bar{x}$ in $e$ must agree with $(z_1, \dd, z_n )$ in all but at most $10 (s+R)$ coordinates. It follows that for these choices of $\phi$ and $P(e)$, $e \in E^n$, we have \beql{eq16} m \le 10 (s+R) r^{10 (s+R)} ~. \eeq Also by previous remarks, we have \beql{eq17} \ell \le w (3s+1)~. \eeq Observe that ${\rm deg} \,\Gamma^n = (r-1) n$ and ${\rm deg} \, \Gamma^\ast = 2n (n-1)$. Consequently, by $(3')$, $(13')$, \eqn{eq16} and \eqn{eq17} (after some simplifications) \beql{eq18} \la'_1 \ge \frac{\la_1}{1600R^2 r^{20R} n} , ~~ \af ' \ge \frac{\af}{1600R^2 r^{20R} n} ~. \eeq \section{Putting it all together} \hsp The final pieces we need to bound $\Delta (t)$ in \eqn{eq10} are the values of $\la_1 ( \Gamma^n )$ and $\af ( \Gamma^n)$. Fortunately, these are easy to derive since $\la_1$ and $\af$ behave very nicely under Cartesian products. In particular, we have \beql{eq19} \la_1 ( \Gamma^n ) = \af ( \Gamma^n ) = \frac{(r-1)}{r} \cdot \frac{1}{n} ~.
\eeq Note that $$vol ( \Gamma^\ast ) = | G^\ast | \cdot 2n (n-1) \le | \Gamma^n | \cdot 2n (n-1) ~.$$ Thus, by \eqn{eq10} we have $$\Delta (t) \le e^{2-c} ~~{\rm if}~~ t \ge \frac{1}{2 \af '} \log \log | \Gamma^n | + \frac{c}{\la'_1} ~ $$ and, by \eqn{eq100} we have $$\Delta_{TV} (t) \le e^{1-c} ~~{\rm if}~~ t \ge \frac{1}{4 \af '} \log \log | \Gamma^n | + \frac{c}{\la'_1} ~. $$ This implies \begin{theorem} \label{th1} \beql{eq20} \Delta (t) \le e^{2-c} ~~{\rm if}~~ t \ge 800R^2 r^{20R +1} n^2 ( \log n + \log \log r +c ) ~. \eeq \end{theorem} In other words, $c_r n^2 \log n$ steps are enough to force the distribution to be close to uniform, which is what we claimed in the Introduction. Also, we have \begin{theorem} \label{th2} \beql{eq200} \Delta_{TV} (t) \le e^{1-c} ~~{\rm if}~~ t \ge 400R^2 r^{20R +1} n^2 ( \log n + \log \log r +c ) ~. \eeq \end{theorem} \section{Concluding remarks} \hsp Of course, the preceding techniques using comparison theorems can be applied to many other random walk problems of this general type. For example, one could restrict the preceding moves so that $p_i \to p_i p_j^{\pm 1}$ is only allowed if $(i,j)$ belongs to some specified set (this determines an underlying digraph). It is probably true that the correct answer in \eqn{eq20} is actually $c_r n \log n$ (this is conjectured in \cite{DS1}). Some evidence in favor of this is our recent result in \cite{cg} that $O(n \log n)$ steps do suffice when $G={\ZZ}_2$. \begin{thebibliography}{99} \bibitem{aldous} David Aldous, talk at 1994 National IMS meeting. \bibitem{CLMNO} F. Celler, C. R. Leedham-Green, S. Murray, A. Niemeyer and E.~A. O'Brien, Generating random elements of a finite group, Comm. Alg., 23 (1995), 4931--4948. \bibitem{C} F. R. K. Chung, Spectral Graph Theory, Amer. Math. Soc., Providence, RI (to appear 1996). \bibitem{cg} F. R. K. Chung and R. L. Graham, Stratified random walk on an $n$-cube, preprint. \bibitem{DGM} P. Diaconis, R. L. Graham and J. A.
Morrison, Asymptotic analysis of a random walk on a hypercube with many dimensions, Random Structures and Algorithms, 1 (1990), 51--72. \bibitem{DS1} P. Diaconis and L. Saloff-Coste, Walks on generating sets of groups, Stanford Technical Report No. 481, July 1995, 36 pages. \bibitem{ds3} P. Diaconis and L. Saloff-Coste, Walks on generating sets of abelian groups, Probab. Theory Relat. Fields, 105 (1996), 393--421. \bibitem{ds4} P. Diaconis and L. Saloff-Coste, Walks on generating sets of groups, Stanford Technical Report No. 497, July 1996, 40 pages. \bibitem{DS2} P. Diaconis and L. Saloff-Coste, Logarithmic Sobolev inequalities for finite Markov chains (preprint 1995). \bibitem{DSh} P. Diaconis and M. Shahshahani, Time to reach stationarity in the Bernoulli-Laplace diffusion model, SIAM J. Math. Anal., 18 (1987), 208--218. \end{thebibliography} \end{document}