\documentclass[12pt,reqno]{article} \usepackage[usenames]{color} \usepackage{amssymb} \usepackage{graphicx} \usepackage{amscd} \usepackage[colorlinks=true, linkcolor=webgreen, filecolor=webbrown, citecolor=webgreen]{hyperref} \definecolor{webgreen}{rgb}{0,.5,0} \definecolor{webbrown}{rgb}{.6,0,0} \usepackage{color} \usepackage{fullpage} \usepackage{float} \usepackage{psfig} \usepackage{graphics,amsmath,amssymb} \usepackage{amsthm} \usepackage{amsfonts} \usepackage{latexsym} \usepackage{epsf} \setlength{\textwidth}{6.5in} \setlength{\oddsidemargin}{.1in} \setlength{\evensidemargin}{.1in} \setlength{\topmargin}{-.5in} \setlength{\textheight}{8.9in} \newcommand{\seqnum}[1]{\href{http://oeis.org/#1}{\underline{#1}}} \DeclareMathOperator{\sgn}{sgn} \begin{document} \begin{center} \epsfxsize=4in \leavevmode\epsffile{logo129.eps} \end{center} \theoremstyle{plain} \newtheorem{theorem}{Theorem} \newtheorem{corollary}[theorem]{Corollary} \newtheorem{lemma}[theorem]{Lemma} \newtheorem{proposition}[theorem]{Proposition} \theoremstyle{definition} \newtheorem{definition}[theorem]{Definition} \newtheorem{example}[theorem]{Example} \newtheorem{conjecture}[theorem]{Conjecture} \theoremstyle{remark} \newtheorem{remark}[theorem]{Remark} \begin{center} \vskip 1cm{\LARGE\bf Clustering Words and Interval Exchanges } \vskip 1cm \large S\'ebastien Ferenczi \\ IMPA\\ CNRS --- UMI 2924\\ Estrada Dona Castorina 110 \\ Rio de Janeiro, RJ 22460-32\\ Brazil \\ \href{mailto:ferenczi_sebastien@yahoo.fr}{\tt ferenczi\_sebastien@yahoo.fr}\\ \ \\ Luca Q. Zamboni \\ Institut Camille Jordan\\ Universit\'e Claude Bernard Lyon 1\\ 43 boulevard du 11 novembre 1918\\ F-69622 Villeurbanne Cedex \\ France \\ and \\ Department of Mathematics \\ and Turku Centre for Computer Science \\ University of Turku \\ 20014 Turku \\ Finland \\ \href{mailto:zamboni@math.univ-lyon1.fr}{\tt zamboni@math.univ-lyon1.fr} \end{center} \vskip .2 in \newcommand{\GN}{{\mathbb N}} \newcommand{\GR}{{\mathbb R}} \newcommand{\GZ}{{\mathbb Z}} \newcommand{\GT}{{\mathbb T}} \def \ind{_{n \in {\mbox{\rm {\scriptsize I$\!$N}}}}} \newcommand{\ab}{|} \begin{abstract} We characterize words which cluster under the Burrows-Wheeler transform as those words $w$ such that $ww$ occurs in a trajectory of an interval exchange transformation, and build examples of clustering words. \end{abstract} \section{Introduction} In 1994 Michael Burrows and David Wheeler \cite{bw} introduced a transformation on words which proved very powerful in data compression. The aim of the present note is to characterize those words which cluster under the Burrows-Wheeler transform, that is to say, those words that are transformed into expressions such as $4^a3^b2^c1^d$ or $2^a5^b3^c1^d4^e.$ Clustering words on a binary alphabet have already been extensively studied (see, for instance, \cite{jz,rr4}) and identified as particular factors of the Sturmian words. Some generalizations and partial characterizations to $r$ letters appear in Restivo and Rosone \cite{rr3}, but it had not yet been observed that clustering words are intrinsically related to a dynamical object called {\em interval exchange transformations} introduced in Oseledets \cite{ose}: we shall define them in Definitions \ref{d1} and \ref{d2} below, and refer the reader to \cite{v} which constitutes a classical course on general interval exchange transformations and contains many of the technical terms found in Section \ref{s3} below. This link comes essentially from the fact that the array of conjugates used to define the Burrows-Wheeler transform gives rise to a {\it discrete} interval exchange transformation sending its first column to its last column. It turns out that the converse is also true: interval exchange transformations generate clustering words. Indeed we prove that clustering words are exactly those words $w$ such that $ww$ occurs in a trajectory of an interval exchange transformation. On a binary letter alphabet, this condition amounts to saying that $ww$ is a factor of an infinite Sturmian word. We end the paper by some examples and questions on how to generate clustering words. This paper began during a workshop on board Via Rail Canada train number 2. We are grateful to Laboratoire International Franco-Qu\'eb\'ecois de Recherche en Combinatoire (LIRCO) for funding and Via for providing optimal working conditions. The second author is partially supported by a FiDiPro grant from the Academy of Finland. The authors owe much to Jean-Paul Allouche, the first mathematician who revealed to both of them the beauty of combinatorics on words, and who taught the first author that not every paper needs to begin with ``Let $(X,T,\mu)$ be a system ...''. \section{Definitions} Let $A=\{a_1