https://susam.net/blog/sorting-in-emacs.html
Sorting in Emacs
By Susam Pal on 09 Aug 2023
In this article, we will perform a series of hands-on experiments
that demonstrate the various Emacs commands that can be used to sort
text in different ways. These commands are documented well in the
Emacs and Elisp manuals. This article complements that documentation
with concrete examples that illustrate how they work.
Sorting Lines
Our first set of experiments demonstrates different ways to sort
lines. Follow the steps below to perform these experiments.
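1. First create a buffer that has the following text. Keep the fields
aligned in columns as shown because the sort-columns steps later in
this list rely on column positions.
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
Bob    100  London  LCY->CDG
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND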
Let us pretend that each line is a record that represents some
details about different persons. From left to right, we have each
person's name, some sort of numerical ID, their current location,
and their upcoming travel plan. For example, the first line says
that Carol from London is planning to travel from London Heathrow
(LHR) to San Francisco (SFO).
2. Type C-x h to mark the whole buffer and type M-x sort-lines RET
to sort lines alphabetically. The buffer looks like this now:
Alice  10   Paris   CDG->LHR
Bob    100  London  LCY->CDG
Bob    30   Paris   ORY->HND
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
3. Type C-x h followed by C-u M-x sort-lines RET to reverse sort
lines alphabetically. The key sequence C-u specifies a prefix
argument that indicates that a reverse sort must be performed.
The buffer looks like this now:
Dan    20   Tokyo   HND->LHR
Carol  200  London  LHR->SFO
Bob    30   Paris   ORY->HND
Bob    100  London  LCY->CDG
Alice  10   Paris   CDG->LHR
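The two sorts above can be mimicked outside Emacs. The following Python sketch is only a rough model and not how Emacs implements sort-lines. It assumes the default case-sensitive comparison (sort-fold-case set to nil) and relies on Python comparing strings by code point. The variable names are made up for illustration.

```python
# Rough model of M-x sort-lines and C-u M-x sort-lines.
# Assumption: plain code-point comparison, as with sort-fold-case = nil.
lines = [
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
]

ascending = sorted(lines)                 # like M-x sort-lines
descending = sorted(lines, reverse=True)  # like C-u M-x sort-lines

print("\n".join(ascending))
print()
print("\n".join(descending))
```

Whole lines are compared here, so "Bob    100 ..." sorts before "Bob    30 ..." because the character 1 comes before the character 3.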
4. Type C-x h followed by M-x sort-fields RET to sort the lines by
the first field only. Fields are separated by whitespace. Note
that the result now is slightly different from the result of M-x
sort-lines RET presented in point 2 earlier. Here Bob from Paris
comes before Bob from London because the sorting was performed by
the first field only. The sorting algorithm ignored the rest of
each line. However, in point 2 earlier, Bob from London came
before Bob from Paris because the sorting was performed by entire
lines.
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND
Bob    100  London  LCY->CDG
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
5. Type C-x h followed by M-2 M-x sort-fields RET to sort the lines
alphabetically by the second field. The key sequence M-2 here
specifies a numeric argument that identifies the field we want to
sort by. Note that 100 comes before 20 because we performed an
alphabetical sort, not a numerical sort. The result looks like
this:
Alice  10   Paris   CDG->LHR
Bob    100  London  LCY->CDG
Dan    20   Tokyo   HND->LHR
Carol  200  London  LHR->SFO
Bob    30   Paris   ORY->HND
6. Type C-x h followed by M-2 M-x sort-numeric-fields RET to sort
the lines numerically by the second field. The result looks like
this:
Alice  10   Paris   CDG->LHR
Dan    20   Tokyo   HND->LHR
Bob    30   Paris   ORY->HND
Bob    100  London  LCY->CDG
Carol  200  London  LHR->SFO
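The difference between points 4, 5, and 6 comes down to which key is extracted from each line and how that key is compared. Here is a small Python sketch of that idea. It is a model under assumptions, not the Emacs implementation: the helper field is hypothetical, and the sort is assumed to be stable (lines with equal keys keep their relative order), which is consistent with the results shown above.

```python
# Buffer state after point 3 (reverse-sorted lines).
buffer = [
    "Dan    20   Tokyo   HND->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
]

def field(line, n):
    """Return the n-th whitespace-separated field, counting from 1."""
    return line.split()[n - 1]

# Point 4: key is the first field only; the two Bob lines tie and
# therefore keep the order they already had in the buffer.
by_name = sorted(buffer, key=lambda l: field(l, 1))

# Point 5: key is the second field compared as text, so "100" < "20".
by_id_text = sorted(by_name, key=lambda l: field(l, 2))

# Point 6: key is the second field compared as a number, so 20 < 100.
by_id_number = sorted(by_id_text, key=lambda l: int(field(l, 2)))

for result in (by_name, by_id_text, by_id_number):
    print("\n".join(result), end="\n\n")
```

Comparing "100" and "20" as text looks only at the first differing character, which is why the alphabetical sort places 100 before 20.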
7. Type C-x h followed by M-3 M-x sort-fields RET to sort the lines
alphabetically by the third field containing city names. The
result looks like this:
Bob    100  London  LCY->CDG
Carol  200  London  LHR->SFO
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND
Dan    20   Tokyo   HND->LHR
Note that we cannot supply the prefix argument C-u to this
command to perform a reverse sort by a specific field because the
prefix argument here is used to identify the field we need to
sort by. If we do specify the prefix argument C-u, it would be
treated as the numeric argument 4 which would sort the lines by
the fourth field. However, there is a little trick to reverse
sort lines by a specific field. The next point shows this.
8. Type C-x h followed by M-x reverse-region RET. This reverses the
order of lines in the region. Combined with the previous command,
this effectively reverse sorts the lines by city names. The
result looks like this:
Dan    20   Tokyo   HND->LHR
Bob    30   Paris   ORY->HND
Alice  10   Paris   CDG->LHR
Carol  200  London  LHR->SFO
Bob    100  London  LCY->CDG
9. Type C-x h followed by M-- M-2 M-x sort-fields RET to sort the
lines alphabetically by the second field from the right (third
from the left). Note that the first two key combinations are
meta+- and meta+2. They specify the negative argument -2 to sort
the lines by the second field from the right. The result looks
like this:
Carol  200  London  LHR->SFO
Bob    100  London  LCY->CDG
Bob    30   Paris   ORY->HND
Alice  10   Paris   CDG->LHR
Dan    20   Tokyo   HND->LHR
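Points 7 to 9 can be replayed with a short Python sketch. As before, this is a model under assumptions rather than the Emacs implementation: the helper field is hypothetical, a negative field number is modelled with a negative list index, and the sort is assumed to be stable.

```python
# Buffer state after point 6 (sorted numerically by the second field).
buffer = [
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Carol  200  London  LHR->SFO",
]

def field(line, n):
    """Return field n: 1 is the leftmost field, -1 is the rightmost."""
    if n == 0:
        raise ValueError("field number must be nonzero")
    fields = line.split()
    return fields[n - 1] if n > 0 else fields[n]

by_city = sorted(buffer, key=lambda l: field(l, 3))        # point 7
reversed_by_city = by_city[::-1]                           # point 8
from_right = sorted(reversed_by_city,
                    key=lambda l: field(l, -2))            # point 9

# For comparison: a descending sort that keeps tied lines in their
# original order is not the same thing as reversing the region.
descending = sorted(buffer, key=lambda l: field(l, 3), reverse=True)

for result in (by_city, reversed_by_city, from_right, descending):
    print("\n".join(result), end="\n\n")
```

Reversing the region also flips the order of lines that share a city, so the two Paris lines and the two London lines swap places. That is why the result of point 9 lists Carol before Bob even though both are in London.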
10. Type M-< to move the point to the beginning of the buffer. Then
type C-s London RET followed by M-b to move the point to the
beginning of the word London on the first line. Now type C-SPC to
set a mark there.
Then type C-4 C-n C-e to move the point to the end of the last
line. An active region should be visible in the buffer now.
Finally type M-x sort-columns RET to sort the columns bounded by
the column positions of mark and point (i.e., the last two
columns). The result looks like this:
Bob    100  London  LCY->CDG
Carol  200  London  LHR->SFO
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND
Dan    20   Tokyo   HND->LHR
11. Like before, type M-< to move the point to the beginning of the
buffer. Then type C-s London RET followed by M-b to move the
point to the beginning of the word London on the first line. Now
type C-SPC to set a mark there.
Again, like before, type C-4 C-n C-e to move the point to the end
of the last line. An active region should be visible in the
buffer now.
Now type C-u M-x sort-columns RET to reverse sort the last two
columns.
Dan    20   Tokyo   HND->LHR
Bob    30   Paris   ORY->HND
Alice  10   Paris   CDG->LHR
Carol  200  London  LHR->SFO
Bob    100  London  LCY->CDG
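One way to picture sort-columns is as a line sort whose key is a fixed slice of each line. The Python sketch below models points 10 and 11 under that assumption. The function sort_columns here is a hypothetical stand-in written for this illustration, and the example only works because the fields are aligned in columns.

```python
def sort_columns(lines, start, end, reverse=False):
    """Sort whole lines, comparing only the characters in columns
    start (inclusive) to end (exclusive)."""
    return sorted(lines, key=lambda l: l[start:end], reverse=reverse)

# Buffer state after point 9.
buffer = [
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
]

start = buffer[0].index("London")  # column of the mark
end = len(buffer[-1])              # column of point (end of last line)

point_10 = sort_columns(buffer, start, end)
point_11 = sort_columns(point_10, start, end, reverse=True)

print("\n".join(point_10), end="\n\n")
print("\n".join(point_11))
```

The key for the first line is "London  LHR->SFO", that is, the city and the travel plan together. This is why the two London lines are ordered by their airports in the result.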
12. Warning: This step shows how not to use the sort-regexp-fields
command. In most cases you probably do not want to do this. The
next point shows a typical usage of this command that is correct
in most cases.
Type C-x h followed by M-x sort-regexp-fields RET [A-Z]*->\(.*\)
RET \1 RET to sort by the destination airport. This command first
matches the destination airport in each line in a regular
expression capturing group (\(.*\)). Then we ask this command to
sort the lines by the field matched by this capturing group (\1).
The result looks like this:
Dan    20   Tokyo   LCY->CDG
Bob    30   Paris   ORY->HND
Alice  10   Paris   HND->LHR
Carol  200  London  CDG->LHR
Bob    100  London  LHR->SFO
Observe how all our travel records are messed up in this result.
Now Dan from Tokyo is travelling from LCY to CDG instead of
travelling from HND to LHR. Compare this result with that of the
previous point. The command has sorted the destination airports
correctly and it has kept each source airport together with its
destination airport. But the association between the first three
columns and the last column is broken. This happened because the
regular expression matches only the last column. Only the part of
each line that is matched by the regular expression moves around
while the sorting is performed; everything else remains
unchanged. This behaviour may be useful in some limited
situations but in most cases, we want to keep the association
between all the fields intact. The next point shows how to do
this.
Now type C-/ (or C-x u) to undo this change and revert the buffer
to the previous good state. After doing this, the buffer should
look like the result presented in the previous point.
13. Assuming the state of the buffer is the same as that of the result in
point 11, we will now see how to alter the previous step such
that when we sort the lines by the destination field, entire
lines move along with the destination fields. The trick is to
ensure that the regular expression matches entire lines. To do
so, we make a minor change in the regular expression. Type C-x h
followed by M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET.
Bob    100  London  LCY->CDG
Bob    30   Paris   ORY->HND
Dan    20   Tokyo   HND->LHR
Alice  10   Paris   CDG->LHR
Carol  200  London  LHR->SFO
Now the lines are sorted by the destination field and Dan from
Tokyo is travelling from HND to LHR.
14. Type C-x h followed by M-- M-x sort-regexp-fields RET .*->\(.*\)
RET \1 RET to reverse sort the lines by the destination airport.
Note that the first key combination is meta+- here. This key
combination specifies a negative argument that results in a
reverse sort. The result looks like this:
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND
Bob    100  London  LCY->CDG
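The behaviour seen in points 12 to 14 follows from one rule: only the text matched by the record regular expression moves, and it moves into the places where matches were found. The Python sketch below models that rule. It is an illustration under assumptions, not the Emacs implementation: the function name merely mirrors the Emacs command, the key is always capturing group 1, the sort is assumed to be stable, and the Emacs syntax \(.*\) is written as (.*) in Python.

```python
import re

def sort_regexp_fields(text, record_regexp, reverse=False):
    """Permute the regexp matches in text, ordered by capturing group 1.

    Text outside the matches stays exactly where it is.
    """
    slots = list(re.finditer(record_regexp, text))
    ordered = sorted(slots, key=lambda m: m.group(1), reverse=reverse)
    pieces, pos = [], 0
    for slot, record in zip(slots, ordered):
        pieces.append(text[pos:slot.start()])  # untouched text before the slot
        pieces.append(record.group(0))         # record that sorts into this slot
        pos = slot.end()
    pieces.append(text[pos:])
    return "".join(pieces)

# Buffer state after point 11.
buffer = "\n".join([
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
])

partial = sort_regexp_fields(buffer, r"[A-Z]*->(.*)")      # point 12
whole = sort_regexp_fields(buffer, r".*->(.*)")            # point 13
whole_reversed = sort_regexp_fields(whole, r".*->(.*)",
                                    reverse=True)          # point 14

for result in (partial, whole, whole_reversed):
    print(result, end="\n\n")
```

With the first regular expression, each record is only the last column, so the first three columns act as fixed text between records and never move. With the second one, each record is an entire line, so only the newlines stay put.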
15. Finally, note that we can always invoke shell commands on a
region and replace the region with the output of the shell
command. To see this in action, first prepare the buffer by
typing M-< followed by C-k C-k C-y C-y to duplicate the first
line of the buffer.
Then type C-x h followed by C-u M-| sort -u RET to sort the lines
but remove duplicate lines during the sort operation. The M-| key
sequence invokes the command shell-command-on-region which
prompts for a shell command, executes it, and usually displays
the output in the echo area. If the output cannot fit in the echo
area, then it displays the output in a separate buffer. However,
if a prefix argument is supplied, say with C-u, then it replaces
the region with the output. As a result, the buffer now looks
like this:
Alice  10   Paris   CDG->LHR
Bob    100  London  LCY->CDG
Bob    30   Paris   ORY->HND
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
This particular problem of removing duplicates while sorting can
also be accomplished by typing C-x h followed by M-x sort-lines RET
and then C-x h followed by M-x delete-duplicate-lines RET.
Nevertheless, it is useful to know that
we can execute arbitrary shell commands on a region.
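The two approaches in this point, a single sort -u versus sorting followed by removing duplicates, can be compared with a small Python sketch. This is only a model: it compares strings by code point, whereas the sort program collates according to the current locale, so results for other inputs may differ.

```python
# Buffer state after duplicating the first line in point 15.
buffer = [
    "Carol  200  London  LHR->SFO",
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
]

# Like piping the region through "sort -u".
unique_sorted = sorted(set(buffer))

# Like sort-lines followed by delete-duplicate-lines: sort first,
# then keep only the first occurrence of each line.
two_step = list(dict.fromkeys(sorted(buffer)))

print("\n".join(unique_sorted))
```

Both approaches produce the same five lines for this buffer.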
Sorting Paragraphs and Pages
We have covered most of the sorting commands mentioned in the Emacs
manual in the previous section. Now we will switch gears and discuss
a few of the remaining ones. We will no longer sort individual
lines but paragraphs and pages instead.
1. First create a buffer with the content provided below. Note that
the text below contains three form feed characters. In Emacs,
they are displayed as ^L. Many web browsers generally do not
display them. The ^L symbols that we see in the text below have
been overlaid with CSS. But there are actual form feed
characters next to those overlays. If you are viewing this post
with any decent web browser, you can copy the text below into
your Emacs and you should be able to see the form feed characters
in Emacs. In case you do not, insert them yourself by typing C-q
C-l.
Emacs is an advanced, extensible, customizable,
self-documenting editor.

Emacs editing commands operate in terms of
characters, words, lines, sentences, paragraphs,
pages, expressions, comments, etc.
^L
We will use the term frame to mean a graphical
window or terminal screen occupied by Emacs.

At the very bottom of the frame is an echo area.
The main area of the frame, above the echo area,
is called the window.
^L
The cursor in the selected window shows the
location where most editing commands take effect,
which is called point.

If you are editing several files in Emacs, each in
its own buffer, each buffer has its own value of
point.
^L
2. Our text has six paragraphs spread across three pages. Each form
feed character represents a page break. Type C-x h followed by
M-x sort-pages RET to sort the pages alphabetically. Note how the
second page moves to the bottom because it begins with the letter
"W". The buffer looks like this now:
Emacs is an advanced, extensible, customizable,
self-documenting editor.

Emacs editing commands operate in terms of
characters, words, lines, sentences, paragraphs,
pages, expressions, comments, etc.
^L
The cursor in the selected window shows the
location where most editing commands take effect,
which is called point.

If you are editing several files in Emacs, each in
its own buffer, each buffer has its own value of
point.
^L
We will use the term frame to mean a graphical
window or terminal screen occupied by Emacs.

At the very bottom of the frame is an echo area.
The main area of the frame, above the echo area,
is called the window.
^L
3. Finally, type C-x h followed by M-x sort-paragraphs RET to sort the
paragraphs alphabetically. The buffer looks like this now:
At the very bottom of the frame is an echo area.
The main area of the frame, above the echo area,
is called the window.

Emacs editing commands operate in terms of
characters, words, lines, sentences, paragraphs,
pages, expressions, comments, etc.
^L
Emacs is an advanced, extensible, customizable,
self-documenting editor.

If you are editing several files in Emacs, each in
its own buffer, each buffer has its own value of
point.
^L
The cursor in the selected window shows the
location where most editing commands take effect,
which is called point.

We will use the term frame to mean a graphical
window or terminal screen occupied by Emacs.
^L
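The two steps above can be modelled with nested lists in Python: a buffer is a list of pages and a page is a list of paragraphs. This sketch rests on stated assumptions and is not the Emacs implementation. Only the first line of each paragraph is kept to keep the example short, pages are compared by their full text, and sorting paragraphs is assumed to leave the page breaks where they are so that each page keeps its number of paragraphs.

```python
# Three pages with two paragraphs each (first lines only).
pages = [
    ["Emacs is an advanced, extensible, customizable,",
     "Emacs editing commands operate in terms of"],
    ["We will use the term frame to mean a graphical",
     "At the very bottom of the frame is an echo area."],
    ["The cursor in the selected window shows the",
     "If you are editing several files in Emacs, each in"],
]

def page_text(page):
    """Text of a page, used as its sort key."""
    return "\n\n".join(page)

# Like M-x sort-pages: whole pages move, paragraphs stay inside them.
sorted_pages = sorted(pages, key=page_text)

# Like M-x sort-paragraphs: paragraphs move across page boundaries.
# Assumption: the page breaks themselves do not move.
sizes = [len(page) for page in sorted_pages]
paragraphs = iter(sorted(par for page in sorted_pages for par in page))
sorted_paragraphs = [[next(paragraphs) for _ in range(n)] for n in sizes]

for page in sorted_paragraphs:
    print("\n\n".join(page))
    print("^L")
```

Sorting pages only reorders the three outer lists, whereas sorting paragraphs ignores the page structure when comparing and may therefore move a paragraph to a different page.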
References
To read and learn more about the sorting commands described above,
refer to the following resources:
* Emacs Manual: Sorting Text
* Elisp Manual: Sorting Text
Within Emacs, type the following commands to read these manuals:
* M-: (info "(emacs) Sorting") RET
* M-: (info "(elisp) Sorting") RET
Further, the documentation strings for these commands have useful
information too. Use the key sequence C-h f to look up the
documentation strings. For example, type C-h f sort-regexp-fields RET
to look up the documentation string for the sort-regexp-fields
command.
(c) 2001-2023 Susam Pal