https://catonmat.net/unix-utilities-pipe-viewer
Pipe Viewer - A Unix Utility You Should Know About
Last updated 6 weeks ago
Hi all! I'm starting a new article series here. This
one is going to be about Unix utilities that you should know about.
The articles will discuss one Unix program at a time. I'll try to
write a good introduction to the tool and give as many examples as I
can think of.
The first post in this series is going to be about a not so well
known but super powerful Unix program called Pipe Viewer or pv for
short. Pipe viewer is a terminal-based tool for monitoring the
progress of data through a pipeline. It can be inserted into any
normal pipeline between two processes to give a visual indication of
how quickly the data is passing through, how long it has taken, how
near to completion it is, and an estimate of how long it will be
until completion.
Pipe viewer is written by Andrew Wood, an experienced Unix sysadmin.
The homepage of the pv utility is here: pv utility.
If you feel like you are interested in this stuff, I suggest that you
subscribe to my rss feed to receive my future posts automatically.
How to use pv?
Let's start with some really easy examples and progress to more
complicated ones.
Suppose that you have a file access.log that is tens of gigabytes in
size and contains web logs. You want to compress it into a smaller
file, let's say a gzip archive (.gz). The obvious way to do it is:
$ gzip -c access.log > access.log.gz
As the file is so huge (tens of gigabytes), you have no idea how long
to wait. Will it finish soon? Or will it take another 30 mins?
By using pv you can watch the progress and get an estimate of how
long it will take:
$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=> ] 15% ETA 0:00:59
Pipe viewer acts as cat here, except it also adds a progress bar. We
can see that pv has fed 611MB of data to gzip in 11 seconds. That is
15% of all data, and pv estimates it will take 59 more seconds to
finish. So no coffee break.
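If you use this pattern a lot, it can be wrapped in a small shell function. The sketch below is just one way to do it: the name gz_progress is made up for this example (it is not part of pv), and it assumes a POSIX shell. It falls back to plain cat when pv isn't installed.

```shell
# gz_progress FILE: compress FILE to FILE.gz, with a progress bar if pv exists.
# The function name is hypothetical; pv, cat and gzip are the real commands.
gz_progress() {
    src=$1
    if command -v pv >/dev/null 2>&1; then
        # pv reads the file itself, so it knows the total size.
        pv "$src" | gzip > "$src.gz"
    else
        # No pv installed: same result, just without the progress bar.
        cat "$src" | gzip > "$src.gz"
    fi
}
```

You would then call it as:
$ gz_progress access.log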
You can stick several pv processes in your pipeline. For example, you
can time how fast the data is being read from the disk with one pv
and how much data has been gzipped via a second pv:
$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source: 760MB 0:00:15 [37.4MB/s] [=> ] 19% ETA 0:01:02
gzip: 34.5MB 0:00:15 [1.74MB/s] [ <=> ]
Here we have specified the -N parameter to pv to create a named
stream. The -c parameter makes sure the output is not garbled by one
pv process writing over the other.
This example shows that the access.log file is being read at the
speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can
immediately estimate the compression ratio. It's 37.4/1.74, or
roughly 21x!
Notice how the gzip line doesn't show how much data is left or when
it will finish. It's because the pv process after gzip has no idea
how much data gzip will produce (it's just outputting compressed data
from input stream). The first pv process, however, knows how much
data is left, because it's reading it from a known file.
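The two rates are only a snapshot of one moment. Once the pipeline has finished, the overall ratio can be computed from the file sizes instead. Here is a minimal sketch, assuming awk is available; the helper name ratio is made up for illustration.

```shell
# ratio A B: print A divided by B with one decimal place.
# "ratio" is a hypothetical helper name, not a standard command.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# The two rates from the example above:
ratio 37.4 1.74

# After compression has finished, the same helper works on byte counts:
# ratio "$(wc -c < access.log)" "$(wc -c < access.log.gz)"
```

For the rates above this prints 21.5, which matches the rough estimate.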
Another similar example is to pack the whole directory of files into
a compressed tarball:
$ tar -czf - . | pv > out.tgz
117MB 0:00:55 [2.7MB/s] [> ]
In this example, pv only shows the output rate of the tar -czf
command. It has no information about how big the directory is or how
long the tar process will run or how much data is left. We need to
provide the total size of data we are tarring to pv. It can be done
this way:
$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
253MB 0:00:05 [46.7MB/s] [> ] 1% ETA 0:04:49
What happens here is we tell tar to recursively (default mode) create
(-c argument) an archive of all files in current dir (. argument) and
output the data to stdout (-f - argument). Next, we give pv the total
size (-s argument) of all files in current dir and all its
subdirectories. The du -sb . | awk '{print $1}' command returns the
number of bytes in current dir and it's fed as the -s parameter to
pv. Next, we gzip the content and output the result to out.tgz file.
This way pv knows how much data is still left to be processed and
shows us that it will take another 4 mins 49 secs to finish. So you
can take a quick coffee break.
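The same steps can be put into a reusable function. This is only a sketch under a few assumptions: the name tgz_progress is invented for this example, du -sb is a GNU du option (other systems may not have -b), and the function quietly drops the progress bar when pv or the size is unavailable.

```shell
# tgz_progress DIR OUT: pack DIR into the gzip-compressed tarball OUT.
# The function name is hypothetical; tar, du, awk, pv and gzip are real.
tgz_progress() {
    dir=$1
    out=$2
    # GNU du: -s summarizes the directory, -b reports apparent size in bytes.
    size=$(du -sb "$dir" 2>/dev/null | awk '{print $1}')
    if command -v pv >/dev/null 2>&1 && [ -n "$size" ]; then
        # pv sits between tar and gzip, so it counts uncompressed bytes.
        tar -cf - -C "$dir" . | pv -s "$size" | gzip > "$out"
    else
        # No pv or no usable size: same archive, no progress bar.
        tar -cf - -C "$dir" . | gzip > "$out"
    fi
}
```

Keep in mind the number from du is only an estimate of what tar will write, because tar adds its own headers and padding, so the bar may not land exactly on 100%. Usage:
$ tgz_progress /path/to/dir /tmp/out.tgz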
Another interesting example is copying large amounts of data over the
network via the nc (netcat) utility that I will write about some
other time.
(Update: Just wrote about it: Netcat - A Unix Utility You Should Know
About.)
Suppose you have two computers A and B. You want to transfer a
directory from A to B very quickly. A fast way to do it is to use
tar and nc, and time the operation with pv.
On computer A with IP address 192.168.1.100 run this command:
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
On computer B run this command:
$ nc 192.168.1.100 6666 | pv | tar -xf -
That's it! All the files in /path/to/dir on computer A will get
transferred to computer B, and you'll be able to see how fast the
operation is going.
This will show how fast the data is being transferred but it won't
show how much data is left. If you want this information, then you
have to do the pv -s $(...) trick from the previous example and add
it to pv on computer A.
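Put together, the sending side on computer A could look something like the sketch below. The function name send_dir is made up for this example, du -sb assumes GNU du, and the nc flags are simply the ones used above (other netcat variants may want different options).

```shell
# send_dir DIR PORT: serve DIR as a tar stream on PORT with a full progress bar.
# send_dir is a hypothetical name; the pipeline is the one from the text
# with the -s trick added to pv.
send_dir() {
    dir=$1
    port=$2
    # Total size in bytes, so pv can show a percentage and an ETA.
    size=$(du -sb "$dir" | awk '{print $1}')
    tar -cf - "$dir" | pv -s "$size" | nc -l -p "$port" -q 5
}
```

On computer A you would run:
$ send_dir /path/to/dir 6666
The command on computer B stays the same.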
Here's another fun example. It shows how fast the computer reads from
/dev/zero:
$ pv /dev/zero > /dev/null
157GB 0:00:38 [4.17GB/s]
That's it. I hope you enjoyed this post and learned something new. I
love explaining things and teaching!
How to install pv?
If you're on Debian or a Debian-based system such as Ubuntu, do the
following:
$ sudo aptitude install pv
If you're on Fedora or a related system such as CentOS, do:
$ sudo yum install pv
If you're on Mint, do:
$ sudo apt-get install pv
If you're on Slackware, go to the pv homepage, download the
pv-version.tar.gz archive and do:
$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && sudo make install
If you're a Mac user:
$ sudo port install pv
If you're an OpenSolaris user:
$ pfexec pkg install pv
If you're a Windows user on Cygwin, unpack the source archive as
above and do:
$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install
The manual of the utility can be found here: man pv.
Have fun measuring your pipes with pv and until next time!