
What's New In Python 3.1
************************

Author:
   Raymond Hettinger

Release:
   3.1a2

Date:
   April 04, 2009

This article explains the new features in Python 3.1, compared to 3.0.


PEP 372: Ordered Dictionaries
=============================

Regular Python dictionaries iterate over key/value pairs in arbitrary
order. Over the years, a number of authors have written alternative
implementations that remember the order in which the keys were
originally inserted.  Based on the experiences from those
implementations, the ``collections`` module now has an ``OrderedDict``
class.

The OrderedDict API is substantially the same as that of regular
dictionaries, but iteration over keys and values follows a guaranteed
order determined by when each key was first inserted.  If a new entry
overwrites an existing entry, the original insertion position is left
unchanged.  Deleting an entry and reinserting it will move it to the
end.
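
The ordering rules described above can be sketched as follows.  This is
a minimal illustration; the keys and values are made up for the
example:

```python
from collections import OrderedDict

d = OrderedDict()
d['first'] = 1
d['second'] = 2
d['third'] = 3

# Overwriting an existing key leaves its original position unchanged.
d['first'] = 100
order_after_overwrite = list(d.keys())
print(order_after_overwrite)    # ['first', 'second', 'third']

# Deleting a key and reinserting it moves it to the end.
del d['first']
d['first'] = 1
order_after_reinsert = list(d.keys())
print(order_after_reinsert)     # ['second', 'third', 'first']
```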

The standard library now supports use of ordered dictionaries in
several modules.  The ``configparser`` module uses them by default.
This lets configuration files be read, modified, and then written back
in their original order.  The ``collections`` module's
``namedtuple._asdict()`` method now returns an ordered dictionary with
the values appearing in the same order as the underlying tuple
indices.  The ``json`` module is being built out with an
*object_pairs_hook* to allow OrderedDicts to be built by the decoder.
Support was also added for third-party tools like PyYAML.
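
As a rough sketch of two of these integrations, the example below
decodes a JSON object into an ``OrderedDict`` and lists the fields of a
named tuple in positional order.  It assumes the *object_pairs_hook*
parameter is available in the release you are running; the JSON text
and the ``Point`` type are invented for the illustration:

```python
import json
from collections import OrderedDict, namedtuple

# Decode a JSON object while preserving the order of keys in the text.
text = '{"zebra": 1, "apple": 2, "mango": 3}'
settings = json.loads(text, object_pairs_hook=OrderedDict)
decoded_keys = list(settings)
print(decoded_keys)             # ['zebra', 'apple', 'mango']

# _asdict() yields the fields in the same order as the tuple indices.
Point = namedtuple('Point', 'x y')
point_items = list(Point(3, 4)._asdict().items())
print(point_items)              # [('x', 3), ('y', 4)]
```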

See also:

   **PEP 372** - Ordered Dictionaries
      PEP written by Armin Ronacher and Raymond Hettinger.
      Implementation written by Raymond Hettinger.


PEP 378: Format Specifier for Thousands Separator
=================================================

The built-in ``format()`` function and the ``str.format()`` method use
a mini-language that now includes a simple, non-locale-aware way to
format a number with a thousands separator.  This provides a way to
humanize a program's output, improving its professional appearance and
readability:

   >>> from decimal import Decimal
   >>> format(Decimal('1234567.89'), ',f')
   '1,234,567.89'

The currently supported types are ``int`` and ``decimal.Decimal``.
Support for ``float`` is expected before the beta release. Discussions
are underway about how to specify alternative separators like dots,
spaces, apostrophes, or underscores.  Locale-aware applications should
use the existing *n* format specifier which already has some support
for thousands separators.
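
A few more uses of the comma specifier are sketched below, limited to
the ``int`` and ``decimal.Decimal`` types named above.  The variable
names are illustrative only, and the output of the *n* specifier is
left unstated because it depends on the active locale:

```python
from decimal import Decimal

population = 1234567

# The comma combines with the usual type, width, and alignment fields.
plain = format(population, ',d')
padded = format(population, '>15,d')
via_method = '{:,}'.format(population)
rounded = format(Decimal('1234567.89'), ',.1f')

print(plain)        # 1,234,567
print(repr(padded)) # '      1,234,567'
print(via_method)   # 1,234,567
print(rounded)      # 1,234,567.9

# Locale-aware alternative; the result varies with the locale settings.
print(format(population, 'n'))
```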

See also:

   **PEP 378** - Format Specifier for Thousands Separator
      PEP written by Raymond Hettinger; implemented by Eric Smith and
      Mark Dickinson.


Other Language Changes
======================

Some smaller changes made to the core Python language are:

* The ``int()`` type gained a ``bit_length`` method that returns the
  number of bits necessary to represent its argument in binary:

     >>> n = 37
     >>> bin(n)
     '0b100101'
     >>> n.bit_length()
     6
     >>> n = 2**123-1
     >>> n.bit_length()
     123
     >>> (n+1).bit_length()
     124

  (Contributed by Fredrik Johansson, Victor Stinner, Raymond
  Hettinger, and Mark Dickinson; issue 3439.)

* Added a ``collections.Counter`` class to support convenient counting
  of unique items in a sequence or iterable:

     >>> from collections import Counter
     >>> Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
     Counter({'blue': 3, 'red': 2, 'green': 1})

  (Contributed by Raymond Hettinger; issue 1696199.)

* Added a new module, ``tkinter.ttk``, for access to the Tk themed
  widget set.  The basic idea of ttk is to separate, to the extent
  possible, the code implementing a widget's behavior from the code
  implementing its appearance.

  (Contributed by Kevin Walzer and Guilherme Polo; issue 2618 and
  issue 2983.)

* The ``gzip.GzipFile`` and ``bz2.BZ2File`` classes now support the
  context manager protocol.

  (Contributed by Jacques Frechet; issue 4272.)

* The ``decimal`` module now supports methods for creating a decimal
  object from a binary ``float``.  The conversion is exact but can
  sometimes be surprising:

     >>> from decimal import Decimal
     >>> Decimal.from_float(1.1)
     Decimal('1.100000000000000088817841970012523233890533447265625')

  The long decimal result shows the actual binary fraction being
  stored for *1.1*.  The fraction has many digits because *1.1* cannot
  be exactly represented in binary.

  (Contributed by Raymond Hettinger and Mark Dickinson.)

* The fields in ``str.format()`` strings can now be automatically
  numbered:

     >>> 'Sir {} of {}'.format('Gallahad', 'Camelot')
     'Sir Gallahad of Camelot'

  Formerly, the string would have required numbered fields such as:
  ``'Sir {0} of {1}'``.

  (Contributed by Eric Smith; issue 5237.)

* The ``itertools`` module grew two new functions.  The
  ``itertools.combinations_with_replacement()`` function is one of
  four functions for generating combinatorics, including permutations
  and Cartesian products.  The ``itertools.compress()`` function mimics
  its namesake from APL.  Also, the existing ``itertools.count()``
  function now has an optional *step* argument and can accept any type
  of counting sequence including ``fractions.Fraction`` and
  ``decimal.Decimal``.

  (Contributed by Raymond Hettinger.)

* ``collections.namedtuple()`` now supports a keyword argument
  *rename* which lets invalid field names be automatically converted
  to positional names in the form _0, _1, etc.  This is useful when
  the field names are being created by an external source such as a
  CSV header, SQL field list, or user input.

  (Contributed by Raymond Hettinger; issue 1818.)

* ``round(x, n)`` now returns an integer if *x* is an integer.
  Previously it returned a float.

  (Contributed by Mark Dickinson; issue 4707.)

* The ``re.sub()``, ``re.subn()`` and ``re.split()`` functions now
  accept a flags parameter.

  (Contributed by Gregory Smith.)

* The ``runpy`` module which supports the ``-m`` command line switch
  now supports the execution of packages by looking for and executing
  a ``__main__`` submodule when a package name is supplied.

  (Contributed by Andi Vajda; issue 4195.)

* The ``pdb`` module can now access and display source code loaded via
  ``zipimport`` (or any other conformant **PEP 302** loader).

  (Contributed by Alexander Belopolsky; issue 4201.)

* ``functools.partial`` objects can now be pickled.

  (Suggested by Antoine Pitrou and Jesse Noller.  Implemented by Jack
  Diedrich; issue 5228.)

* Added ``pydoc`` help topics for symbols so that ``help('@')`` works
  as expected in the interactive environment.

  (Contributed by David Laban; issue 4739.)

* The ``unittest`` module now supports skipping individual tests or
  classes of tests.  It also supports marking a test as an expected
  failure: a test that is known to be broken but shouldn't be counted
  as a failure on a TestResult.

  (Contributed by Benjamin Peterson.)

* A new module, ``importlib``, was added.  It provides a complete,
  portable, pure Python reference implementation of the *import*
  statement and its counterpart, the ``__import__()`` function.  It
  represents a substantial step forward in documenting and defining
  the actions that take place during imports.

  (Contributed by Brett Cannon.)
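
Several of the smaller changes listed above that have no example of
their own can be tried together.  The following is a sketch rather than
an exhaustive demonstration; the sample data, the ``Row`` type, and the
variable names are invented for the illustration:

```python
import itertools
import pickle
import re
from collections import namedtuple
from fractions import Fraction
from functools import partial

# combinations_with_replacement(): selections in which items may repeat.
pairs = list(itertools.combinations_with_replacement('ABC', 2))
print(pairs)

# compress(): keep the data items whose matching selector is true.
kept = list(itertools.compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
print(kept)                     # ['A', 'C', 'E', 'F']

# count() with a step, counting in exact fractions.
counter = itertools.count(Fraction(0), Fraction(1, 3))
thirds = list(itertools.islice(counter, 4))
print(thirds)

# namedtuple(rename=True): a keyword, a duplicate, and a name starting
# with a digit are replaced by positional names.
Row = namedtuple('Row', ['id', 'class', 'id', '2nd'], rename=True)
print(Row._fields)              # ('id', '_1', '_2', '_3')

# round() on an integer now stays an integer.
rounded = round(1234, -2)
print(rounded, type(rounded))   # 1200 <class 'int'>

# re.sub() with the flags parameter.
replaced = re.sub('python', 'Python', 'PYTHON and python',
                  flags=re.IGNORECASE)
print(replaced)                 # Python and Python

# functools.partial objects survive a round trip through pickle.
parse_binary = partial(int, base=2)
restored = pickle.loads(pickle.dumps(parse_binary))
print(restored('101'))          # 5
```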


Optimizations
-------------

Major performance enhancements have been added:

* The new I/O library (as defined in **PEP 3116**) was mostly written
  in Python and quickly proved to be a problematic bottleneck in
  Python 3.0. In Python 3.1, the I/O library has been entirely
  rewritten in C and is 2 to 20 times faster depending on the task at
  hand. The pure Python version is still available for experimentation
  purposes through the ``_pyio`` module.

  (Contributed by Amaury Forgeot d'Arc and Antoine Pitrou.)

* Added a heuristic so that tuples and dicts containing only
  untrackable objects are not tracked by the garbage collector. This
  can reduce the size of collections and therefore the garbage
  collection overhead on long-running programs, depending on their
  particular use of datatypes.

  (Contributed by Antoine Pitrou; issue 4688.)

* Enabling the ``--with-computed-gotos`` configure option on compilers
  that support it (notably gcc, SunPro, and icc) compiles the bytecode
  evaluation loop with a new dispatch mechanism that gives speedups of
  up to 20%, depending on the system, the compiler, and the benchmark.

  (Contributed by Antoine Pitrou along with a number of other
  participants; issue 4753.)

* The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times
  faster.

  (Contributed by Antoine Pitrou and Amaury Forgeot d'Arc; issue
  4868.)

* The ``json`` module is getting a C extension to substantially
  improve its performance.  The code is expected to be added in time
  for the beta release.

  (Contributed by Bob Ippolito.)

* Integers are now stored internally either in base 2**15 or in base
  2**30, the base being determined at build time.  Previously, they
  were always stored in base 2**15.  Using base 2**30 gives
  significant performance improvements on 64-bit machines, but
  benchmark results on 32-bit machines have been mixed.  Therefore,
  the default is to use base 2**30 on 64-bit machines and base 2**15
  on 32-bit machines; on Unix, there's a new configure option
  ``--enable-big-digits`` that can be used to override this default.

  Apart from the performance improvements this change should be
  invisible to end users, with one exception: for testing and
  debugging purposes there's a new ``structseq`` ``sys.int_info`` that
  provides information about the internal format, giving the number of
  bits per digit and the size in bytes of the C type used to store
  each digit:

     >>> import sys
     >>> sys.int_info
     sys.int_info(bits_per_digit=30, sizeof_digit=4)

  (Contributed by Mark Dickinson; issue 4258.)
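
To relate ``sys.int_info`` to the storage scheme described above, the
sketch below estimates how many internal digits an integer needs.  The
``estimated_digits`` helper is hypothetical and written only for this
illustration; it uses nothing beyond ``bits_per_digit`` and
``int.bit_length()`` and does not inspect the interpreter's actual
memory layout:

```python
import sys

def estimated_digits(n):
    """Estimate how many base 2**bits_per_digit digits abs(n) needs.

    Hypothetical helper for illustration only.
    """
    bits = sys.int_info.bits_per_digit
    # Ceiling division of the bit length by the bits held per digit.
    return -(-abs(n).bit_length() // bits)

base = 2 ** sys.int_info.bits_per_digit
print(estimated_digits(base - 1))   # 1: largest value fitting in one digit
print(estimated_digits(base))       # 2: one more needs a second digit
print(estimated_digits(2**123 - 1)) # depends on the build's digit size
```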
