[HN Gopher] Namedtuple in a Post-Dataclasses World
___________________________________________________________________
Namedtuple in a Post-Dataclasses World
Author : genericlemon24
Score : 104 points
Date : 2021-07-21 14:28 UTC (8 hours ago)
(HTM) web link (death.andgravity.com)
(TXT) w3m dump (death.andgravity.com)
| msravi wrote:
| The syntax is a lot more intuitive in Julia:
|
| julia> point = (x=1, y=2)
|
| (x = 1, y = 2)
|
| julia> point.x
|
| 1
|
| julia> point.y
|
| 2
|
| julia> dump(point)
|
| NamedTuple{(:x, :y), Tuple{Int64, Int64}}
|   x: Int64 1
|   y: Int64 2
| rdtsc wrote:
| I guess one difference is that when you inspect it, it doesn't
| indicate that it is a `point`, just that it's named tuple of
| two variables, so it's not exactly equivalent.
| bobbylarrybobby wrote:
| And to create one as an anonymous type, you can use the
| `@NamedTuple` macro:
|
|     julia> @NamedTuple{x::Int, y::String}
|     NamedTuple{(:x, :y), Tuple{Int64, String}}
| mdcfrancis wrote:
| They are also type-stable, strongly typed, and have the same
| overhead as a struct. So an array of NamedTuples takes the
| minimal space and allocation.
| bionhoward wrote:
| Along this line of reasoning I learned to love Pydantic which can
| make it a breeze to parse and coerce environment variables to the
| correct types:
| https://pydantic-docs.helpmanual.io/usage/settings/
|
| You can make an env.py and override module __getattr__ and then
| import environment variables just like they're regular Python
| objects (even booleans, floats, collections, etc., despite .env
| files being string-only).
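|
| A minimal sketch of the env.py idea, using only the standard
| library rather than Pydantic. The schema, the variable names and
| the in-memory stand-in for an env.py file are all assumptions
| made for illustration; in a real project `_lookup` would be the
| module-level `__getattr__` of env.py itself.

```python
import os
import types

# Hypothetical schema: variable name -> converter from the raw string.
_SCHEMA = {
    "DEBUG": lambda s: s.strip().lower() in {"1", "true", "yes", "on"},
    "LEARNING_RATE": float,
    "LAYERS": lambda s: [int(part) for part in s.split(",")],
}


def _lookup(name):
    """Module-level __getattr__: called only when normal lookup fails."""
    try:
        convert = _SCHEMA[name]
    except KeyError:
        raise AttributeError(f"unknown setting {name!r}") from None
    return convert(os.environ[name])


# Stand-in for a real env.py file, so the sketch fits in one script.
env = types.ModuleType("env")
env.__getattr__ = _lookup

# Environment variables are always strings...
os.environ.update({"DEBUG": "true", "LEARNING_RATE": "0.01", "LAYERS": "64,32"})

# ...but they come back as typed Python objects.
print(env.DEBUG, env.LEARNING_RATE, env.LAYERS)
```

| Pydantic's settings classes do the coercion and validation for
| you; the sketch only shows the attribute-lookup hook.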
|
| Huge force multiplier for ML cuz then you can do hyperparameter
| optimization just by passing different environment variables in
| an outer loop (even inside your infrastructure as code)
|
| Edit: you can make these classes immutable too:
| https://pydantic-docs.helpmanual.io/usage/models/#faux-immut...
| ElevenPhonons wrote:
| You can also create CLI tools that can load partial or full
| "presets" defined in JSON.
|
| https://github.com/mpkocher/pydantic-cli
| jkrubin wrote:
| Pydantic is an incredible lib. I use it for so many different
| things on top of misc parsing.
| actually_a_dog wrote:
| Speaking of namedtuple, I would encourage anybody who uses Python
| and wants to learn a thing or two to read the source code for
| them. At least one of the things you learn should probably fall
| in the "what not to do" category. There's a lot going on in there
| to support all that magic you see from the outside, and it's a
| little scary in there.
| nikeee wrote:
| There is also a NamedTuple (notice the different casing) from the
| typing library, which doesn't seem to be mentioned in the
| article:
|
| https://docs.python.org/3/library/typing.html#typing.NamedTu...
|
|     class Employee(NamedTuple):
|         name: str
|         id: int
|
| This is equivalent to:
|
|     Employee = collections.namedtuple('Employee', ['name', 'id'])
| Nagyman wrote:
| It is mentioned if you expand the "In case you've never used
| them, here's a comparison." element.
| colejohnson66 wrote:
| I don't use Python much, but what's the difference between a
| NamedTuple and a regular class?
| Spivak wrote:
| I feel like none of the sibling answers actually answer your
| question, which is "absolutely nothing." The function
| namedtuple is a code generator that constructs a class
| definition and then eval()'s it.
|
| The reason you reach for it is because it's tedious to write
| the same methods over and over to get things like a nice
| repr, methods to convert to and from dicts, or pickling support.
|
| The source from Python 3.6 is _much_ more readable than 3.9
| so I recommend reading that if you want to see how it works.
|
| https://github.com/python/cpython/blob/3.6/Lib/collections/_...
| masklinn wrote:
| Well in and of itself none, in the sense that anything a
| namedtuple can do you could do by hand (it really just
| defines a class). However namedtuple:
|
| * extends tuples, so a namedtuple is literally a tuple (which
| is useful)
|
| * sets up a bunch of properties for the "named fields", which
| are basically just names on the tuple elements
|
| * sets up a few other utility methods e.g. nice formatting,
| `_make`, `_asdict`, `_replace`
|
| Now the latter two are nice, and mostly replicated by
| dataclasses (or attrs). The first one is the _raison d'etre_
| of namedtuples though: originally their purpose was to
| "upgrade" tuple return values into richer / clearer types,
| e.g. urlparse originally returned a 6-tuple, which is not
| necessarily super wieldy / clear. You can probably infer that
| the 3rd element is the path but... after upgrading to
| namedtuple it's just `result.path`, which is usually much
| clearer.
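|
| The urlparse case can be tried directly with the standard
| library. The URL below is just an example value; the point is
| that the result is still a plain 6-tuple for old callers while
| also exposing named fields and the namedtuple helper methods.

```python
from urllib.parse import urlparse

result = urlparse("https://example.com/docs/index.html?lang=en#top")

# Old-style positional use still works, because the result is a tuple...
scheme, netloc, path, params, query, fragment = result

# ...but the field name says what the third element actually is.
same_path = result[2] == result.path == path

# The helper methods mentioned above are available too.
as_dict = result._asdict()
http_version = result._replace(scheme="http")

print(result.path, same_path, http_version.geturl())
```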
|
| And because namedtuples are still classes in and of
| themselves, you can inherit from them to create a class with
| a `__dict__` with relative ease.
| agumonkey wrote:
| I think it was a pre-organized immutable data class in one
| line.
| akoumis wrote:
| NamedTuple is purely a data container. It does not have class
| functions or a constructor you can use for anything other
| than setting the data members.
| ZuLuuuuuu wrote:
| NamedTuple has the features of a tuple, for example it is
| immutable. A regular class is mutable.
| palotasb wrote:
| NamedTuple or namedtuple instances are tuple instances that
| have the same properties that regular tuples have. They are
| immutable (you cannot reassign their fields), you can index
| into them (a[0], a[1] instead of a.x and a.y), you can unpack
| them with *a. They can have methods like regular classes can,
| including additional @property methods. A NamedTuple class
| cannot inherit from another class, not even other named
| tuples.
| dec0dedab0de wrote:
| a named tuple works exactly like a tuple, except you can also
| use names to get the items in it.
|
| so it is immutable and you can get it via slicing.
|
|     >>> from collections import namedtuple
|     >>> Hat = namedtuple('Hat', ['style', 'size', 'color'])
|     >>> my_hat = Hat('safari', 'XL', 'Orange')
|     >>> my_hat
|     Hat(style='safari', size='XL', color='Orange')
|     >>> my_hat[0]
|     'safari'
|     >>> my_hat.color
|     'Orange'
|     >>> my_hat[1:]
|     ('XL', 'Orange')
|     >>> style, size, color = my_hat
|     >>> size
|     'XL'
|
| If that's all you're using your classes for, then a named tuple
| is probably a better solution, or a dataclass. Though I
| normally just use dicts in that situation. If I see someone
| create a class without any methods, or at least planned
| methods, I don't let it through code review.
|
| EDIT: Also, Raymond Hettinger created named tuples. I'm not
| normally one for call to authority, or hero worship, but I am
| a huge fan of his. I recommend that anyone interested in
| Python should watch as many of his talks as they can.
|
| EDIT2: As masklinn pointed out, another really good use of
| named tuples is when you're already returning a tuple, and
| you realize it would be better if it had names. You could
| change it to a named tuple without breaking any of the
| existing code. Unless they're doing something dumb like
| halfassing type checking at runtime. (This use case is in the
| article, which I didn't read at first.)
| ziml77 wrote:
| I like that method of defining a named tuple much more. I
| understand why it's required, but having to duplicate the name
| of the named tuple always bothered me. The type hints are
| excellent to have too.
| Spivak wrote:
| I don't know what you're talking about, I always confuse my
| coworkers by doing
|
|     Pointe = namedtuple('Point'...
| wizzwizz4 wrote:
| This breaks pickle, among other things.
| Spivak wrote:
| It really shouldn't. This is actually an example of
| pickle being broken.
|
| The class exists, there is an existing valid reference to
| it, and Python itself knows about it and the class itself
| knows its own name and namedtuple generates the right
| __getnewargs__. Everything is there for this to "just
| work" but pickle expects that every class object will
| have a reference to it of the same name which is kinda
| weird when you think about it.
|
| You can see it with this stupid little program.
|
|     from collections import namedtuple
|     import pickle
|
|     def get_all_subclasses(cls):
|         all_subclasses = []
|         for subclass in cls.__subclasses__():
|             if subclass != type:
|                 all_subclasses.append(subclass)
|                 all_subclasses.extend(get_all_subclasses(subclass))
|         return all_subclasses
|
|     E = namedtuple('EEEEEEEEEEEEEEEEEEE', 'x')
|     e = E(x='hello')
|
|     for cls in get_all_subclasses(object):
|         print(cls)
|
|     pickle.dumps(e)
|
| You'll see that the class is there and called the right
| thing! But pickle tries to look up a reference to it
| under __main__.
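|
| A smaller reproduction of the same failure, as a sketch. The
| names `Pointe` and `Point` mirror the joke above; the only
| assumption is that no global called `Point` exists in the module
| where this runs.

```python
import pickle
from collections import namedtuple

# The variable name (Pointe) deliberately differs from the class name ('Point').
Pointe = namedtuple("Point", "x y")

try:
    pickle.dumps(Pointe(1, 2))
    outcome = "pickled"
except pickle.PicklingError as exc:
    # pickle stores classes by reference (module name + class name) and
    # verifies that the name can be looked up again before writing it.
    outcome = f"PicklingError: {exc}"

print(outcome)
```

| Binding the class to a module-level name that matches its
| `__name__` is what makes the lookup succeed.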
| mikepurvis wrote:
| That's awesome and I was not previously aware of it. Literally
| a decade ago I was asking on SO about exactly this kind of
| use case [1] and at the time the answers were pretty unsatisfying;
| it's great to hear that the story is better now.
|
| [1]: https://stackoverflow.com/questions/4071765/in-python-how-do...
| the__alchemist wrote:
| Dataclasses and Enums have, since their introduction, taken over
| as my foundation of Python data structures. They've obsoleted
| NamedTuple, namedtuple, and traditional classes in my code.
|
| They're a clean way of defining how different functions, methods,
| etc. should interact with each other.
| synergy20 wrote:
| What do you mean by 'traditional classes'? Aren't dataclasses more
| for data-only data structures with no methods?
| the__alchemist wrote:
| As in, classes that aren't tagged as `dataclass` or `Enum`.
| You can have methods with dataclasses.
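|
| A short sketch of that combination: an Enum for the closed set of
| choices and a dataclass that carries both fields and a method.
| `Unit` and `Distance` are made-up names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    METERS = "m"
    FEET = "ft"


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Distance:
    value: float
    unit: Unit

    def in_meters(self) -> float:
        """Dataclasses can carry ordinary methods alongside their fields."""
        if self.unit is Unit.METERS:
            return self.value
        return self.value * 0.3048  # 1 ft = 0.3048 m


d = Distance(10.0, Unit.FEET)
print(d, d.in_meters())
```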
| continuational wrote:
| Such a confusing title without "Python" in there, especially when
| it's specifically about some Python feature.
| PaulHoule wrote:
| In practice the hashtable implementation in Python is so fast,
| particularly for cases like
|
|     {"x": 51.2, "y": -74.1}
|
| that you don't gain a lot from namedtuples most of the time. I
| quit benching namedtuples for applications like that a long time
| ago.
| masklinn wrote:
| The main problem of dicts is that they're really heavy
| memory-wise, even with key-shared dicts and stuff.
|
| Your dict is 232 bytes, the equivalent tuple is 56.
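|
| The comparison can be reproduced with `sys.getsizeof`. This is a
| rough sketch: the exact byte counts depend on the CPython version
| and platform, and `getsizeof` measures only the container, not
| the floats or strings it refers to.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", "x y")

as_dict = {"x": 51.2, "y": -74.1}
as_tuple = (51.2, -74.1)
as_point = Point(51.2, -74.1)

# Shallow sizes of the three containers holding the same two values.
sizes = {
    "dict": sys.getsizeof(as_dict),
    "tuple": sys.getsizeof(as_tuple),
    "namedtuple": sys.getsizeof(as_point),
}

for kind, size in sizes.items():
    print(f"{kind:>10}: {size} bytes")
```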
| brandmeyer wrote:
| How much does automatic interning of string literals help out
| here?
| wizzwizz4 wrote:
| It doesn't. That's the size _assuming_ interning of string
| literals.
| brandmeyer wrote:
| Glurk! Thanks.
| heavenlyblue wrote:
| You could define __slots__ on the namedtuple that would not use
| a hashtable for lookups.
| johtso wrote:
| Immutability is nice though.
| wyldfire wrote:
| Also it's nice for bad references to trigger AttributeError
| which is almost always a design error, whereas KeyError is
| not always so.
|
| I haven't tried it but type hints are probably smart enough
| to find errors like this statically with namedtuple, but
| probably not so with a dict.
| gjulianm wrote:
| I think you need to use typing.NamedTuple to get typing
| support. On the other hand, you can use TypedDict [1] to
| get type hints on dictionaries.
|
| 1: https://www.python.org/dev/peps/pep-0589/
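|
| A sketch of the two typed variants side by side. `PointNT`,
| `PointTD` and the helper are illustrative names. A static checker
| such as mypy should be able to flag both bad accesses below; at
| runtime they fail differently, which is the AttributeError versus
| KeyError point made above.

```python
from typing import NamedTuple, TypedDict


class PointNT(NamedTuple):
    x: float
    y: float


class PointTD(TypedDict):
    x: float
    y: float


nt = PointNT(x=51.2, y=-74.1)
td: PointTD = {"x": 51.2, "y": -74.1}


def missing_field_error(access):
    """Return the name of the exception raised by a bad field access."""
    try:
        access()
    except (AttributeError, KeyError) as exc:
        return type(exc).__name__
    return None


nt_error = missing_field_error(lambda: nt.z)      # no such attribute
td_error = missing_field_error(lambda: td["z"])   # no such key

print(nt_error, td_error)
```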
| [deleted]
| donio wrote:
| That's only because Python is so slow in general and everything
| else is implemented in terms of hashtable lookups too.
| toxik wrote:
| a.x is far easier to both read and write than a["x"] (67% is
| operator syntax compared to 33% in a.x).
| wcarss wrote:
| this and () => {} syntax have always been moderate to strong
| wins in my book for js when creative-coding
| ziml77 wrote:
| Not to mention that the first case gets assistance from
| autocomplete in an IDE or ipython session. That speeds up
| typing long, descriptive names so much.
| 6gvONxR4sf7o wrote:
| In cases where you don't care about immutability, I'd think of
| it as a better version of TypedDict (though TypedDict still has
| its place). It makes my IDE more helpful, makes my code more
| self-documenting, and allows mypy to tell me when I'm being
| dumb.
| jammycrisp wrote:
| Y'all may be interested in a fast dataclass-like library I
| maintain called msgspec (https://jcristharif.com/msgspec/) that
| provides many of the benefits of dataclasses (mutable, type
| declarations), but with speedy performance. The objects are
| mainly meant to be used for (de)serialization (currently only
| msgpack is supported, but JSON support is in the works), with
| native type validation (think a faster pydantic).
|
| Mirroring the author's initialization benchmark:
|
|     In [1]: import msgspec
|
|     In [2]: from typing import NamedTuple
|
|     In [3]: class Point(msgspec.Struct):
|        ...:     x: int
|        ...:     y: int
|        ...:
|
|     In [4]: class PointNT(NamedTuple):
|        ...:     x: int
|        ...:     y: int
|        ...:
|
|     In [5]: %timeit Point(1, 2)
|     48.4 ns +- 0.195 ns per loop (mean +- std. dev. of 7 runs, 10000000 loops each)
|
|     In [6]: %timeit PointNT(1, 2)
|     185 ns +- 0.851 ns per loop (mean +- std. dev. of 7 runs, 10000000 loops each)
| sayhar wrote:
| A little unrelated, but this brings up a question I've had for a
| while.
|
| Seems like one day, everyone around me was using dataclasses. I
| had not even heard of them. It felt like I had missed some memo
| or newsletter. It felt weird.
|
| Here's my question: what _should_ I have been reading / where
| should I have been "hanging out" online, so that I would have
| known that dataclasses were a thing? What are your go-tos for
| news about new language features, libraries that everyone is
| using, etc?
|
| Hacker news is great, but it doesn't quite fill that need for me,
| it seems.
| emddudley wrote:
| Python-centric forums, like r/Python:
|
| https://www.reddit.com/r/Python/
|
| I think the RealPython site is excellent for learning, even for
| mid to advanced users:
|
| https://realpython.com/
|
| They also have a great podcast:
|
| https://realpython.com/podcasts/rpp/
|
| Also just browsing the Python docs and release notes.
| lbhdc wrote:
| I read the release notes. I often see posts for releases here
| on HN or on reddit, but often I will check in on the official
| repos or websites to see what's new.
|
| I like to spend a few hours a week reading up on what's
| happening, or try something new to keep up. Checking out new
| language features is part of that process for me.
| bobbylarrybobby wrote:
| For Python, you pretty much just need to be aware of when the
| new major version is released because the "what's new" pages
| are pretty good. Here's the one in which data classes were
| released: https://docs.python.org/3/whatsnew/3.7.html
| chaos_emergent wrote:
| Concretely, I found out about dataclasses by using
| [pydantic](https://pydantic-docs.helpmanual.io/) and seeing
| their drop-in `@dataclass` annotation - it got me curious about
| the adjacent stdlib class. I was using pydantic because I
| started using FastAPI to build a REST interface, which has
| pydantic deeply integrated.
|
| Generally, I find out about new features through PEP posts, and
| I reach those by seeing a keyword that I don't know in random
| code I read online
| dec0dedab0de wrote:
| I found out about data classes on hn, before they were in the
| standard library. I also regularly search for python to see
| what stories I missed.
|
| I also like to keep up to date with the PyCon videos, as well
| as some of the other python conferences. But, as others have
| said, the release notes are the main source for what's new, if a
| bit dry.
|
| That said, I never actually use data classes. I normally just
| use dicts, and occasionally named tuples.
| dknight10 wrote:
| A couple good newsletters are Python Weekly and PyCoder's
| Weekly. They each put out a mix of news, articles/tutorials,
| and interesting projects.
|
| https://www.pythonweekly.com/ https://pycoders.com/
| genericlemon24 wrote:
| Already duplicated in my reply to icegreentea2 below, but
| release notes for projects you use are a great place to get
| updates.
|
| For Python specifically, PEPs are very helpful too:
| https://www.python.org/dev/peps/
| icegreentea2 wrote:
| I follow the language specific sub-reddits, and I read release
| notes for major releases of languages (so for python that would
| be 3.X) even if I wasn't going to jump to the version set.
| genericlemon24 wrote:
| > I read release notes for major releases of languages
|
| _This_, so much.
|
| If you are a heavy user of a language / library, it's
| immensely helpful to look at the release notes every once in
| a while. Even if you don't plan to upgrade now, it gives you
| an idea of where things are going (and may eventually tip the
| scales to a "fuck it, it's now worth upgrading" moment).
|
| For Python specifically, PEPs are also a good way to keep
| track of what's happening (even if some of them don't get
| accepted): https://www.python.org/dev/peps/ ; there's also an
| RSS feed: https://www.python.org/dev/peps/peps.rss/
| sfvisser wrote:
| > Point = namedtuple('Point', 'x y')
|
| I must come from a different world. What is going on here?
|
| Are you dynamically creating a named tuple (a record?) by passing
| a space separated list of field names? Why?
| qbasic_forever wrote:
| It's effectively a 'macro' for metaprogramming.
| cdcarter wrote:
| They are dynamically creating a named tuple class (or
| prototype). The namedtuple implementation in the python
| standard library indeed accepts a space separated list of field
| names. Once it's been defined (as above) then you can create
| instances (records) like "Point(2,3)"
| masklinn wrote:
| > Are you dynamically creating a named tuple (a record?)
|
| No, it's creating a namedtuple _type_ , aka a subclass of
| `tuple`. So the fieldnames are literally the names given to the
| tuple's items: Point is a pair (a two-uple) whose 0th item can
| be accessed as `x` and 1st item `y`.
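|
| This can be checked interactively. A small sketch, with `Point`
| as in the quoted line:

```python
from collections import namedtuple

# The call returns a new class, not an instance.
Point = namedtuple("Point", "x y")

p = Point(2, 3)

facts = {
    "is_class": isinstance(Point, type),
    "subclass_of_tuple": issubclass(Point, tuple),
    "fields": Point._fields,
    "by_name": (p.x, p.y),
    "by_index": (p[0], p[1]),
}

for key, value in facts.items():
    print(f"{key}: {value}")
```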
| selectnull wrote:
| I always wondered why is that a space separated string, when a
| list can work as well. The docs are not well written on that
| one. This works:
|
|     Point = namedtuple('Point', ['x', 'y'])
| sgtnoodle wrote:
| It's a class factory function, so right off the bat it's a
| bit weird. The original intent of using spaces was probably
| to minimize typing. Since they're attribute names they can't
| have spaces in them, so it's a safe delimiter.
|
| You could imagine the function dynamically creates the class
| by manipulating the underlying dictionary (or whatever the
| "slot" alternative uses). At that level of python, attributes
| are strings anyway. Handling spaces is just a matter of
| calling .split().
|
| In modern python, there's a whole metaclass system that would
| possibly let you do the equivalent without getting your hands
| dirty with internal data structures.
| Pokepokalypse wrote:
| I envy coders who can actually save time by using a space
| as a delimiter instead of ['x', 'y']. I really have no use
| for such syntactic sugar.
| sgtnoodle wrote:
| Yeah, I think more time is wasted in confusion and
| arguing about style than is saved in keyboard strokes.
|
| There's definitely a class of persnickety coders out
| there though. As a technical leader within a growing
| organization, sometimes the bulk of my time spent in a
| code review turns into style guide enforcement. It can
| get old arguing about the subtle merits of someone's
| preferred but style violating syntax over and over,
| especially when all I care about is maintaining a
| standard of consistency.
| denimnerd42 wrote:
| syntactic sugar actually drives me nuts because it makes
| code harder to read for non experts
| framecowbird wrote:
| This kind of thing grates on me: one thing I love about Python
| is that there is usually only one way of doing everything.
| dec0dedab0de wrote:
| I think it comes down to the idea that going out of your way
| to make the library work either way makes it easier for
| people to use, even if it makes the library itself a bit more
| complicated.
|
| I wish more library devs would go out of their way to add
| such niceties.
|
| A big one that I always do is if I'm expecting an iterator of
| objects I make it just work with one.
|
|     from collections.abc import Iterable
|
|     def my_function(arg):
|         # slightly different if you're looking for a collection of
|         # strings or bytes
|         if not isinstance(arg, Iterable):
|             arg = [arg]
|         for item in arg:
|             ...  # do the thing
|
| Or if you have a specific type of object you want it goes
| like this
|
|     from collections.abc import Iterable
|
|     def my_function(arg):
|         if isinstance(arg, MyObjectIWant):
|             arg = [arg]
|         for item in arg:
|             ...  # do the thing
|
| I like to think of my libraries as mini programs for users,
| and I hate when validation is too strict, when it could be so
| easy to fix. Like when a phone number validator insists on
| (XXX)XXX-XXXX or XXX.XXX.XXXX or XXXXXXXXXX when it could
| just ignore everything that isn't a number and make sure
| there is 10 of them.
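|
| The lenient phone number check could look roughly like this. It
| is only a sketch: `normalize_phone` is a made-up name, and it
| assumes 10-digit numbers with no country code, as in the comment.

```python
import string


def normalize_phone(raw: str) -> str:
    """Keep only ASCII digits and require exactly 10 of them."""
    digits = "".join(ch for ch in raw if ch in string.digits)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {raw!r}")
    return digits


# Three input formats, one canonical result.
examples = ["(555)123-4567", "555.123.4567", "5551234567"]
normalized = [normalize_phone(example) for example in examples]
print(normalized)
```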
| goodside wrote:
| This sounds like a nice idea in theory, and makes a lot of
| sense for polished, publicly visible libraries where
| convenience trumps simplicity, but the edge cases can lead
| to confusing failures and bloat otherwise simple code -- as
| you noted, your example code appears to work for arbitrary
| objects but actually fails for `str` or `bytes`.
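|
| The `str` edge case is easy to demonstrate. In this sketch,
| `naive_items` follows the pattern from the parent comment and
| `as_items` is one possible fix; both names are made up.

```python
from collections.abc import Iterable


def naive_items(arg):
    """The parent's pattern: wrap anything that is not iterable."""
    if not isinstance(arg, Iterable):
        arg = [arg]
    return list(arg)


def as_items(arg):
    """Same idea, but treat str and bytes as single items."""
    if isinstance(arg, (str, bytes)) or not isinstance(arg, Iterable):
        return [arg]
    return list(arg)


# A string is itself iterable, so the naive version splits it into characters.
print(naive_items("ab"), as_items("ab"))
```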
|
| A great case study in the issues here is Pandas, which
| routinely allows arguments to be columns, lists of columns,
| string column labels, lists of string column labels, and so
| on. It works surprisingly well, but at the cost of
| inventing a new semantic distinction between `list` objects
| and other sequence types like `tuple` -- someone unfamiliar
| with Pandas who thinks "Why does this need to be a list
| comprehension when a generator expression will do?" is
| likely introducing a bug.
|
| Another subtle issue is that code permissive with inputs is
| harder to extend via wrapper code. Suppose you have a
| function that does some sort of processing for any number
| of given datetimes, but also accepts integer seconds since
| 1970-01-01, a formatted date string, or any mixed sequence
| of these types. If you need to write a wrapper that first
| rounds all times to the most recent hour, your task is much
| easier if the only accepted type is `Iterable[datetime]`.
| xav0989 wrote:
| The function was defined to either take a space separated
| list of names or a sequence of names. The docs seem pretty
| clear to me:
|
| > The field_names are a sequence of strings such as ['x',
| 'y']. Alternatively, field_names can be a single string with
| each fieldname separated by whitespace and/or commas, for
| example 'x y' or 'x, y'.
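|
| A quick check that the spellings described in the docs produce
| the same fields:

```python
from collections import namedtuple

# Whitespace-separated, comma-separated, and sequence forms of field_names.
specs = ["x y", "x, y", "x,y", ["x", "y"], ("x", "y")]

field_sets = [namedtuple("Point", spec)._fields for spec in specs]

for spec, fields in zip(specs, field_sets):
    print(f"{spec!r:>12} -> {fields}")
```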
| goodside wrote:
| I'd speculate it's meant to mimic Perl's `qw()` operator,
| which is like `str.split()` in Python. The module was
| originally written for contexts where you're processing SQL
| result sets with fixed schemas, and before Python these tasks
| were traditionally handled in Perl. Python inherits a lot of
| these loose traditions. Similarly, some parts of the
| standard-lib (`sys`, `os`) follow shell- or C-like naming
| conventions that would seem bizarre to someone who's never
| used a shell prompt.
| dragonwriter wrote:
| > I always wondered why is that a space separated string,
| when a list can work as well.
|
| Saves a bunch of typing. 5 chars: 'x y'
|
| vs 10: ['x', 'y']
| jeffdn wrote:
| With a non-trivial example that uses readable attribute
| names, a single, long, space-delimited string becomes more
| a burden than a convenience, I think. Also, the amount of
| time saved typing is minuscule in comparison to all the
| rest of the development work that'll happen.
| jhgb wrote:
| > Are you dynamically creating a named tuple (a record?) by
| passing a space separated list of field names? Why?
|
| I believe that's called "procedural record interface" in Scheme
| and it does have its uses, for example if you need to create
| records for data the structure of which you don't know in
| advance.
| wyldfire wrote:
| > Are you dynamically creating a named tuple (a record?) by
| passing a space separated list of field names? Why?
|
| Very little in python is bound statically. This is akin to a
| type definition. The type will behave as an ordered tuple that
| can be indexed but also alias these attribute names to those
| ordinals.
|
|     assert(a == Point(a, b).x)
|     assert(b == Point(a, b).y)
|     assert(a == Point(a, b)[0])
| goodside wrote:
| After your example, `Point` is a class. You are not creating a
| record, you're creating the class that defines the schema of
| fields in any record. A class can be defined without using the
| `class foo:` syntax, and in this case the class is returned by
| the function `namedtuple` to produce an equivalent effect.
|
| As for the unusual space-delimited syntax, the missing context
| here is that namedtuple is a very, very old part of Python that
| predates the conventions now considered good style. Using space
| delimiters for lists of strings is a common idiom in Perl
| scripting due to the `qw()` quote syntax. Note the archetypical
| context where namedtuple was imagined to apply (record-oriented
| processing of logs and SQL result sets) was commonly handled
| using Perl before Python became dominant.
|
| Namedtuple is definitely the most prominent example of this
| syntax convention in Python, but other libraries use it too.
| `enum.Enum` supports a function-like interface directly modeled
| off namedtuple. It's a mildly bad idea to keep using it IMHO
| because it complicates static analysis or refactoring. If
| someone does a wide search-and-replace of a field name literal
| it's easy to miss this edge case.
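|
| For reference, a sketch of the function-like `enum.Enum`
| interface mentioned above. `Color` and its members are example
| names; values are assigned automatically, starting at 1.

```python
from enum import Enum

# Space-separated member names, in the same style as namedtuple's field_names.
Color = Enum("Color", "RED GREEN BLUE")

# The list form defines the same members.
ColorFromList = Enum("Color", ["RED", "GREEN", "BLUE"])

names = [member.name for member in Color]
names_from_list = [member.name for member in ColorFromList]

print(names, Color.RED.value)
```

| The member names here live only inside a string literal, which
| is exactly what makes them easy to miss in a search-and-replace.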
| jhgb wrote:
| > a very, very old part of Python that predates the
| conventions now considered good style. Using space delimiters
| for lists of strings is a common idiom in Perl scripting due
| to the `qw()` quote syntax.
|
| Note that the Smalltalk-80 class creation method also expects
| a space-separated string of instance variable names, and
| that's an environment considerably older than Perl.
___________________________________________________________________
(page generated 2021-07-21 23:01 UTC)