[HN Gopher] A Python dict that can report which keys you did not...
___________________________________________________________________
A Python dict that can report which keys you did not use
Author : gilad
Score : 71 points
Date : 2025-07-27 02:22 UTC (3 days ago)
(HTM) web link (www.peterbe.com)
(TXT) w3m dump (www.peterbe.com)
| jraph wrote:
| I did exactly the same thing in our Confluence to XWiki migrator
| to easily and automatically report which macro parameters we
| don't handle when converting Confluence macros to equivalent
| macros in XWiki.
|
| This can be used to evaluate the migration quality and spot what
| can be improved.
|
| https://github.com/xwiki-contrib/confluence/blob/7a95bf96787...
| IshKebab wrote:
| I think if you feel like you need this then it's a bit of a red
| flag and you should be using Pydantic or `dataclass` instead,
| then your IDE can statically tell you which fields you don't
| access (among many other benefits). Dicts are mainly for when you
| don't know the keys up front.
| mb7733 wrote:
| Static analysis could only tell you which fields are never
| used, across all usage of the class. Not on a given instance.
| taeric wrote:
| Counterpoint, something like this for dataclasses would also be
| very useful.
|
| That is, it isn't just knowing whether or not the data is ever
| used. It is useful to know if it was used in this specific run.
| And oftentimes, seeing what parts of the data were not used is
| a good clue as to what went wrong. At the least, you can use it
| to rule out what code was not hit.
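A minimal sketch of per-instance read tracking for a dataclass, as
suggested above. The TrackReads mixin, its _read attribute and the
Config class are hypothetical names invented for this illustration;
nothing like them ships with the dataclasses module.

```python
from dataclasses import dataclass, fields


class TrackReads:
    """Hypothetical mixin: remembers which dataclass fields were read on this instance."""

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # Record only declared dataclass fields, not methods or dunder attributes.
        if name in object.__getattribute__(self, "__dataclass_fields__"):
            instance_dict = object.__getattribute__(self, "__dict__")
            instance_dict.setdefault("_read", set()).add(name)
        return value

    def unread_fields(self):
        read = self.__dict__.get("_read", set())
        return {f.name for f in fields(self)} - read


@dataclass
class Config(TrackReads):
    host: str
    port: int
    debug: bool = False


cfg = Config("localhost", 8080)
cfg.host  # the only field read in this "run"
print(cfg.unread_fields())  # {'port', 'debug'} in some order
```

Note that the generated __repr__ and __eq__ also read every field, so
printing or comparing an instance counts as access in this sketch.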
| ok123456 wrote:
| If you're inheriting from dict to extend its behavior, there are
| a lot of side effects with that, and it's recommended to use
| https://docs.python.org/3/library/collections.html#collectio...
| instead.
| quietbritishjim wrote:
| From right above where you linked to:
|
| > The need for this class has been partially supplanted by the
| ability to subclass directly from dict; however, this class can
| be easier to work with because the underlying dictionary is
| accessible as an attribute.
|
| Sounds like (unless you need the dict as a separate data
| member) this class is a historical artefact. Unless there's
| some other issue you know of not mentioned in the
| documentation?
| ok123456 wrote:
| dict doesn't follow the usual object protocol, and overloaded
| methods are runtime dependent. It's only guaranteed that non-
| overloaded methods are resolved least surprisingly.
| quietbritishjim wrote:
| I think you mean overridden (i.e. defined in both base
| class and derived class) rather than overloaded (i.e.
| defined more than once in a single place but with different
| argument types, at least from a typing point of view [1]).
| Your comment seriously confused me till I figured that out.
|
| [1] https://typing.python.org/en/latest/spec/overload.html
|
| Even then, to be honest I'm a bit sceptical. Can you point
| at a link in the official documentation that says
| overriding methods of dictionaries may not work? I would
| have thought the link to UserDict would have mentioned that
| if true. What do you mean they are "runtime dependent"?
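One way to read the concern about overridden methods: in CPython, the
built-in dict methods are implemented in C and do not necessarily call
a subclass's Python-level overrides. The UpperDict class below is a
hypothetical example written for this illustration, and the behaviour
shown is CPython's, not something the language reference promises.

```python
class UpperDict(dict):
    """Hypothetical subclass that tries to upper-case every stored value."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value.upper())


d = UpperDict(a="x")  # the constructor does not go through the override
d["b"] = "y"          # item assignment does
d.update(c="z")       # update() does not

print(dict(d))  # {'a': 'x', 'b': 'Y', 'c': 'z'}
```

Only the value stored with d["b"] = ... is upper-cased; the other two
paths bypass the overridden __setitem__.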
| mont_tag wrote:
| No, that is _not_ the recommendation. People routinely and
| reliably inherit from dict.
|
| The UserDict class is mostly defunct and is only still in the
| standard library because there were a few existing uses that
| were hard to replace (such as avoiding base class conflicts in
| multiple inheritance).
| smcin wrote:
| UserDict is not formally deprecated but it will be someday,
| so code that relies on it is not future-proof.
| 9dev wrote:
| Ah, Python. The language where nobody agrees on the right way
| to do things, and just does their own instead. Five ways to
| describe an object of a certain shape? Six package managers,
| with incompatible but overlapping ways to publish packages,
| but half of them without a simple way to update dependencies?
| Asynchronous versions of everything? Metaprogramming that
| makes Ruby blush? Yes! All of it! Lovely.
| boothby wrote:
| Just a heads up, this fails to track usage of _get_ and
| _setdefault_. The ability to iterate over dicts makes the whole
| question rather murky.
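A small sketch of the gap described above. TrackingDict here is an
assumed reconstruction of the general idea (a dict subclass that
records keys in __getitem__), not the article's exact code.

```python
class TrackingDict(dict):
    """Assumed shape of the idea: record keys read via d[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)
        return super().__getitem__(key)


d = TrackingDict(a=1, b=2, c=3)
d["a"]                # tracked
d.get("b")            # not tracked: dict.get does not call __getitem__
d.setdefault("c", 0)  # not tracked either
for key, value in d.items():
    pass              # iteration reads every value without being tracked

print(d.accessed_keys)  # {'a'}
```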
| quietbritishjim wrote:
| I didn't know about the setdefault method, and wouldn't have
| guessed it lets you read a value. Interesting, thanks.
|
| Another way to get data out would be to use the new | operator
| (i.e. x = {} | y essentially copies dictionary y to x) or the
| update method or ** unpacking operator (e.g. x = {**y}). But
| maybe those come under the umbrella of iterating as you
| mentioned.
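A sketch of the copying routes mentioned above, assuming a dict
subclass that only overrides __getitem__ (TrackingDict is a
hypothetical stand-in for the article's class). The | operator needs
Python 3.9 or later, and the behaviour shown is what CPython does for
dict subclasses that do not override __iter__.

```python
class TrackingDict(dict):
    """Hypothetical tracker that only hooks d[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)
        return super().__getitem__(key)


y = TrackingDict(a=1, b=2)

merged = {} | y        # union operator
unpacked = {**y}       # ** unpacking
updated = {}
updated.update(y)      # update method
copied = dict(y)       # plain constructor

# Every copy holds all the data, yet nothing was recorded as accessed.
print(merged == unpacked == updated == copied == {"a": 1, "b": 2})  # True
print(y.accessed_keys)  # set() on CPython
```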
| notatallshaw wrote:
| setdefault was a go to method before defaultdict was added to
| the collections module in Python 2.5, which replaced the
| biggest use case.
| boothby wrote:
| It's been some time since I last benchmarked defaultdict
| but last time I did (circa 3.6 and less?), it was
| considerably slower than judicious use of setdefault.
| quietbritishjim wrote:
| One time that defaultdict may come out ahead is if the
| default value is expensive to construct and rarely
| needed:
|
|     d.setdefault(k, computevalue())
|
| defaultdict takes a factory function, so it's only called
| if the key is not already present:
|
|     d = defaultdict(computevalue)
|
| This applies to some extent even if the default value is
| just an empty dictionary (as it often is in my
| experience). You can use dict() as the factory function
| in that case.
|
| But I have never benchmarked!
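A sketch of the eager-versus-lazy difference described above.
compute_value and the calls counter are hypothetical names used only
to count how often the default gets built.

```python
from collections import defaultdict

calls = 0


def compute_value():
    """Stand-in for an expensive default; counts how often it runs."""
    global calls
    calls += 1
    return []


# setdefault: the argument is evaluated before the call, even on a hit.
d = {"k": [1]}
d.setdefault("k", compute_value())
eager_calls = calls  # 1

# defaultdict: the factory runs only when the key is missing.
calls = 0
dd = defaultdict(compute_value, {"k": [1]})
dd["k"]
lazy_calls_on_hit = calls  # 0
dd["missing"]
lazy_calls_on_miss = calls  # 1

print(eager_calls, lazy_calls_on_hit, lazy_calls_on_miss)  # 1 0 1
```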
| masklinn wrote:
| > if the default value is expensive to construct and
| rarely needed:
|
| I'd say "or" rather than "and": defaultdict has higher
| overhead to initialise the default (especially if you
| don't need a function call in the setdefault call) but
| because it uses a fallback of dict lookup it's
| essentially free if you get a hit. As a result, either a
| very high redundancy with a cheap default or a low amount
| of redundancy with a costly default will have the
| defaultdict edge out.
|
| For the most extreme case of the former,
|
|     d = {}
|     for i in range(N):
|         d.setdefault(0, [])
|
| versus
|
|     d = defaultdict(list)
|     for i in range(N):
|         d[0]
|
| has the defaultdict edge out at N=11 on my machine (561ns
| for setdefault versus 545 for defaultdict). And that's
| with a literal list being quite a bit cheaper than a
| list() call.
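A rough way to repeat the comparison above on your own machine. The
function names and the chosen values of N are assumptions made for
this sketch; timings depend heavily on the interpreter version and
hardware, so no expected numbers are given.

```python
import timeit
from collections import defaultdict


def with_setdefault(n):
    d = {}
    for _ in range(n):
        d.setdefault(0, [])  # builds a new empty list on every iteration
    return d


def with_defaultdict(n):
    d = defaultdict(list)
    for _ in range(n):
        d[0]                 # factory runs only on the first miss
    return d


for n in (1, 11, 1000):
    t_set = min(timeit.repeat(lambda: with_setdefault(n), number=200, repeat=3))
    t_def = min(timeit.repeat(lambda: with_defaultdict(n), number=200, repeat=3))
    print(f"N={n:>5}  setdefault={t_set:.6f}s  defaultdict={t_def:.6f}s")
```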
| hackish wrote:
| Along with those and iteration, it also would need to handle
| del/pop/popitem/update/copy/or/ror/... some of which might
| necessitate a decision on whether comparisons/repr also count
| as access.
| rjmill wrote:
| Indeed. Inheriting from 'collections.UserDict' instead of
| 'dict' will make TFA's code work as intended for most of those
| edge cases.
|
| UserDict will route '.get', '.setdefault', and even iteration
| via '.items()' through the '__getitem__' method.
|
| _edited to remove "(maybe all?) edge cases". As soon as I
| posted, I thought of several less common/obvious edge cases._
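A sketch of the UserDict variant described above. TrackingUserDict and
unused_keys are hypothetical names; the point is only that get,
setdefault and items() in UserDict end up calling __getitem__. Keys
are recorded after a successful lookup, so missing keys are never
added.

```python
from collections import UserDict


class TrackingUserDict(UserDict):
    """Hypothetical tracker built on UserDict instead of dict."""

    def __init__(self, *args, **kwargs):
        self.accessed_keys = set()
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.accessed_keys.add(key)
        return value

    def unused_keys(self):
        # self.data is the underlying plain dict that UserDict wraps.
        return set(self.data) - self.accessed_keys


d = TrackingUserDict(a=1, b=2, c=3, z=4)
d["a"]
d.get("b")
d.setdefault("c", 0)
print(d.unused_keys())  # {'z'}

for key, value in d.items():
    pass
print(d.unused_keys())  # set()
```

Whether iterating over items() should count as "using" every key is a
design decision; this sketch simply shows what happens by default.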
| jgalt212 wrote:
| why not inside of __init__
|
|     self.accessed_keys = set()
|
| instead of
|
|     @property
|     def accessed_keys(self):
|         return self._accessed_keys
| Jaxan wrote:
| With the @property you only get the "getter" and not the
| "setter".
| eurleif wrote:
| But that doesn't accomplish much, because you can still do:
| `d.accessed_keys.add('foo')`.
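A sketch of one way to close the loophole raised above: have the
property return an immutable snapshot. TrackingDict is again an
assumed reconstruction, and returning a frozenset is a design choice
made for this example, not something the article does.

```python
class TrackingDict(dict):
    """Hypothetical tracker whose accessed_keys cannot be modified from outside."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._accessed_keys = set()

    def __getitem__(self, key):
        self._accessed_keys.add(key)
        return super().__getitem__(key)

    @property
    def accessed_keys(self):
        # A frozenset copy: no setter, and no add() on the returned object.
        return frozenset(self._accessed_keys)


d = TrackingDict(a=1, b=2)
d["a"]

try:
    d.accessed_keys = set()      # the property has no setter
    rebind_blocked = False
except AttributeError:
    rebind_blocked = True

try:
    d.accessed_keys.add("foo")   # frozenset has no add()
    mutation_blocked = False
except AttributeError:
    mutation_blocked = True

print(rebind_blocked, mutation_blocked, d.accessed_keys)
```

The trade-off is a copy on every read of the property, and
d._accessed_keys is still reachable by anyone who ignores the
underscore convention.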
| larrik wrote:
| I actually wrote something similar in nodejs for a data import
| system. Was very handy.
| null_deref wrote:
| Interesting! Can you elaborate a little bit more on your
| implementation?
| larrik wrote:
| Mine was a bit more specific. I had a JSON object of data
| exported per account I was importing, and then a complex
| mapping (also JSON) of where to put each piece of data.
|
| Therefore, I really wanted to know that I was actually
| pulling in all of the data I needed, so I tracked what was
| seen vs not seen, and compared against what was attempted to
| see.
|
| In the end it was basically a wrapper around the JSON object
| itself, that allowed lookup of data via a string in "dot
| notation" (so you could do "keyA.key2" to get the same thing
| you would have directly in JSON). Then, it would either return
| a simple value (if there was one), or another instance of the
| wrapper if the result was itself an object (or an array of
| wrapped objects). All instances would share the "seen" list.
|
| It's unfortunately locked behind NDA/copyright stuff, but the
| implementation was only 67 lines.
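A rough Python sketch of the wrapper described above. This is not the
original Node.js code; Tracked, leaf_paths and the sample data are
hypothetical, and lists are simplified to be wrapped and addressed by
numeric path parts rather than returned as arrays of wrappers.

```python
def leaf_paths(node, prefix=""):
    """Yield the dotted path of every leaf value in nested dicts and lists."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        yield prefix
        return
    for key, child in items:
        yield from leaf_paths(child, f"{prefix}.{key}" if prefix else str(key))


class Tracked:
    """Hypothetical wrapper: dot-notation lookup with a shared 'seen' set."""

    def __init__(self, data, seen=None, prefix=""):
        self._data = data
        self._seen = set() if seen is None else seen  # shared with child wrappers
        self._prefix = prefix

    def get(self, path):
        node, walked = self._data, self._prefix
        for part in path.split("."):
            node = node[int(part)] if isinstance(node, list) else node[part]
            walked = f"{walked}.{part}" if walked else part
        if isinstance(node, (dict, list)):
            return Tracked(node, self._seen, walked)  # nested object: wrap it
        self._seen.add(walked)                        # simple value: mark as seen
        return node

    def unseen(self):
        return set(leaf_paths(self._data, self._prefix)) - self._seen


account = Tracked({"name": "Acme", "owner": {"email": "a@example.com", "phone": "555"}})
account.get("name")
owner = account.get("owner")
owner.get("email")
print(sorted(account.unseen()))  # ['owner.phone']
```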
| simon04 wrote:
| Very useful. For configparser.ConfigParser I've found
| https://stackoverflow.com/a/57307141
| golly_ned wrote:
| I have a similar use case and this idea also occurred to me.
|
| However: the dict in this case would also include dataclasses,
| and I'd be interested in finding which exact attributes within
| those dataclasses were accessed. I'd also like to be able to mark
| all attributes in a dataclass as accessed if the parent dataclass
| is accessed. Since those dataclasses are config objects, they
| should be able to do the same to their own children, so that the
| topmost dictionary has a tree of all accessed keys.
|
| I couldn't figure out how to do that, but I welcome ideas.
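One possible direction for the question above, offered as a sketch
rather than a finished design: wrap the top-level dict in a proxy that
records a dotted path for every read and wraps nested dicts and
dataclasses in further proxies sharing the same set. AccessProxy,
all_paths, mark_all and the sample config classes are all hypothetical
names.

```python
from dataclasses import dataclass, fields, is_dataclass


def all_paths(node, prefix=""):
    """Yield a dotted path for every key or field reachable from node."""
    if isinstance(node, dict):
        children = node.items()
    elif is_dataclass(node) and not isinstance(node, type):
        children = ((f.name, getattr(node, f.name)) for f in fields(node))
    else:
        return
    for key, child in children:
        path = f"{prefix}.{key}" if prefix else str(key)
        yield path
        yield from all_paths(child, path)


class AccessProxy:
    """Hypothetical read-only proxy recording every path that is read."""

    def __init__(self, target, accessed=None, path=""):
        self._target = target
        self._accessed = set() if accessed is None else accessed
        self._path = path

    def _wrap(self, key, value):
        path = f"{self._path}.{key}" if self._path else str(key)
        self._accessed.add(path)
        if isinstance(value, dict) or is_dataclass(value):
            return AccessProxy(value, self._accessed, path)
        return value

    def __getitem__(self, key):   # dict-style access
        return self._wrap(key, self._target[key])

    def __getattr__(self, name):  # dataclass-style access (only for unknown names)
        return self._wrap(name, getattr(self._target, name))


def mark_all(proxy):
    """Mark everything below this proxy as accessed."""
    proxy._accessed.update(all_paths(proxy._target, proxy._path))


@dataclass
class Db:
    host: str
    port: int


@dataclass
class App:
    name: str
    db: Db


config = {"app": App("svc", Db("localhost", 5432)), "debug": True}
root = AccessProxy(config)
root["app"].db.host
unused_before = set(all_paths(config)) - root._accessed
mark_all(root["app"].db)
unused_after = set(all_paths(config)) - root._accessed
print(sorted(unused_before))  # ['app.db.port', 'app.name', 'debug']
print(sorted(unused_after))   # ['app.name', 'debug']
```

The proxy only sees reads that go through it; code holding a direct
reference to the underlying dataclass is invisible to this approach.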
| codethief wrote:
| Only tangentially related but I am really excited about PEP 764 [1]
| (inline typed dictionaries). If it gets accepted, we can finally
| replace entire hierarchies of dataclasses with simple nested
| dictionary types and call it a day.
|
| I am currently teaching (typed) Python to a team of Windows
| sysadmins and it's been incredibly difficult to explain when to
| use a dataclass, a NamedTuple, a Pydantic model, or a dictionary.
|
| [1] https://peps.python.org/pep-0764/
| JohnKemeny wrote:
| Do you seriously have difficulties explaining when to use a
| class and when to use a dictionary?!
| codethief wrote:
| You can create dictionaries on the fly. But dataclass objects
| require defining that dataclass first. The type safety (and
| LSP support) story for accessing individual dataclass fields
| is better than for accessing dict items (sometimes even when
| they are TypedDicts), but for iterating over all fields it's
| worse. dataclasses are nominal types and can contain
| additional logic, TypedDicts are structural ones, overall
| simpler, can be more convenient and lead to looser coupling.
| Dataclasses use metaclass and decorator magic while TypedDicts
| are just plain dicts. Etc.
|
| Let me make this more concrete: Those sysadmins frequently
| need to process and pass around complex (as in heavily
| nested) structured data. The data often comes in the form of
| singleton objects, i.e. they are built in a single place, then
| used in another place and then thrown away (or merged into
| some other structure). In other words, any class hierarchy
| you build represents boilerplate code you'll only ever use
| once and which will be annoying to maintain as you refactor
| your code. Do you pick dataclasses or TypedDicts (or
| something else) for your map data structures?
|
| In TypeScript you would just use `const data = <heavily
| nested object> as const` and be done with it.
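A small side-by-side sketch of the dataclass versus TypedDict
trade-off discussed above. PointDict and PointClass are hypothetical
types invented for this comparison.

```python
from dataclasses import asdict, dataclass
from typing import TypedDict


class PointDict(TypedDict):
    """Structural type: any dict with these keys and value types matches."""
    x: int
    y: int


@dataclass
class PointClass:
    """Nominal type: instances must be constructed, and can carry logic."""
    x: int
    y: int

    def norm1(self) -> int:
        return abs(self.x) + abs(self.y)


p_dict: PointDict = {"x": 1, "y": -2}  # built on the fly as a plain literal
p_obj = PointClass(x=1, y=-2)

print(type(p_dict) is dict)   # True: a TypedDict is a plain dict at runtime
print(list(p_dict.items()))   # iterating over all fields is just dict iteration
print(asdict(p_obj))          # a dataclass needs a helper to do the same
print(p_obj.norm1())          # 3
```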
| quietbritishjim wrote:
| The line is seriously blurred.
| xg15 wrote:
| To be honest, that proposal sounds like it would make the
| problem even worse, by blurring the line between dicts and
| dataclasses even more.
| codethief wrote:
| How does creating anonymous TypedDicts (and allowing them to
| be nested on the fly) blur the line "even more" when those
| features are not supported by dataclasses?
|
| I mean I agree w.r.t. the blurriness in general but this PEP
| is not going to change anything about that, in either
| direction.
| xg15 wrote:
| True, but I think what I don't like is that this PEP
| essentially creates an entire new way of "type definitions"
| that is separate from the type definitions we already have.
|
| I get the rationale for "anonymous strict" return types,
| but then I think a better way would be to think up some way
| to accomplish that for dataclasses.
| mvieira38 wrote:
| When, if ever, do you use TypedDicts?
| tiltowait wrote:
| I use them for API responses/requests where
| dataclasses/pydantic don't add much value and introduce extra
| function calls and overhead. It's most common when part of
| the response from one API gets shuttled off to another.
| There's often no value in initializing a model object, but
| it's still handy to have some form of type-checking as you
| construct the next API call.
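A sketch of the pattern described above, where part of one API's
response is passed on to another. UserResponse, InvoiceRequest and
build_invoice_request are hypothetical; no real API is implied.

```python
from typing import TypedDict


class UserResponse(TypedDict):
    """Hypothetical shape of the first API's JSON response."""
    id: int
    email: str
    plan: str


class InvoiceRequest(TypedDict):
    """Hypothetical payload expected by the second API."""
    customer_email: str
    plan: str


def build_invoice_request(user: UserResponse) -> InvoiceRequest:
    # No model objects are created; a type checker still verifies the keys.
    return {"customer_email": user["email"], "plan": user["plan"]}


response: UserResponse = {"id": 7, "email": "a@example.com", "plan": "pro"}
request = build_invoice_request(response)
print(request)  # {'customer_email': 'a@example.com', 'plan': 'pro'}
```

The annotations cost nothing at runtime, since both values stay plain
dicts; the checking happens only in tools such as mypy or pyright.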
| nurettin wrote:
| AI front: We have models to generate pictures, videos and code.
| We have the best devs and are so fskin rich!
|
| Rust front: Here's a faster ls called ls-rs with different
| defaults, you should use this!
|
| Go front: Here's reverse proxy #145728283 it is an open source
| project that has slightly different parameters than all the
| others.
|
| Python hobo front: Uhh guys here's a dict that kinda might
| remember what you've accessed if you used it in a particular way.
| mrits wrote:
| For giant dicts a Bloom filter would work great here
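A sketch of what the Bloom filter suggestion above could look like.
BloomFilter and BloomTrackingDict are hypothetical, and the bit and
hash counts are arbitrary assumptions. A Bloom filter can give false
positives but never false negatives, so a key it reports as unseen was
definitely never read via d[key], while a few unused keys may be
missed.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over repr(key); sizes are arbitrary assumptions."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        data = repr(key).encode()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))


class BloomTrackingDict(dict):
    """Hypothetical tracker using fixed memory regardless of key count."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._seen = BloomFilter()

    def __getitem__(self, key):
        self._seen.add(key)
        return super().__getitem__(key)

    def definitely_unused(self):
        # Plain iteration over keys does not go through __getitem__.
        return [key for key in self if key not in self._seen]


d = BloomTrackingDict(a=1, b=2, c=3)
d["a"]
print(d.definitely_unused())
```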
___________________________________________________________________
(page generated 2025-07-30 23:01 UTC)