       A Python dict that can report which keys you did not use
        
       Author : gilad
       Score  : 71 points
       Date   : 2025-07-27 02:22 UTC (3 days ago)
        
 (HTM) web link (www.peterbe.com)
 (TXT) w3m dump (www.peterbe.com)
        
       | jraph wrote:
       | I did exactly the same thing in our Confluence to XWiki migrator
       | to easily and automatically report which macro parameters we
       | don't handle when converting Confluence macros to equivalent
       | macros in XWiki.
       | 
       | This can be used to evaluate the migration quality and spot what
       | can be improved.
       | 
       | https://github.com/xwiki-contrib/confluence/blob/7a95bf96787...
        
       | IshKebab wrote:
       | I think if you feel like you need this then it's a bit of a red
       | flag and you should be using Pydantic or `dataclass` instead,
       | then your IDE can statically tell you which fields you don't
       | access (among many other benefits). Dicts are mainly for when you
       | don't know the keys up front.
        
         | mb7733 wrote:
         | Static analysis could only tell you which fields are never
         | used, across all usage of the class. Not on a given instance.
        
         | taeric wrote:
         | Counterpoint, something like this for dataclasses would also be
         | very useful.
         | 
         | That is, it isn't just knowing whether or not the data is ever
         | used. It is useful to know if it was used in this specific run.
         | And oftentimes, seeing which parts of the data were not used is
         | a good clue as to what went wrong. At the least, you can use it
         | to rule out what code was not hit.
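
A minimal sketch of what per-instance tracking for dataclasses could look like. The `AccessTracked` mixin and the `Config` class are hypothetical names invented for this illustration, and the approach assumes an ordinary (non-slots) dataclass:

```python
from dataclasses import dataclass, fields


class AccessTracked:
    """Hypothetical mixin: remembers which dataclass fields were read."""

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name in {f.name for f in fields(type(self))}:
            # Keep the record in the instance __dict__ so it is per instance.
            state = object.__getattribute__(self, "__dict__")
            state.setdefault("_accessed", set()).add(name)
        return value

    def unused_fields(self):
        state = object.__getattribute__(self, "__dict__")
        names = {f.name for f in fields(type(self))}
        return names - state.get("_accessed", set())


@dataclass
class Config(AccessTracked):  # hypothetical example class
    host: str
    port: int
    debug: bool = False


a = Config("localhost", 8080)
b = Config("example.org", 443)
a.host  # read only on instance a
b.port  # read only on instance b

print(sorted(a.unused_fields()))  # ['debug', 'port']
print(sorted(b.unused_fields()))  # ['debug', 'host']
```

The two instances report different unused fields, which is the per-run information a static check cannot give. Note that the generated `__repr__` and `__eq__` also read every field, so calling them would mark everything as used.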
        
       | ok123456 wrote:
       | If you're inheriting from dict to extend its behavior, there are
       | a lot of side effects with that, and it's recommended to use
       | https://docs.python.org/3/library/collections.html#collectio...
       | instead.
        
         | quietbritishjim wrote:
         | From right above where you linked to:
         | 
         | > The need for this class has been partially supplanted by the
         | ability to subclass directly from dict; however, this class can
         | be easier to work with because the underlying dictionary is
         | accessible as an attribute.
         | 
         | Sounds like (unless you need the dict as a separate data
         | member) this class is a historical artefact. Unless there's
         | some other issue you know of not mentioned in the
         | documentation?
        
           | ok123456 wrote:
           | dict doesn't follow the usual object protocol, and overloaded
           | methods are runtime dependent. It's only guaranteed that non-
           | overloaded methods are resolved least surprisingly.
        
             | quietbritishjim wrote:
             | I think you mean overridden (i.e. defined in both base
             | class and derived class) rather than overloaded (i.e.
             | defined more than once in a single place but with different
             | argument types, at least from a typing point of view [1]).
             | Your comment seriously confused me till I figured that out.
             | 
             | [1] https://typing.python.org/en/latest/spec/overload.html
             | 
             | Even then, to be honest I'm a bit sceptical. Can you point
             | at a link in the official documentation that says
             | overriding methods of dictionaries may not work? I would
             | have thought the link to UserDict would have mentioned that
             | if true. What do you mean they are "runtime dependent"?
        
         | mont_tag wrote:
         | No, that is _not_ the recommendation. People routinely and
         | reliably inherit from dict.
         | 
         | The UserDict class is mostly defunct and is only still in the
         | standard library because there were a few existing uses that
         | were hard to replace (such as avoiding base class conflicts in
         | multiple inheritance).
        
           | smcin wrote:
           | UserDict is not formally deprecated but it will be someday,
           | so code that relies on it is not future-proof.
        
           | 9dev wrote:
           | Ah, Python. The language where nobody agrees on the right way
           | to do things, and just does their own instead. Five ways to
           | describe an object of a certain shape? Six package managers,
           | with incompatible but overlapping ways to publish packages,
           | but half of them without a simple way to update dependencies?
           | Asynchronous versions of everything? Metaprogramming that
           | makes Ruby blush? Yes! All of it! Lovely.
        
       | boothby wrote:
       | Just a heads up, this fails to track usage of _get_ and
       | _setdefault_. The ability to iterate over dicts makes the whole
       | question rather murky.
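
A small sketch of the gap described above, assuming a tracking class that subclasses `dict` and hooks only `__getitem__` (the class below is an illustration, not the article's exact code). In CPython, the built-in `get`, `setdefault` and `items` do not call an overridden `__getitem__`:

```python
class TrackingDict(dict):
    """Illustrative dict subclass that records keys read via d[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)
        return super().__getitem__(key)

    def never_accessed_keys(self):
        return set(self) - self.accessed_keys


config = TrackingDict(a=1, b=2, c=3, d=4)
config["a"]                # tracked
config.get("b")            # not tracked: dict.get bypasses the override
config.setdefault("c", 0)  # not tracked either
for _key, _value in config.items():  # reads every value, still untracked
    pass

print(sorted(config.never_accessed_keys()))  # ['b', 'c', 'd']
```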
        
         | quietbritishjim wrote:
         | I didn't know about the setdefault method, and wouldn't have
         | guessed it lets you read a value. Interesting, thanks.
         | 
         | Another way to get data out would be to use the new | operator
           | (i.e. x = {} | y essentially copies dictionary y to x) or the
         | update method or ** unpacking operator (e.g. x = {**y}). But
         | maybe those come under the umbrella of iterating as you
         | mentioned.
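
A sketch of those copy paths, again assuming an illustrative `dict` subclass that only overrides `__getitem__`. In CPython, merging, unpacking and copying take a fast path for dict subclasses that do not override `__iter__`, so none of them touch the override (the `|` operator needs Python 3.9 or later):

```python
class TrackingDict(dict):
    """Illustrative dict subclass that records keys read via d[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)
        return super().__getitem__(key)


y = TrackingDict(a=1, b=2)

merged = {} | y      # dict union, result is a plain dict
unpacked = {**y}     # ** unpacking into a dict display
rebuilt = dict(y)    # dict constructor
copied = y.copy()    # dict.copy

# Every value left the TrackingDict, yet nothing was recorded.
print(merged == unpacked == rebuilt == copied == {"a": 1, "b": 2})  # True
print(y.accessed_keys)  # set()
```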
        
           | notatallshaw wrote:
           | setdefault was a go to method before defaultdict was added to
           | the collections module in Python 2.5, which replaced the
           | biggest use case.
        
             | boothby wrote:
             | It's been some time since I last benchmarked defaultdict
             | but last time I did (circa 3.6 and less?), it was
             | considerably slower than judicious use of setdefault.
        
               | quietbritishjim wrote:
               | One time that defaultdict may come out ahead is if the
               | default value is expensive to construct and rarely
               | needed:
               | 
               |     d.setdefault(k, computevalue())
               | 
               | defaultdict takes a factory function, so it's only called
               | if the key is not already present:
               | 
               |     d = defaultdict(computevalue)
               | 
               | This applies to some extent even if the default value is
               | just an empty dictionary (as it often is in my
               | experience). You can use dict() as the factory function
               | in that case.
               | 
               | But I have never benchmarked!
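
To make the eager-versus-lazy difference concrete, here is a small sketch that counts how often the default is built. `computevalue` is a stand-in for an expensive constructor and the counter exists only for this illustration:

```python
from collections import defaultdict

calls = 0


def computevalue():
    """Stand-in for an expensive default; counts how often it runs."""
    global calls
    calls += 1
    return []


d = {"k": [1]}
for _ in range(5):
    # The argument is evaluated before setdefault runs, hit or miss.
    d.setdefault("k", computevalue())
setdefault_calls = calls

calls = 0
dd = defaultdict(computevalue, {"k": [1]})
for _ in range(5):
    dd["k"]        # key present: the factory is never called
dd["missing"]      # key absent: the factory is called once
defaultdict_calls = calls

print(setdefault_calls, defaultdict_calls)  # 5 1
```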
        
               | masklinn wrote:
               | > if the default value is expensive to construct and
               | rarely needed:
               | 
               | I'd say "or" rather than "and": defaultdict has higher
               | overhead to initialise the default (especially if you
               | don't need a function call in the setdefault call) but
               | because it uses a fallback of dict lookup it's
               | essentially free if you get a hit. As a result, either a
               | very high redundancy with a cheap default or a low amount
               | of redundancy with a costly default will have the
               | defaultdict edge out.
               | 
               | For the most extreme case of the former,
               | 
               |     d = {}
               |     for i in range(N):
               |         d.setdefault(0, [])
               | 
               | versus
               | 
               |     d = defaultdict(list)
               |     for i in range(N):
               |         d[0]
               | 
               | has the defaultdict edge out at N=11 on my machine (561ns
               | for setdefault versus 545 for defaultdict). And that's
               | with a literal list being quite a bit cheaper than a
               | list() call.
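
For anyone who wants to repeat that comparison, a rough `timeit` harness might look like the sketch below. The helper names and the repeat counts are assumptions chosen for illustration, and the numbers will vary by machine and Python version, so no expected output is shown:

```python
import timeit
from collections import defaultdict


def with_setdefault(n):
    d = {}
    for _ in range(n):
        d.setdefault(0, [])
    return d


def with_defaultdict(n):
    d = defaultdict(list)
    for _ in range(n):
        d[0]
    return d


def bench(func, n, repeat=5, number=10_000):
    """Best observed time per call of func(n), in nanoseconds."""
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    for n in (1, 11, 100):
        print(n, round(bench(with_setdefault, n)), round(bench(with_defaultdict, n)))
```

Both functions include the cost of creating the container, so the crossover point reflects construction overhead as well as lookup cost.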
        
         | hackish wrote:
         | Along with those and iteration, it also would need to handle
         | del/pop/popitem/update/copy/or/ror/... some of which might
         | necessitate a decision on whether comparisons/repr also count
         | as access.
        
         | rjmill wrote:
         | Indeed. Inheriting from 'collections.UserDict' instead of
         | 'dict' will make TFA's code work as intended for most of those
         | edge cases.
         | 
         | UserDict will route '.get', '.setdefault', and even iteration
         | via '.items()' through the '__getitem__' method.
         | 
         |  _edited to remove "(maybe all?) edge cases". As soon as I
         | posted, I thought of several less common/obvious edge cases._
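
A sketch of the `UserDict` variant described above. The class body is an illustration rather than the article's code. The inherited `get`, `setdefault` and `items()` are written in Python and go through `self[key]`, so a single `__getitem__` override sees them:

```python
from collections import UserDict


class TrackingDict(UserDict):
    """Illustrative UserDict subclass that records keys read via self[key]."""

    def __init__(self, *args, **kwargs):
        self.accessed_keys = set()
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.accessed_keys.add(key)
        return value

    def never_accessed_keys(self):
        return set(self.data) - self.accessed_keys


config = TrackingDict(a=1, b=2, c=3, d=4)
config["a"]
config.get("b")
config.setdefault("c", 0)
print(sorted(config.never_accessed_keys()))  # ['d']

for _key, _value in config.items():
    pass
print(sorted(config.never_accessed_keys()))  # []
```

Some paths still slip through, for example reading `config.data` directly, iterating over keys only, or membership tests with `in`.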
        
       | jgalt212 wrote:
       | why not inside of __init__
       | 
       |     self.accessed_keys = set()
       | 
       | instead of
       | 
       |     @property
       |     def accessed_keys(self):
       |         return self._accessed_keys
        
         | Jaxan wrote:
         | With the @property you only get the "getter" and not the
         | "setter".
        
           | eurleif wrote:
           | But that doesn't accomplish much, because you can still do:
           | `d.accessed_keys.add('foo')`.
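
One possible way to close that loophole, sketched under the assumption that callers only need to read the record: have the property return an immutable snapshot. The class below is an illustration, not the article's code:

```python
class TrackingDict(dict):
    """Illustrative dict subclass with a read-only view of accessed keys."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._accessed_keys = set()

    def __getitem__(self, key):
        self._accessed_keys.add(key)
        return super().__getitem__(key)

    @property
    def accessed_keys(self):
        # A frozenset copy: callers cannot mutate the internal set through it.
        return frozenset(self._accessed_keys)


d = TrackingDict(foo=1, bar=2)
d["foo"]

try:
    d.accessed_keys = set()       # no setter defined
except AttributeError:
    print("cannot rebind accessed_keys")

try:
    d.accessed_keys.add("bar")    # frozenset has no add()
except AttributeError:
    print("cannot mutate accessed_keys")

print(sorted(d.accessed_keys))  # ['foo']
```

The trade-off is a copy on every read, and `d._accessed_keys` remains reachable for anyone willing to touch a private attribute.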
        
       | larrik wrote:
       | I actually wrote something similar in nodejs for a data import
       | system. Was very handy.
        
         | null_deref wrote:
         | Interesting! Can you elaborate a little bit more on your
         | implementation?
        
           | larrik wrote:
           | Mine was a bit more specific. I had a JSON object of data
           | exported per account I was importing, and then a complex
           | mapping (also JSON) of where to put each piece of data.
           | 
           | Therefore, I really wanted to know that I was actually
           | pulling in all of the data I needed, so I tracked what was
           | seen vs not seen, and compared against what was attempted to
           | see.
           | 
           | In the end it was basically a wrapper around the JSON object
           | itself, that allowed lookup of data via a string in "dot
           | notation" (so you could do "keyA.key2" to get the same thing
           | you would have directly in JSON). Then, it would either return
           | a simple value (if there was one), or another instance of the
           | wrapper if the result was itself an object (or an array of
           | wrapped objects). All instances would share the "seen" list.
           | 
           | It's unfortunately locked behind NDA/copyright stuff, but the
           | implementation was only 67 lines.
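
The original was Node.js and is not public, so the following is only a rough Python sketch of the idea as described: dot-notation lookup, nested wrappers, and one shared "seen" set. Every name here (`Tracked`, `leaf_paths`, the sample data) is hypothetical:

```python
def leaf_paths(node, prefix=""):
    """Yield the dot-notation path of every simple value under node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, f"{prefix}.{index}")
    else:
        yield prefix


class Tracked:
    """Hypothetical wrapper around parsed JSON that records what was read."""

    def __init__(self, data, seen=None, prefix=""):
        self._data = data
        self._seen = set() if seen is None else seen  # shared by all wrappers
        self._prefix = prefix

    def get(self, path):
        node = self._data
        for part in path.split("."):
            node = node[part]
        full = f"{self._prefix}.{path}" if self._prefix else path
        return self._wrap(node, full)

    def _wrap(self, node, path):
        if isinstance(node, dict):
            return Tracked(node, self._seen, path)
        if isinstance(node, list):
            return [self._wrap(item, f"{path}.{i}") for i, item in enumerate(node)]
        self._seen.add(path)  # only simple values count as seen
        return node

    def unseen(self):
        return set(leaf_paths(self._data, self._prefix)) - self._seen


account = Tracked({
    "keyA": {"key2": 42, "key3": "x"},
    "tags": ["a", "b"],
    "owner": {"name": "n"},
})
print(account.get("keyA.key2"))         # 42
print(account.get("owner").get("name")) # n
print(sorted(account.unseen()))         # ['keyA.key3', 'tags.0', 'tags.1']
```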
        
       | simon04 wrote:
       | Very useful. For configparser.ConfigParser I've found
       | https://stackoverflow.com/a/57307141
        
       | golly_ned wrote:
       | I have a similar use case and this idea also occurred to me.
       | 
       | However: the dict in this case would also include dataclasses,
       | and I'd be interested in finding what exact attributes within
       | those dataclasses were accessed, and also be able to mark all
       | attributes in those dataclasses as accessed if the parent
       | dataclass is accessed, and with those dataclasses, being config
       | objects, being able to do the same to its own children, so that
       | the topmost dictionary has a tree of all accessed keys.
       | 
       | I couldn't figure out how to do that, but welcome to ideas.
        
       | codethief wrote:
       | Only tangentially related but I am really excited about PEP 764 [1]
       | (inline typed dictionaries). If it gets accepted, we can finally
       | replace entire hierarchies of dataclasses with simple nested
       | dictionary types and call it a day.
       | 
       | I am currently teaching (typed) Python to a team of Windows
       | sysadmins and it's been incredibly difficult to explain when to
       | use a dataclass, a NamedTuple, a Pydantic model, or a dictionary.
       | 
       | [1] https://peps.python.org/pep-0764/
        
         | JohnKemeny wrote:
         | Do you seriously have difficulties explaining when to use a
         | class and when to use a dictionary?!
        
           | codethief wrote:
           | You can create dictionaries on the fly. But dataclass objects
           | require defining that dataclass first. The type safety (and
           | LSP support) story for accessing individual dataclass fields
           | is better than for accessing dict items (sometimes even when
           | they are TypedDicts), but for iterating over all fields it's
           | worse. dataclasses are nominal types and can contain
           | additional logic, TypedDicts are structural ones, overall
           | simpler, can be more convenient and lead to looser coupling.
           | Dataclasses use metaclass and decorator magic while TypedDicts
           | are just plain dicts. Etc.
           | 
           | Let me make this more concrete: Those sysadmins frequently
           | need to process and pass around complex (as in heavily
           | nested) structured data. The data often comes in the form of
           | singleton objects, i.e. they are built in a single place, then
           | used in another place and then thrown away (or merged into
           | some other structure). In other words, any class hierarchy
           | you build represents boilerplate code you'll only ever use
           | once and which will be annoying to maintain as you refactor
           | your code. Do you pick dataclasses or TypedDicts (or
           | something else) for your map data structures?
           | 
           | In TypeScript you would just use `const data = <heavily
           | nested object> as const` and be done with it.
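
A side-by-side sketch of the two styles being compared, using syntax that works today (the inline form proposed in PEP 764 is not shown, since it is only a proposal). All class and field names are hypothetical:

```python
from dataclasses import dataclass, fields
from typing import TypedDict


# TypedDict style: each nesting level still needs its own named class.
class DiskDict(TypedDict):
    path: str
    size_gb: int


class HostDict(TypedDict):
    name: str
    disk: DiskDict


# Dataclass style: nominal types that can also carry methods.
@dataclass
class Disk:
    path: str
    size_gb: int


@dataclass
class Host:
    name: str
    disk: Disk


host_dict: HostDict = {"name": "srv01", "disk": {"path": "C:", "size_gb": 512}}
host_obj = Host("srv01", Disk("C:", 512))

# At runtime a TypedDict value is a plain dict, so iteration is direct.
print(type(host_dict) is dict)  # True
dict_keys = [key for key, _value in host_dict.items()]

# Iterating a dataclass goes through dataclasses.fields() and getattr().
obj_keys = [f.name for f in fields(host_obj)]

print(dict_keys == obj_keys)  # True
```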
        
           | quietbritishjim wrote:
           | The line is seriously blurred.
        
         | xg15 wrote:
         | To be honest, that proposal sounds like it would make the
         | problem even worse, by blurring the line between dicts and
         | dataclasses even more.
        
           | codethief wrote:
           | How does creating anonymous TypedDicts (and allowing them to
           | be nested on the fly) blur the line "even more" when those
           | features are not supported by dataclasses?
           | 
           | I mean I agree w.r.t. the blurriness in general but this PEP
           | is not going to change anything about that, in neither
           | direction.
        
             | xg15 wrote:
             | True, but I think what I don't like is that this PEP
             | essentially creates an entire new way of "type definitions"
             | that is separate from the type definitions we already have.
             | 
             | I get the rationale for "anonymous strict" return types,
             | but then I think a better way would be to think up some way
             | to accomplish that for dataclasses.
        
         | mvieira38 wrote:
         | When, if ever, do you use TypedDicts?
        
           | tiltowait wrote:
           | I use them for API responses/requests where
           | dataclasses/pydantic don't add much value and introduce extra
           | function calls and overhead. It's most common when part of
           | the response from one API gets shuttled off to another.
           | There's often no value in initializing a model object, but
           | it's still handy to have some form of type-checking as you
           | construct the next API call.
        
       | nurettin wrote:
       | AI front: We have models to generate pictures, videos and code.
       | We have the best devs and are so fskin rich!
       | 
       | Rust front: Here's a faster ls called ls-rs with different
       | defaults, you should use this!
       | 
       | Go front: Here's reverse proxy #145728283 it is an open source
       | project that has slightly different parameters than all the
       | others.
       | 
       | Python hobo front: Uhh guys here's a dict that kinda might
       | remember what you've accessed if you used it in a particular way.
        
       | mrits wrote:
       | For giant dicts a bloomfilter would work great here
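
A rough sketch of how that might look, assuming a hand-rolled Bloom filter built on `hashlib` (the class names, sizes and hash scheme are all illustrative choices). Because a Bloom filter can give false positives but never false negatives, any key it reports as unseen was definitely never read, while a few unread keys may go unreported:

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter; sizes are arbitrary assumptions."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class BloomTrackingDict(dict):
    """Illustrative dict subclass that records reads in fixed memory."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._seen = BloomFilter()

    def __getitem__(self, key):
        self._seen.add(key)
        return super().__getitem__(key)

    def definitely_unused_keys(self):
        return {key for key in self if key not in self._seen}


big = BloomTrackingDict((i, i * i) for i in range(1000))
for i in range(0, 1000, 2):
    big[i]  # read the even keys only

unused = big.definitely_unused_keys()
print(all(key % 2 == 1 for key in unused))  # True
```

The filter's memory stays constant no matter how many keys are read, at the cost of possibly under-reporting unused keys as it fills up.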
        
       ___________________________________________________________________
       (page generated 2025-07-30 23:01 UTC)