[HN Gopher] Loading Pydantic models from JSON without running ou...
       ___________________________________________________________________
        
       Loading Pydantic models from JSON without running out of memory
        
       Author : itamarst
       Score  : 63 points
       Date   : 2025-05-22 18:06 UTC (4 hours ago)
        
 (HTM) web link (pythonspeed.com)
 (TXT) w3m dump (pythonspeed.com)
        
       | thisguy47 wrote:
       | I'd like to see a comparison of ijson vs just `json.load(f)`.
       | `ujson` would also be interesting to see.
        
         | itamarst wrote:
         | For my PyCon 2025 talk I did this. Video isn't up yet, but
         | slides are here: https://pythonspeed.com/pycon2025/slides/
         | 
         | The linked-from-original-article ijson article was the
         | inspiration for the talk:
         | https://pythonspeed.com/articles/json-memory-streaming/
        
       | fjasdfas wrote:
       | So are there downsides to just always setting slots=True on all
       | of my python data types?
        
         | itamarst wrote:
         | You can't add extra attributes that weren't part of the
         | original dataclass definition:
         | 
         |     >>> from dataclasses import dataclass
         |     >>> @dataclass
         |     ... class C: pass
         |     ...
         |     >>> C().x = 1
         |     >>> @dataclass(slots=True)
         |     ... class D: pass
         |     ...
         |     >>> D().x = 1
         |     Traceback (most recent call last):
         |       File "<python-input-4>", line 1, in <module>
         |         D().x = 1
         |         ^^^^^
         |     AttributeError: 'D' object has no attribute 'x' and no __dict__ for setting new attributes
         | 
         | Most of the time this is not a thing you actually need to do.
        
           | masklinn wrote:
           | Also, some introspection stops working, e.g. vars().
           | 
           | If you're using dataclasses it's less of an issue because
           | dataclasses.asdict still works.
        
           | monomial wrote:
           | I rarely need to dynamically add attributes myself on
           | dataclasses like this but unfortunately this also means
           | things like `@cached_property` won't work because it can't
           | internally cache the method result anywhere.
        
       | jmugan wrote:
       | My problem isn't running out of memory; it's loading in a complex
       | model where the fields are BaseModels and unions of BaseModels
       | multiple levels deep. It doesn't load it all the way and leaves
       | some of the deeper parts as dictionaries. I need like almost a
       | parser to search the space of different loads. Anyone have any
       | ideas for software that does that?
        
         | causasui wrote:
         | You probably want to use Discriminated Unions
         | https://docs.pydantic.dev/latest/concepts/unions/#discrimina...
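
For readers unfamiliar with the suggestion, the following is a rough sketch of a discriminated union, assuming Pydantic v2. The `Cat`, `Dog`, and `Owner` models and the `kind` field are hypothetical names chosen for the example, not taken from the thread:

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    kind: Literal["cat"]
    lives: int


class Dog(BaseModel):
    kind: Literal["dog"]
    breed: str


class Owner(BaseModel):
    # The "kind" field tells Pydantic which member of the union to build,
    # so it does not have to try each candidate in turn.
    pet: Union[Cat, Dog] = Field(discriminator="kind")


owner = Owner.model_validate_json('{"pet": {"kind": "dog", "breed": "collie"}}')
pet_type = type(owner.pet).__name__
```

The same pattern can be repeated at each level of a nested model, provided every variant carries a tag field.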
        
         | enragedcacti wrote:
         | The only reason I can think of for the behavior you are
         | describing is if one of the unioned types at some level of the
         | hierarchy is equivalent to Dict[str, Any]. My understanding is
         | that Pydantic will explore every option provided recursively
         | and raise a ValidationError if none match but will never just
         | give up and hand you a partially validated object.
         | 
         | Are you able to share a snippet that reproduces what you're
         | seeing?
        
         | cbcoutinho wrote:
         | At some point, we have to admit we're asking too much from our
         | tools.
         | 
         | I know nothing about your context, but in what context would a
         | single model need to support so many permutations of a data
         | structure? Just because software can, doesn't mean it should.
        
           | shakna wrote:
           | Anything multi-tenant? There's a reason Salesforce is used
           | by so many large organisations. The multi-nesting lets you
           | account for all the discrepancies that come with scale.
           | 
           | Just tracking payments through multiple tax regions will
           | explode the places where things need to be tweaked.
        
         | not_skynet wrote:
         | going to shamelessly plug my own library here:
         | https://github.com/mivanit/ZANJ
         | 
         | You can have nested dataclasses, as well as specify custom
         | serializers/loaders for things which aren't natively supported
         | by json.
        
       | m_ke wrote:
       | Or just dump pydantic and use msgspec instead:
       | https://jcristharif.com/msgspec/
        
         | itamarst wrote:
         | msgspec is much more memory efficient out of the box, yes. Also
         | quite fast.
        
         | mbb70 wrote:
         | A great feature of pydantic is its validation hooks, which let
         | you intercept serialization/deserialization of specific fields
         | and augment behavior.
         | 
         | For example, if you are querying a DB that returns a column as a
         | JSON string, it's trivial with Pydantic to JSON-parse the column
         | as part of deserialization with an annotation.
         | 
         | Pydantic is definitely slower and not a 'zero cost
         | abstraction', but you do get a lot for it.
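
One way the JSON-string-column case might look, assuming Pydantic v2 and its built-in `Json` type. The `Row` model and its field names are made up for illustration, and the commenter may have had a different hook in mind, such as a custom validator:

```python
from pydantic import BaseModel, Json


class Row(BaseModel):
    # Hypothetical row shape: "payload" arrives from the DB as a JSON string.
    id: int
    # Json[...] parses the string first, then validates the parsed value.
    payload: Json[dict[str, int]]


row = Row.model_validate({"id": 7, "payload": '{"clicks": 3, "views": 10}'})
parsed_payload = row.payload
```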
        
       | zxilly wrote:
       | Maybe using mmap would also save some memory; I'm not sure
       | whether this can be done in Python.
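
On the question of whether this is possible in Python: the standard library does ship an `mmap` module. The sketch below uses a temporary file with made-up contents. It only shows that mapping a JSON file works. It does not show a memory saving, since `json.loads` needs a bytes or str object and the slice copies the mapped data:

```python
import json
import mmap
import os
import tempfile

# Write a small sample file (contents are invented for the example).
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'[{"id": 1}, {"id": 2}]')
    path = tmp.name

with open(path, "rb") as f:
    # Map the whole file read-only; pages are loaded lazily by the OS.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first_byte = mm[:1]
        # Slicing copies the bytes out of the mapping before parsing.
        records = json.loads(mm[:])

os.remove(path)
```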
        
         | itamarst wrote:
         | Once you switch to ijson it will not save any memory, no,
         | because ijson essentially uses zero memory for the parsing.
         | You're just left with the in-memory representation.
        
       | dgan wrote:
       | I gave up on Python dataclasses and JSON. I use protobuf objects
       | within the application itself. I also have a "...Mixin" class for
       | almost every wire model, with extra methods.
       | 
       | Automatic, statically typed deserialization is worth the trouble,
       | in my opinion.
        
       ___________________________________________________________________
       (page generated 2025-05-22 23:00 UTC)