[HN Gopher] Loading Pydantic models from JSON without running ou...
___________________________________________________________________
Loading Pydantic models from JSON without running out of memory
Author : itamarst
Score : 63 points
Date : 2025-05-22 18:06 UTC (4 hours ago)
(HTM) web link (pythonspeed.com)
(TXT) w3m dump (pythonspeed.com)
| thisguy47 wrote:
| I'd like to see a comparison of ijson vs just `json.load(f)`.
| `ujson` would also be interesting to see.
| itamarst wrote:
| For my PyCon 2025 talk I did this. Video isn't up yet, but
| slides are here: https://pythonspeed.com/pycon2025/slides/
|
| The ijson article linked from the original article was the
| inspiration for the talk:
| https://pythonspeed.com/articles/json-memory-streaming/
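| The slides have itamarst's own numbers. If you want to check a
| loader against your own data, one rough approach is the stdlib
| tracemalloc module. The sketch below measures only `json.load` on
| a synthetic file (the record shape and size are arbitrary). To
| compare ijson, you could pass in a function that consumes
| `ijson.items(f, "item")`, and similarly for ujson. Caveat:
| tracemalloc only sees allocations made through Python's allocator
| APIs, so it may undercount C code that calls malloc directly.

```python
import json
import os
import tempfile
import tracemalloc
from typing import Any, Callable


def peak_traced_mb(load: Callable[[], Any]) -> float:
    """Call `load()` and return the peak memory tracemalloc saw meanwhile, in MB."""
    tracemalloc.start()
    try:
        load()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1_000_000


# Synthetic input: 50,000 small records (shape and size are arbitrary).
text = json.dumps([{"id": i, "name": f"user{i}"} for i in range(50_000)])
tmpdir = tempfile.TemporaryDirectory()
path = os.path.join(tmpdir.name, "records.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)


def load_with_json() -> Any:
    # Reads the whole file into one string, then builds every object at once.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


json_peak_mb = peak_traced_mb(load_with_json)
print(f"json.load: peak {json_peak_mb:.1f} MB for a {len(text) / 1e6:.1f} MB file")
tmpdir.cleanup()
```

| Because `json.load` holds the full file text and the complete
| parsed result, its peak is always larger than the file itself.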
| fjasdfas wrote:
| So are there downsides to just always setting slots=True on all
| of my python data types?
| itamarst wrote:
| You can't add extra attributes that weren't part of the
| original dataclass definition:
|
|     >>> from dataclasses import dataclass
|     >>> @dataclass
|     ... class C: pass
|     ...
|     >>> C().x = 1
|     >>> @dataclass(slots=True)
|     ... class D: pass
|     ...
|     >>> D().x = 1
|     Traceback (most recent call last):
|       File "<python-input-4>", line 1, in <module>
|         D().x = 1
|         ^^^^^
|     AttributeError: 'D' object has no attribute 'x' and no __dict__ for setting new attributes
|
| Most of the time this is not a thing you actually need to do.
| masklinn wrote:
| Also, some introspection stops working, e.g. vars().
|
| If you're using dataclasses it's less of an issue, because
| you can use dataclasses.asdict() instead.
| monomial wrote:
| I rarely need to dynamically add attributes myself on
| dataclasses like this but unfortunately this also means
| things like `@cached_property` won't work because it can't
| internally cache the method result anywhere.
| jmugan wrote:
| My problem isn't running out of memory; it's loading in a complex
| model where the fields are BaseModels and unions of BaseModels
| multiple levels deep. It doesn't load it all the way and leaves
| some of the deeper parts as dictionaries. I need like almost a
| parser to search the space of different loads. Anyone have any
| ideas for software that does that?
| causasui wrote:
| You probably want to use Discriminated Unions
| https://docs.pydantic.dev/latest/concepts/unions/#discrimina...
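| In Pydantic itself, you declare this with a `Literal` tag field on
| each model plus `Field(discriminator=...)`, as the linked docs
| describe. The tag tells the validator which model to build instead
| of making it try union members. So that the sketch runs without
| Pydantic, it hand-rolls the same dispatch with stdlib dataclasses.
| The `Card` and `BankTransfer` models and the `kind` tag are made up
| for illustration.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Card:
    kind: str
    last4: str


@dataclass
class BankTransfer:
    kind: str
    iban: str


Payment = Union[Card, BankTransfer]

# Hypothetical tag -> model table: the "kind" value alone picks the model,
# so no union member is ever tried and silently skipped.
PAYMENT_TYPES: dict[str, type] = {"card": Card, "bank_transfer": BankTransfer}


def load_payment(data: dict[str, Any]) -> Payment:
    try:
        cls = PAYMENT_TYPES[data["kind"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing payment kind: {data.get('kind')!r}") from exc
    return cls(**data)
```

| An unknown tag fails loudly rather than leaving a half-typed
| value behind.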
| enragedcacti wrote:
| The only reason I can think of for the behavior you are
| describing is if one of the unioned types at some level of the
| hierarchy is equivalent to Dict[str, Any]. My understanding is
| that Pydantic will explore every option provided recursively
| and raise a ValidationError if none match but will never just
| give up and hand you a partially validated object.
|
| Are you able to share a snippet that reproduces what you're
| seeing?
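| Below is a toy illustration of the hypothesis above. It mimics
| Pydantic's opt-in `union_mode='left_to_right'` behavior, where the
| first member that validates wins. The default "smart" mode weighs
| candidates more carefully, so this is not Pydantic's actual
| algorithm. It only shows how a catch-all `Dict[str, Any]`-style
| member can accept data that was meant for a typed model. All names
| are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Address:
    street: str
    city: str


def as_address(value: Any) -> Address:
    # Address(**value) raises TypeError on non-mappings or missing/extra keys.
    return Address(**value)


def as_any_dict(value: Any) -> dict:
    # Stand-in for a Dict[str, Any] member: any dict passes through unchanged.
    if not isinstance(value, dict):
        raise TypeError("expected a dict")
    return value


def validate_left_to_right(value: Any, members: list[Callable[[Any], Any]]) -> Any:
    """Toy union check: return the result of the first member that accepts value."""
    errors: list[Exception] = []
    for validate in members:
        try:
            return validate(value)
        except (TypeError, ValueError) as exc:
            errors.append(exc)
    raise ValueError(f"no union member matched: {errors}")


raw = {"street": "1 Main St", "city": "Springfield"}
dict_first = validate_left_to_right(raw, [as_any_dict, as_address])  # stays a plain dict
model_first = validate_left_to_right(raw, [as_address, as_any_dict])  # becomes an Address
```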
| cbcoutinho wrote:
| At some point, we have to admit we're asking too much from our
| tools.
|
| I know nothing about your context, but in what context would a
| single model need to support so many permutations of a data
| structure? Just because software can, doesn't mean it should.
| shakna wrote:
| Anything multi-tenant? There's a reason Salesforce is used
| by so many large organisations. The multi-nesting lets you
| account for all the discrepancies that come with scale.
|
| Just tracking payments through multiple tax regions will
| explode the places where things need to be tweaked.
| not_skynet wrote:
| going to shamelessly plug my own library here:
| https://github.com/mivanit/ZANJ
|
| You can have nested dataclasses, as well as specify custom
| serializers/loaders for things which aren't natively supported
| by json.
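| The following is not ZANJ's API. It is a minimal stdlib sketch of
| the general idea (nested dataclasses plus per-type custom loaders),
| built around a hypothetical `load_dataclass` helper and a
| `CUSTOM_LOADERS` table. It handles nested dataclasses, `list[...]`
| fields, and registered types such as `datetime.date`. Unions and
| dicts of models are left out.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from datetime import date
from typing import Any, Callable, get_args, get_origin, get_type_hints

# Hypothetical registry: how to rebuild types that JSON can't represent natively.
CUSTOM_LOADERS: dict[Any, Callable[[Any], Any]] = {
    date: date.fromisoformat,
}


def load_dataclass(tp: Any, data: Any) -> Any:
    """Recursively rebuild a value of type `tp` from parsed JSON (hypothetical helper)."""
    if tp in CUSTOM_LOADERS:
        return CUSTOM_LOADERS[tp](data)
    if is_dataclass(tp):
        hints = get_type_hints(tp)  # resolves annotations, including string ones
        return tp(**{
            f.name: load_dataclass(hints[f.name], data[f.name])
            for f in fields(tp)
            if f.init and f.name in data  # absent keys fall back to field defaults
        })
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [load_dataclass(item_type, item) for item in data]
    return data  # str, int, float, bool, None pass through unchanged


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    placed_on: date
    items: list[LineItem]


order = load_dataclass(
    Order,
    json.loads('{"placed_on": "2025-05-22", "items": [{"sku": "A1", "quantity": 2}]}'),
)
```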
| m_ke wrote:
| Or just dump pydantic and use msgspec instead:
| https://jcristharif.com/msgspec/
| itamarst wrote:
| msgspec is much more memory efficient out of the box, yes. Also
| quite fast.
| mbb70 wrote:
| A great feature of Pydantic is the validation hooks that let
| you intercept serialization/deserialization of specific fields
| and augment behavior.
|
| For example, if you are querying a DB that returns a column as a
| JSON string, it's trivial with Pydantic to JSON-parse the column as
| part of deserialization with an annotation.
|
| Pydantic is definitely slower and not a 'zero cost
| abstraction', but you do get a lot for it.
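| Pydantic ships a `Json` type for exactly this case: annotate the
| field as `Json[...]` and the string is parsed during validation.
| For comparison, a stdlib-only sketch of the same hook idea uses a
| dataclass `__post_init__`. The `Row` class and `attributes` column
| are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Row:
    id: int
    # Hypothetical column that the DB driver returns as a JSON-encoded string.
    attributes: Any

    def __post_init__(self) -> None:
        # Runs right after the generated __init__: decode the column if it's still text.
        if isinstance(self.attributes, str):
            self.attributes = json.loads(self.attributes)


row = Row(id=1, attributes='{"color": "red", "tags": ["a", "b"]}')
```

| Unlike Pydantic's `Json[...]`, nothing here checks that the decoded
| value has the expected shape.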
| zxilly wrote:
| Maybe using mmap would also save some memory? I'm not quite sure
| whether this can be implemented in Python.
| itamarst wrote:
| Once you switch to ijson it will not save any memory, no,
| because ijson essentially uses zero memory for the parsing.
| You're just left with the in-memory representation.
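| On the mmap question: the standard library does have an `mmap`
| module, so mapping the file is straightforward. The sketch below
| uses a hypothetical `load_json_via_mmap` helper. It also shows why
| mapping doesn't help the stdlib parser much. `json.loads` accepts
| only `str`, `bytes`, or `bytearray`, so slicing the map copies the
| whole file into a `bytes` object before parsing. The parsed objects
| still live in memory afterwards.

```python
import json
import mmap
import os
import tempfile
from typing import Any


def load_json_via_mmap(path: str) -> Any:
    """Parse a JSON file through a read-only memory map (a sketch)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # mm[:] copies the entire mapping into a bytes object, because
        # json.loads() won't accept the mmap object directly.
        return json.loads(mm[:])


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"users": [{"id": 1}, {"id": 2}]}, f)
    data = load_json_via_mmap(path)
```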
| dgan wrote:
| I gave up on Python dataclasses & JSON. I use protobuf objects
| within the application itself. I also have a "...Mixin" class for
| almost every wire model, with extra methods.
|
| Automatic, statically typed deserialization is worth the trouble
| in my opinion
___________________________________________________________________
(page generated 2025-05-22 23:00 UTC)