[HN Gopher] Frankenstein's `__init__`
___________________________________________________________________
Frankenstein's `__init__`
Author : todsacerdoti
Score : 82 points
Date : 2025-04-19 11:32 UTC (11 hours ago)
(HTM) web link (ohadravid.github.io)
(TXT) w3m dump (ohadravid.github.io)
| smitty1e wrote:
| To paraphrase Wheeler/Lampson: "All problems in python can be
| solved by another level of indiscretion."
| jerf wrote:
| Oh, that's good. That has some depth to it. I'm going to have
| to remember that one. Of course one can switch out "Python" for
| whatever the local situation is too.
| _Algernon_ wrote:
| fix: add time.sleep(0.1) in the .close method.
| gnfargbl wrote:
| # NOTE: doubled this to 0.2s because of some weird test
| failures.
| Const-me wrote:
| Using sleep instead of proper thread synchronization is
| unreliable. Some people have slow computers like a cheap tablet
| or free tier cloud VM. Other people have insufficient memory
| for the software they run and their OS swaps.
|
| When I need something like that, in my destructor I usually ask
| worker thread to shut down voluntarily (using a special posted
| message, manual reset event, or an atomic flag), then wait for
| the worker thread to quit within a reasonable timeout. If it
| didn't quit within the timeout, I log a warning message if I
| can.
| btown wrote:
| Notably on this point, resource contention on e.g. a
| Kubernetes cluster can cause these kinds of things to fail
| spuriously. Sleeps have their uses, but assuming that work is
| done while you're sleeping is rarely a good idea, because
| when it rains it will pour.
| crdrost wrote:
| I believe _Algernon_ was making a joke about how you could
| take this problem and make the code even worse, not proposing
| a solution to the problem.
|
| It's also not even a problem with slow computers or
| insufficient memory, __init__ does I/O here, it connects to
| ZeroMQ, so it could have arbitrary latency in various
| circumstances exceeding the 100 milliseconds that we would be
| sleeping for. So the joke is, this fixes the problem in your
| test environment where everything is squeaky clean and you
| know that ZeroMQ is reachable, and now you have bugs in prod
| still.
| zokier wrote:
| Pretty simple to fix by changing the _init to something like:
| def _init(self): init_complete = threading.Event()
| def worker_thread_start():
| FooWidget.__init__(self) init_complete.set()
| self.run() worker_thread =
| Thread(target=worker_thread_start, daemon=True)
| worker_thread.start() init_complete.wait()
|
| Spawning worker thread from constructor is not that crazy, but
| you want to make sure the constructing is completed by the time
| you return from the constructor.
| ohr wrote:
| Thanks, I hate it! Jk, but would you consider this a good
| solution overall?
| im3w1l wrote:
| Not him, but I would also consider making FooBarWidget _have_
| a FooWidget rather than _be_ a FooWidget.
|
| Normally with a FooWidget you can create one on some thread
| and then perform operations on it on that thread. But in the
| case of the FooBarWidget you can not do operations on it
| because operations must be done in the special thread that is
| inaccessible.
| shiandow wrote:
| I can't come up with a single good reason why you would wish
| to use inheritance in this case.
|
| Except possibly for type checking, but
|
| 1. Python has duck typing anyway
|
| 2. It's debatable whether these two classes should be
| interchangeable
|
| 3. You shouldn't need to use inheritance just to make the
| type checking work.
|
| It could be considered a clever hack in some situations, but
| it's completely unsurprising that it causes issues. Putting
| band-aids on it after you find a bug does not fix the real
| problem.
| gpderetta wrote:
| I would consider this an hack. But a reasonable hack if you
| can't change the base class.
| greatgib wrote:
| To be clear, a good init is not supposed to create a thread or
| do any execution that might not be instant of things like that.
|
| It would have been better to an addition start, run, exec
| method that does that kind of things. Even if for usage it is
| an inch more complicated to use with an additional call.
| lmm wrote:
| > It would have been better to an addition start, run, exec
| method that does that kind of things. Even if for usage it is
| an inch more complicated to use with an additional call.
|
| That breaks RAII - you shouldn't give the user a way to
| create the object in an invalid state. Although if it's
| intended to be used as a context manager anyway then maybe
| doing it on context enter rather than on init would be nicer.
| greatgib wrote:
| Doesn't necessarily have to be in an invalid state. You can
| have your object with a none value for the socket or a
| state like "not_started" and the different method will look
| at that to handle everything properly.
|
| Also, it was not supposed to be used in a context manager
| as there is just a close method but it is the caller that
| decided to wrap it in a context.
|
| But as what you suggest, indeed it would have been a great
| idea to add an __enter__ and do the starting there are as
| it would be the proper way.
| crdrost wrote:
| I don't think it's totally crazy to, say, open a DB
| connection in __init__() even though that's not an
| instantaneous operation. That's not a hill I would die on,
| you can just say "open a connection and hand it to me as an
| argument," but it just looks a little cleaner to me if the
| lifecycle of the connection is being handled inside this DB-
| operations class. (You can also defer creating the connection
| until it is actually used, or require an explicit call to
| connect it, but then you are also writing a bunch of
| boilerplate to make the type checker happy for cases where
| the class is having its methods called before it was properly
| connected.)
| mjr00 wrote:
| It's not totally crazy in that I see it all the time, but
| it's one of the two most common things I've found make
| Python code difficult to reason about.[0] After all, if you
| open a DB connection in __init__() -- how do you close it?
| This isn't C++ where we can tie that to a destructor. I've
| run into _so_ many Python codebases that do this and have
| tons of unclosed connections as a result.
|
| A much cleaner way (IMO) to do this is use context managers
| that have explicit lifecycles, so something like this:
| @contextmanager def create_db_client(host: str,
| port: int) -> Generator[_DbClient, None, None]:
| try: connection = mydblib.connect(host,
| port) client = _DbClient(connection)
| yield client finally:
| connection.close() class _DbClient:
| def __init__(self, connection):
| self._connection = connection def
| do_thing_that_requires_connection(...): ...
|
| Which lets you write client code that looks like
| with create_db_client('localhost', 5432) as db_client: #
| port 3306 if you're a degenerate
| db_client.do_thing_that_requires_connection(...)
|
| This gives you type safety, connection safety, has minimal
| boilerplate for client code, _and_ ensures the connection
| is created and disposed of properly. Obviously in larger
| codebases there 's some more nuances, and you might want to
| implement a `typing.Protocol` for `_DbClient` that lets you
| pass it around, but IMO the general idea is much better
| than initializing a connection to a DB, ZeroMQ socket, gRPC
| client, etc in __init__.
|
| [0] The second is performing "heavy", potentially failing
| operations outside of functions and classes, which can
| cause failures when importing modules.
| lijok wrote:
| > This isn't C++ where we can tie that to a destructor
|
| `def __del__`
| mjr00 wrote:
| C++ destructors are deterministic. Relying on a
| nondeterministic GC call to run __del__ is _not_ good
| code.
|
| Also worth noting that the Python spec does _not_ say
| __del__ must be called, only that it _may_ be called
| after all references are deleted. So, no, you can 't tie
| it to __del__.
| 12_throw_away wrote:
| > def __del__
|
| Nope, do not ever do this, it will not do what you want.
| You have no idea _when_ it will be called. It can get
| called at shutdown where the entire runtime environment
| is in the process of being torn down, meaning that
| nothing actually works anymore.
| tayo42 wrote:
| I get the points everyone is making and they make sense,
| but sometimes you need persistent connections. Open and
| closing constantly like can cause issues
| mjr00 wrote:
| There's nothing about the contextmanager approach that
| says you're open and closing any more or less frequently
| than a __init__ approach with a separate `close()`
| method. You're just statically ensuring 1) the close
| method gets called, and 2) database operations can only
| happen on an open connection. (or, notably, a connection
| that we expect to be open, as something external the
| system may have closed it in the meantime.)
|
| Besides, persistent connections are a bit orthogonal
| since you should be using a connection pool in practice,
| which most Python DB libs provide out of the box. In
| either case, the semantics are the same, open/close
| becomes lease from/return to pool.
| eptcyka wrote:
| What if the object is in an invalid state unless the thread
| is started?
| greatgib wrote:
| Nothing prevents you to have a valid "not_yet_started"
| state.
| rcxdude wrote:
| No, but it does complicate things substantially. I don't
| understand why you would have a useless half-constructed
| state, especially in python where init isn't all that
| magical.
| lgas wrote:
| The benefits of making illegal states unrepresentable are
| just not widely understood outside of pure FP circles. I
| think it's hard for stuff like this to catch on in python
| in particular because everything is so accessible and
| mutable, it would be hard to talk about anything that
| provides the level of confidence that eg. ADTs in Haskell
| provide. And it's hard to understand why you would want
| to try to do something like that in a context like python
| unless you already have experience with it in a
| statically typed context.
| eptcyka wrote:
| And what would you do with this object after
| construction? Call start() on it immediately?
| greatgib wrote:
| Kind of yes or no, depend on your usage. But clearly
| would have worked well for the case of the automatic test
| if the start was the context
| jhji7i77l wrote:
| > To be clear, a good init is not supposed to create a thread
| or do any execution that might not be instant of things like
| that.
|
| Maybe it indirectly ensures that only one thread is created
| per instantiation; there are better ways of achieving that
| though.
| wodenokoto wrote:
| > solving an annoying problem with a complete and utter disregard
| to sanity, common sense, and the feelings of other human beings.
|
| While I enjoy comments like these (and the article overall!),
| they stand stronger if followed by an example of a solution that
| regards sanity, common sense and the feelings of others.
| crdrost wrote:
| So in this case that would be, a FooBarWidget is not a subclass
| of FooWidget but maybe still a subclass of AbstractWidget above
| that. It contains a thread and config as its state variables.
| That thread instantiates a FooWidget with the saved config, and
| runs it, and finally closes its queue.
|
| The problem still occurs, because you have to define what it
| means to close a FooBarWidget and I don't think python Thread
| has a "throw an exception into this thread" method. Just
| setting the should_exit property, has the same problem as the
| post! The thread might still be initing the object and any
| attempt to tweak across threads could sometimes tweak before
| init is complete because init does I/O. But once you are there,
| this is just a tweak of the FooWidget code. FooWidget could
| respond to a lock, a semaphore, a queue of requests, any number
| of things, to be told to shut down.
|
| In fact, Python has a nice built-in module called asyncio,
| which implements tasks, and tasks can be canceled and other
| such things like that, probably you just wanted to move the
| foowidget code into a task. (See @jackdied's great talk "Stop
| Writing Classes" for more on this general approach. It's not
| about asyncio specifically, rather it's just about how the
| moment we start throwing classes into our code, we start to
| think about things that are not just solving the problem in
| front of us, and solving the problem in front of us could be
| done with just simple structures from the standard library.)
| mjr00 wrote:
| Doing anything like this in __init__ is crazy. Even
| `Path("config.json").read_text()` in a constructor isn't a good
| idea.
|
| Friends don't let friends build complicated constructors that can
| fail; this is a huge violation of the Principle of Least
| Astonishment. If you require external resources like a zeromq
| socket, use connect/open/close/etc methods (and a contextmanager,
| probably). If you need configuration, create a separate function
| that parses the configuration, then returns the object.
|
| I appreciate the author's circumstances may have not allowed them
| to make these changes, but it'd drive me nuts leaving this as-is.
| echelon wrote:
| Not just Python. Most languages with constructors behave badly
| if setup fails: C++ (especially), Java, JavaScript. Complicated
| constructors are a nuisance and a danger.
|
| Rust is the language I'm familiar with that does this
| exceptionally well (although I'm sure there are others). It's
| strictly because there are no constructors. Constructors are
| not special language constructs, and any method can function in
| that way. So you pay attention to the function signature just
| like everywhere else: return Result<Self, T> explicitly, heed
| async/await, etc. A constructor is no different than a static
| helper method in typical other languages.
|
| new Thing() with fragility is vastly inferior to new_thing() or
| Thing::static_method_constructor() without the submarine
| defects.
|
| Enforced tree-style inheritance is also weird after
| experiencing a traits-based OO system where types don't have to
| fit onto a tree. You're free to pick behaviors at will. Multi-
| inheritance was a hack that wanted desperately to deliver what
| traits do, but it just made things more complicated and tightly
| coupled. I think that's what people hate most about "OOP", not
| the fact that data and methods are coupled. It's the weird
| shoehorning onto this inexplicable hierarchy requirement.
|
| I hope more languages in the future abandon constructors and
| strict tree-style and/or multi-inheritance. It's something
| existing languages could bolt on as well. Loose coupling with
| the same behavior as ordinary functions is so much easier to
| reason about. These things are so dated now and are best left
| in the 60's and 70's from whence they came.
| fouronnes3 wrote:
| How would you do traits in Python?
| echelon wrote:
| I'd really need to think long and hard about it, but my
| initial feeling is that we'd attach them to data classes or
| a similar new construct. I don't think you'd want to reason
| about the blast radius with ordinary classes. Granted,
| that's more language complexity, creates two unequal
| systems, and makes much more to reason about. There's a lot
| to juggle.
|
| As much fun as putting a PEP together might be, I don't
| think I have the bandwidth to do so. I would really like to
| see traits in Python, though.
| svilen_dobrev wrote:
| there were "traits" in python :/ . Search for pyprotocols.
|
| (searching myself.. not much left of it)
|
| 2004 - https://simonwillison.net/2004/Mar/23/pyprotocols/
|
| some related rejected PEP ..
| https://peps.python.org/pep-0246/ talking about something
| new to be expected.. in 2001?2005?
|
| no idea did it happen and which one would that be..
| dragonwriter wrote:
| Possibly Guido, in that rejection, was talking about what
| ended up as ABCs, original PEP:
| https://peps.python.org/pep-3119/
|
| Further work was done in this area, building on ABCs,
| with Protocols, original PEP:
| https://peps.python.org/pep-0544/
| vaylian wrote:
| Python has a trait-like system. It's called "abstract base
| classes": https://docs.python.org/3/library/abc.html
|
| The abstract base class should use the @abstractmethod
| decorator for methods that the implementing class needs to
| implement itself.
|
| Obviously, you can also use abstract base classes in other
| ways, but they can be used as a way to define
| traits/interfaces for classes.
| dragonwriter wrote:
| Protocols, too (they are defined on top of ABCs)
|
| https://typing.python.org/en/latest/spec/protocol.html#pr
| oto...
| gpderetta wrote:
| Protocols are definitely the closest thing to traits in
| Python. ABC are pretty much OoO interfaces.
| rcxdude wrote:
| The annoying thing is you can actually just use the rust
| solution in C++ as well (at least for initial construction),
| but basically no-one does.
| whatevaa wrote:
| In C#, it is also not a good idea to have constructors with
| failable io in them.
| neonsunset wrote:
| Yup, and IO being async usually creates impedance mismatch
| in constructors.
|
| Had to refactor quite a few anti-pattern constructors like
| this into `async Task<Thing> Create(...)` back in the day.
| No idea what was the thought process of the original
| authors, if there was any...
| hdjrudni wrote:
| I don't have enough experience with traits, but they also
| sound like a recipe for creating a mess. I find anything more
| than like 1 level of inheritance starts to create trouble.
| But perhaps that's the magic of traits? Instead of creating
| deep stacks, you mix-and-match all your traits on your leaf
| class?
| echelon wrote:
| > I find anything more than like 1 level of inheritance
| starts to create trouble.
|
| That's the beauty of traits (or "type classes"). They're
| _behaviors_ and they don 't require thinking in terms of
| inheritance. Think interfaces instead.
|
| If you want your structure or object to print debug
| information when logged to the console, you custom
| implement or auto-derive a "Debug" trait.
|
| If you want your structure or object to implement copies,
| you write your own or auto-derive a "Clone" trait. You can
| control whether they're shallow or deep if you want.
|
| If you want your structure or object to be convertible to
| JSON or TOML or some other format, you implement or auto-
| derive "Serialize". Or to get the opposite behavior of
| hydrating from strings, the "Deserialize" trait.
|
| If you're building a custom GUI application, and you want
| your widget to be a button, you implement or auto-derive
| something perhaps called "Button". You don't have to
| shoehorn this into some kind of "GObject > Widget > Button"
| kind of craziness.
|
| You can take just what you need and keep everything flat.
|
| Here's a graphical argument I've laid out:
| https://imgur.com/a/bmdUV3H
| rcxdude wrote:
| Why? An object encapsulates some state. If it doesn't do
| anything unless ypu call some other method on it first, it
| should just happen in the constructor. Otherwise you've got one
| object that's actually two types: the object half-initialised
| and the object fully initialised, and it's very easy to confuse
| the two. Especially in python there's basically no situation
| where you're going to need that half-state for some language
| restriction.
| mjr00 wrote:
| It's a _lot_ easier to reason about code if I don 't need to
| worry that something as simple as
| my_variable = MyType()
|
| might be calling out to a database with invalid credentials,
| establishing a network connection which may fail to connect,
| or reading a file from the filesystem which might not exist.
|
| You are correct that you don't want an object that can be
| half-initialized. In that case, any external resources
| necessary should be allocated _before_ the constructor is
| called. This is particularly true if the external resources
| must be closed after use. You can see my other comment[0] in
| this thread for a full example, but the short answer is use
| context managers; this is the Pythonic way to do RAII.
|
| [0] https://news.ycombinator.com/item?id=43736940
| lyu07282 wrote:
| > might be calling out to a database with invalid
| credentials, establishing a network connection which may
| fail to connect, or reading a file from the filesystem
| which might not exist.
|
| In python its not unusual that, import
| some_third_partylib
|
| will do exactly that. I've seen libraries that will load a
| half gigabyte machine learning model into memory on import
| and one that sends some event to sentry for telemetry.
|
| People write such shit code, its unbelievable.
| banthar wrote:
| Python context managers somewhat require it. You have to
| create an object on which you can call `__enter__`.
| ptsneves wrote:
| It is a tragedy that python almost got some form or RAII
| but then figured out an object has 2 stages of usage.
|
| I also strongly disagree constructors cannot fail. An
| object that is not usable should fail fast and stop code
| flow the earliest possible. Fail early is a good thing.
| jakewins wrote:
| There is no contradiction between "constructors cannot
| fail" and "fail early", nobody is arguing the constructor
| should do fallible things and then hide the failure.
|
| What you should do is the fallible operation _outside_
| the constructor, _before_ you call __init__, then ask for
| the opened file, socket, lock, what-have-you as an
| argument to the constructor.
|
| Fallible initialisation operations belong in factory
| functions.
| jakewins wrote:
| This is the exact opposite? They explicitly encourage doing
| resource-opening in the __enter__ method call, and then
| returning the opened resource encapsulated inside an
| object.
|
| Nothing about the contract encourages doing anything
| fallible in __init__
| hdjrudni wrote:
| > Otherwise you've got one object that's actually two types:
| the object half-initialised and the object fully initialised,
| and it's very easy to confuse the two.
|
| You said it yourself -- if you feel like you have two
| objects, then literally use two objects if you need to split
| it up that way, FooBarBuilder and FooBar. That way FooBar is
| always fully built, and if FooBarBuilder needs to do funky
| black magic, it can do so in `build()` instead of `__init__`.
| exe34 wrote:
| no thanks, that's how we end up with FactoryFactory. If the
| work needs to be done upon startup, then it needs to be
| done. if it is done in response to a later event, then it
| been done later.
| duncanfwalker wrote:
| For me the point is that __init__ is special - it's the
| language's the way to construct an object. If we want to
| side-effecting code we can with a non-special static
| constructor like
|
| class Foo @classmethod def connect(): return
| Foo()._connect()
|
| The benefit is that we can choose when we're doing 1)
| object construction, 2) side-effecting 3) both. The
| downside is client might try use the __init__() so object
| creation might need to be documented more than it would
| otherwise
| jhji7i77l wrote:
| > Even `Path("config.json").read_text()` in a constructor isn't
| a good idea.
|
| If that call is necessary to ensure that the instance has a
| good/known internal state, I absolutely think it belongs in
| __init__ (whether called directly or indirectly via a method).
| mjr00 wrote:
| You're right that consistent internal state is important, but
| you can accomplish this with class MyClass:
| def __init__(self, config: str): self._config
| = config
|
| And if your reaction is "that just means something else needs
| to call Path("config.json").read_text()", you're absolutely
| right! It's separation of concerns; let some other method
| deal with the possibility that `config.json` isn't
| accessible. In the real world, you'd presumably want even
| more specific checks that specific config values were present
| in the JSON file, and your constructor would look more like
| def __init__(self, host: str, port: int):
|
| and you'd validate that those values are present in the same
| place that loads the JSON.
|
| This simple change makes code _so_ much more readable and
| testable.
| asplake wrote:
| Better still, pass the parsed config. Let the app decide
| how it is configured, and let it deal with any problems.
| abdusco wrote:
| My go-to is a factory classmethod: class Foo:
| def __init__(self, config: Config): ...
| @classmethod def from_config_file(cls, filename:
| str): config = # parse config file
| return cls(config)
| pjot wrote:
| Rather than juggling your parent's __init__ on another thread,
| it's usually clearer to:
|
| 1. Keep all of your object-initialization in the main thread
| (i.e. call super().__init__() synchronously).
|
| 2. Defer any ZMQ socket creation that you actually use in the
| background thread into the thread itself.
| ledauphin wrote:
| yeah, this just feels like one of those things that's begging
| for a lazy (on first use) initialization. if you can't share or
| transfer the socket between threads in the first place, then
| your code will definitionally not be planning to use the object
| in the main thread.
| devrandoom wrote:
| That's Python's constructor, innit?
___________________________________________________________________
(page generated 2025-04-19 23:01 UTC)