[HN Gopher] Performance Analysis of Python's Dict() and {}
___________________________________________________________________
Performance Analysis of Python's Dict() and {}
Author : nathiss
Score : 59 points
Date : 2024-01-22 19:34 UTC (3 hours ago)
(HTM) web link (madebyme.today)
(TXT) w3m dump (madebyme.today)
| pphysch wrote:
| The first "benchmark" shows dict() being twice as fast, but the
| rest of the article concludes "The {} is always faster than
| dict."
| michaelhoffman wrote:
| I wonder if they swapped the benchmark results somehow. Because
| on my computer, I get roughly the same times but with {} twice
| as fast as dict().
| nathiss wrote:
| Yes, you're absolutely right. I apologize for this copy-paste
| mistake. It's fixed now.
| Epa095 wrote:
| Damb am I happy you commented this! I saw the same, checked it
| twice, and a final time before posting. But then the results
| were suddenly as expected (dict() being slower). At least I am
| not going mad, he must have changed it just now.
| bigbillheck wrote:
| The difference is 20ns, and if that's enough for you to care
| about you've got much bigger worries (like "using python").
| IshKebab wrote:
| I agree. If you're benchmarking this then you're using the
| wrong language.
|
| But I hadn't considered the fact that `dict` can be overridden.
| That was interesting.
| frakt0x90 wrote:
| I guess I see this post as more about profiling, discovering
| unintuitive differences, and exploring python internals than
| offering 20ns optimizations. I personally appreciated it.
| sjwhevvvvvsj wrote:
| (+20ns x # calls made) can be significant if you're dealing
| with billions of executions though.
|
| If nothing else a search/replace for "dict()" to "{}" could
| yield an aggregate performance bump with basically zero
| refactoring cost.
|
| Also re Python in general: love it or hate it, it "won" as the
| language of ML. If you want to use ML libraries you'll be in
| python.
| ayhanfuat wrote:
| If you are building billions of dicts, the bottleneck will
| not be how you initialize them as empty dicts for sure.
| sjwhevvvvvsj wrote:
| Distributed systems can get very big.
| arsome wrote:
| Yeah I would be much more concerned about things which cause
| exponential time increases, like developers who brute force
| search lists instead of using dictionaries than how they
| declare their dictionaries unless I was at the point of
| profiling and the dict() constructor was somehow near the top
| of the list.
| queuebert wrote:
| I have a book on the shelf called "High Performance Python".
|
| Seriously, though, if every Python program in the world was
| 0.1% faster, that would probably save a lot of energy.
| viraptor wrote:
| You have to compare that to the amount of energy spent while
| making it 0.1% faster. On average, I'm not sure we'd be
| saving any energy. The one bit that would get any remotely
| relevant gain would be offset by lots of tools where that
| improvement happens once a week.
|
| How many minutes / how much energy would it take you to setup
| a project, run a profiler, do the fix, run tests/CI, commit,
| release, etc. -vs- how much energy would that 0.1% change
| save over years?
| nomel wrote:
| Bring globals into local scope, never use '.' in tight loops,
| avoid function calls, etc [1], sure make ugly python code.
| But, wow, it really can make a difference.
|
| [1] https://wiki.python.org/moin/PythonSpeed/PerformanceTips
| ezo wrote:
| I'm wonder, who's thinking that dict() is more readable???
| shoo wrote:
| `{x, y}` is the set containing the two elements `x` and `y`,
| `{x}` is the set containing only the element `x`, so "surely
| `{}` is the empty set" !
| pdonis wrote:
| If Python had had built-in set notation from the start, {}
| might indeed have been a notation for an empty set instead of
| an empty dict.
|
| However, Python didn't even have sets at all until version
| 2.3, and they were in a stdlib module instead of being a
| built-in type until version 2.6. By that time dict notation
| was well entrenched.
| colpabar wrote:
| You _could_ argue that since `{}` is also technically an empty
| set, `dict()` is more explicit.
| masklinn wrote:
| Except `{}` is never an empty set.
| colpabar wrote:
| > x = set()
|
| > x
|
| set()
|
| Well now I know! However I still think the shared usage of
| curly braces for dicts and sets _could_ be somewhat
| confusing.
|
| > x = {1, 2, 3}
|
| > x
|
| {1, 2, 3}
|
| (But maybe no one should listen to me because I don't know
| how to do code blocks on HN.)
| tczMUFlmoNk wrote:
| Indent by four spaces. :-)
| nerdponx wrote:
| dict() can be useful for enforcing (or indicating to the
| reader) that keys are strings and valid Python identifiers. {}
| allows arbitrary keys, as long as the data type is hashable.
| wirrbel wrote:
| less clutter for string-only keys.
| kevincox wrote:
| For string keys it can be as it is `dict(a=1, b=2, c=3)` vs
| `{"a": 1, "b": 2, "c": 3}`. There is a lot of `"` noise on the
| latter.
| formerly_proven wrote:
| 50% less noise with ' though dict is still slightly easier to
| read; you can also convey that the keys are supposed to be
| valid identifiers.
| tweakimp wrote:
| I wish I could do {a=3, b=2}
| locuscoeruleus wrote:
| I think dict(name=name, age=age) can be definitely be more
| readable and ergonomical to write than {"name": name, "age":
| age}.
| mikepurvis wrote:
| I still miss coffeescript's ability to do dict(@name, @age)
| though I understand that it creates an unwelcome coupling
| between the parameter names and their names in the calling
| scope.
|
| For my part, those names are often the same anyway, though,
| since the calling scope names are often arbitrary and might
| as well match the parameter names.
| empiko wrote:
| I think it's more readable. It matches with the type name, it
| is consistent when you define different structures (list, set,
| dict vs [], set, {}), you can pass it as a function, etc. []
| and {} is unnecessary syntax sugar, not to mention how the same
| parentheses are overloaded when sets are defined.
| munch117 wrote:
| I do. dict() costs me less eye strain to recognise out of the
| corner of my eye.
|
| Why the triple question marks? You could have just asked the
| question straight.
| pphysch wrote:
| One place where I prefer dict() is where dicts are used as
| "nested kwargs", e.g. in Django's ORM for passing default
| values. User.objects.get_or_create(
| username="bob", defaults=dict(
| email="bob@example.com", )
|
| this makes it easy to move fields between `defaults` and
| `kwargs`. Granted, you could achieve the same isomorphism with
| ** operator, but less readably IMO.
| trostaft wrote:
| Unless I've misunderstood the comparison, the lists example does
| not appear to be comparing apples to apples. Reproducing here:
| $ python -m timeit "list((1, 2, 'a'))" 5000000 loops, best
| of 5: 53 nsec per loop $ python -m timeit "[(1, 2, 'a')]"
| 10000000 loops, best of 5: 30.4 nsec per loop
|
| The first command produces a list of three elements, whereas the
| second produces a list of one element (being the tuple).
| masklinn wrote:
| You are correct.
|
| To ubench this, you'd want to use `-s` to set up the base
| object which both snippets then convert. >
| python -m timeit -s "t = (1, 2, 'a')" "list(t)"
| 10000000 loops, best of 5: 39.8 nsec per loop > python
| -m timeit -s "t = (1, 2, 'a')" "[*t]" 10000000 loops,
| best of 5: 25.1 nsec per loop > python -m timeit -s
| "[1, 2, 'a']" 50000000 loops, best of 5: 4.28 nsec per
| loop
|
| (this is 3.11.2 on an M1 Pro)
| charlieyu1 wrote:
| The third method should be clearly the fastest one, the other
| two we are building a tuple then convert to a list
| masklinn wrote:
| `-s` stands for "setup", the building of the tuple is only
| done once, and it is not part of the benchmark.
|
| All three versions _only_ bench the construction of the
| list, two of them from a tuple, while the third is a list
| literal. However if you plug it into `dis` you 'll see that
| it compiles to loading a const tuple and creating a list
| from that >>> dis.dis("[1, 2, 'a']")
| 0 0 RESUME 0 1
| 2 BUILD_LIST 0 4
| LOAD_CONST 0 ((1, 2, 'a'))
| 6 LIST_EXTEND 1 8
| RETURN_VALUE >>> dis.dis("[*t]") 0
| 0 RESUME 0 1 2
| BUILD_LIST 0 4
| LOAD_NAME 0 (t) 6
| LIST_EXTEND 1 8
| RETURN_VALUE
| t8sr wrote:
| While it points at interesting questions about Python's
| internals, I hope people writing Python realize that optimizing
| it is pointless, except for cases where you change the complexity
| class of an algorithm.
|
| The performance of pure python code is orders of magnitude worse
| than non-interpreted languages, there's no point trying to shave
| off 0.5% off a 5000% difference.
| pyuser583 wrote:
| It's also pointless because Python internals have no
| guarantees.
|
| Changing version or even platform can change the absolute
| efficiency.
| H8crilA wrote:
| (I generally agree that everyone should almost always choose
| the more readable version)
|
| In this case the speed difference will likely always be
| there, since `dict()` can be overriden (monkey-patched) -
| hence the interpreter needs to resolve it at runtime.
| pyuser583 wrote:
| Isn't some logic necessary to determine if you're dealing
| with a dictionary or set? Or dictionary/set comprehension?
|
| You could probably override it using Python's extension
| system.
|
| Forgive me, I know Python very well, but not C/C++. I've
| learned to make no assumptions.
| Spivak wrote:
| This logic makes no sense, you're saying that trying to boost
| performance by small increments 0-10% isn't worthwhile on a
| Civic because it'll never be a Bugatti? That's the least
| helpful advice when you have a real-life application written in
| Python and want to get some easy wins on your tight loops.
|
| Also Python internals do have some guarantees but in this case
| it's a semantic guarantee. Because builtins can be shadowed
| you'll always have to pay the performance cost of looking them
| up which isn't true for {}.
| geysersam wrote:
| It's unlikely to make an important difference. That's why
| it's a bad idea to spend time on it. It's much more likely
| there are other more impactful changes you can do to improve
| performance, changing the algorithm or using another better
| tool for the job.
| attractivechaos wrote:
| I think the analogy is more like: why open a can with a
| better screwer when you can open it with a can opener. Use
| the right tool for the right task.
| willseth wrote:
| You're off by at least an order of magnitude, and a very common
| class of performance problems that exists in any language is
| copying and allocating memory. Python is no different, and
| there are plenty of facilities to address that and other normal
| performance issues. If you're talking strictly about CPU bound
| performance problems, that's a bit of a red herring considering
| there is an entire ecosystem of Python tools to write
| performant CPU bound code.
| einpoklum wrote:
| If optimization were pointless I'd be out of a job tomorrow.
|
| (My current day job is doing GPU optimization on iterative
| medical image reconstruction computation. But of course, I
| don't do that in Python.)
| shiandow wrote:
| I know hasty optimisation is the root of all evil, but even so
| this has got to be a bit too far in the other direction.
|
| A 2x speed improvement is _very_ significant if it happens to
| be in the critical loop of your code. You want a 5000%
| difference? Just find 6 tweaks like these and you might just
| get there.
|
| Of course in most cases whatever you're doing to build the dict
| is more likely taking up most of the time, not building the
| dict itself, but understanding why one option is 2x as fast is
| still important. Though one can only hope that it will soon be
| irrelevant when JITed python becomes a thing.
| pi-e-sigma wrote:
| > A 2x speed improvement is very significant if it happens to
| be in the critical loop of your code. You want a 5000%
| difference? Just find 6 tweaks like these and you might just
| get there.
|
| Optimizations don't compound like this.
| remus wrote:
| Optimisation isn't necessarily handwritten assembly, sometimes
| you just need to squeeze a little extra juice out of your
| current setup whether that happens to be a python application
| or some c code.
| uses wrote:
| Understanding how python's data structures work is actually
| really important and can make the difference between an
| algorithm completing in microseconds vs "basically never".
| Performance problems in python are going to come down to these
| crucial factors of when to use a set vs list vs dict or a
| dataframe or something else. The fact that it's interpreted is
| just not that significant compared to how data structures are
| set up and operated on. People are using python to explore and
| manipulate very large data - it's the most popular language for
| doing so.
| sweezyjeezy wrote:
| I don't think this will refute the article, but I would have
| found it more convincing if the benchmark had included a single
| setitem assignment as well, to be sure that the difference wasn't
| python doing a lazy dict-or-set type assignment on {}
| ayhanfuat wrote:
| From Alex Martelli , a prominent Python expert [1]:
|
| > I'm one of those who prefers words to punctuation -- it's one
| of the reasons I've picked Python over Perl, for example. "Life
| is better without braces" (an old Python motto which went on a
| T-shirt with a cartoon of a smiling teenager;-), after all
| (originally intended to refer to braces vs indentation for
| grouping, of course, but, hey, braces are braces!-).
|
| > "Paying" some nanoseconds (for the purpose of using a clear,
| readable short word instead of braces, brackets and whatnots) is
| generally affordable (it's mostly the cost of lookups into the
| built-ins' namespace, a price you pay every time you use a built-
| in type or function, and you can mildly optimize it back by
| hoisting some lookups out of loops).
|
| > So, I'm generally the one who likes to write dict() for {},
| list(L) in lieu of L[:] as well as list() for [], tuple() for (),
| and so on -- just a general style preference for pronounceable
| code. When I work on an existing codebase that uses a different
| style, or when my teammates in a new project have strong
| preferences the other way, I can accept that, of course (not
| without attempting a little evangelizing in the case of the
| teammates, though;-).
|
| [1] https://stackoverflow.com/a/2745292/2285236
| mgraczyk wrote:
| I find {} much more readable then dict() because the former is
| more common.
|
| On the other hand list(L) more readable than L[:] because the
| fact that slicing copies is more obscure, and these are not
| equivalent when L is not a list.
|
| My take would be to ignore performance and generally do what is
| more commonly done.
| nephanth wrote:
| My problem with {} is that the syntax for dicts {x1:y1, x2:y2
| ...} and sets {x1, x2 ...} Are very close, and {} is slightly
| ambiguous.
|
| I've run into bugs because i wrote {} for the empty set.
| Since then i write dict() and set()
| grumpyprole wrote:
| > I'm one of those who prefers words to punctuation
|
| It's not punctuation, it's notation. Notation is often
| essential for readability, we use it for mathematics and music
| for a very good reason. Consider why JSON is generally
| preferred to XML. While this guy might have a preference for
| pointless verbosity, most likely the person who needs to read
| his code won't.
| mgaunard wrote:
| does your colleague also uses list instead of []?
|
| If you're in Python, you should embrace the syntax.
| justinl33 wrote:
| Best way to optimize a python dict is to switch languages and use
| a dict there
| __loam wrote:
| Python is filled with these kinds of traps where there's more
| than one way to do the same thing but one way is faster for no
| reason. It's such a bad language.
| encomiast wrote:
| Compared to what?
___________________________________________________________________
(page generated 2024-01-22 23:00 UTC)