hngopher.com

       [HN Gopher] Performance Analysis of Python's Dict() and {}
       ___________________________________________________________________
        
       Performance Analysis of Python's Dict() and {}
        
       Author : nathiss
       Score  : 59 points
       Date   : 2024-01-22 19:34 UTC (3 hours ago)
        
 (HTM) web link (madebyme.today)
 (TXT) w3m dump (madebyme.today)
        
       | pphysch wrote:
       | The first "benchmark" shows dict() being twice as fast, but the
       | rest of the article concludes "The {} is always faster than
       | dict."
        
         | michaelhoffman wrote:
         | I wonder if they swapped the benchmark results somehow. Because
         | on my computer, I get roughly the same times but with {} twice
         | as fast as dict().
        
         | nathiss wrote:
         | Yes, you're absolutely right. I apologize for this copy-paste
         | mistake. It's fixed now.
        
         | Epa095 wrote:
         | Damb am I happy you commented this! I saw the same, checked it
         | twice, and a final time before posting. But then the results
         | were suddenly as expected (dict() being slower). At least I am
         | not going mad, he must have changed it just now.
        
       | bigbillheck wrote:
       | The difference is 20ns, and if that's enough for you to care
       | about you've got much bigger worries (like "using python").
        
         | IshKebab wrote:
         | I agree. If you're benchmarking this then you're using the
         | wrong language.
         | 
         | But I hadn't considered the fact that `dict` can be overridden.
         | That was interesting.
        
         | frakt0x90 wrote:
         | I guess I see this post as more about profiling, discovering
         | unintuitive differences, and exploring python internals than
         | offering 20ns optimizations. I personally appreciated it.
        
         | sjwhevvvvvsj wrote:
         | (+20ns x # calls made) can be significant if you're dealing
         | with billions of executions though.
         | 
         | If nothing else a search/replace for "dict()" to "{}" could
         | yield an aggregate performance bump with basically zero
         | refactoring cost.
         | 
         | Also re Python in general: love it or hate it, it "won" as the
         | language of ML. If you want to use ML libraries you'll be in
         | python.
        
           | ayhanfuat wrote:
           | If you are building billions of dicts, the bottleneck will
           | not be how you initialize them as empty dicts for sure.
        
             | sjwhevvvvvsj wrote:
             | Distributed systems can get very big.
        
         | arsome wrote:
         | Yeah I would be much more concerned about things which cause
         | exponential time increases, like developers who brute force
         | search lists instead of using dictionaries than how they
         | declare their dictionaries unless I was at the point of
         | profiling and the dict() constructor was somehow near the top
         | of the list.
        
         | queuebert wrote:
         | I have a book on the shelf called "High Performance Python".
         | 
         | Seriously, though, if every Python program in the world was
         | 0.1% faster, that would probably save a lot of energy.
        
           | viraptor wrote:
           | You have to compare that to the amount of energy spent while
           | making it 0.1% faster. On average, I'm not sure we'd be
           | saving any energy. The one bit that would get any remotely
           | relevant gain would be offset by lots of tools where that
           | improvement happens once a week.
           | 
           | How many minutes / how much energy would it take you to setup
           | a project, run a profiler, do the fix, run tests/CI, commit,
           | release, etc. -vs- how much energy would that 0.1% change
           | save over years?
        
           | nomel wrote:
           | Bring globals into local scope, never use '.' in tight loops,
           | avoid function calls, etc [1], sure make ugly python code.
           | But, wow, it really can make a difference.
           | 
           | [1] https://wiki.python.org/moin/PythonSpeed/PerformanceTips
        
       | ezo wrote:
       | I'm wonder, who's thinking that dict() is more readable???
        
         | shoo wrote:
         | `{x, y}` is the set containing the two elements `x` and `y`,
         | `{x}` is the set containing only the element `x`, so "surely
         | `{}` is the empty set" !
        
           | pdonis wrote:
           | If Python had had built-in set notation from the start, {}
           | might indeed have been a notation for an empty set instead of
           | an empty dict.
           | 
           | However, Python didn't even have sets at all until version
           | 2.3, and they were in a stdlib module instead of being a
           | built-in type until version 2.6. By that time dict notation
           | was well entrenched.
        
         | colpabar wrote:
         | You _could_ argue that since `{}` is also technically an empty
         | set, `dict()` is more explicit.
        
           | masklinn wrote:
           | Except `{}` is never an empty set.
        
             | colpabar wrote:
             | > x = set()
             | 
             | > x
             | 
             | set()
             | 
             | Well now I know! However I still think the shared usage of
             | curly braces for dicts and sets _could_ be somewhat
             | confusing.
             | 
             | > x = {1, 2, 3}
             | 
             | > x
             | 
             | {1, 2, 3}
             | 
             | (But maybe no one should listen to me because I don't know
             | how to do code blocks on HN.)
        
               | tczMUFlmoNk wrote:
               | Indent by four spaces. :-)
        
         | nerdponx wrote:
         | dict() can be useful for enforcing (or indicating to the
         | reader) that keys are strings and valid Python identifiers. {}
         | allows arbitrary keys, as long as the data type is hashable.
        
         | wirrbel wrote:
         | less clutter for string-only keys.
        
         | kevincox wrote:
         | For string keys it can be as it is `dict(a=1, b=2, c=3)` vs
         | `{"a": 1, "b": 2, "c": 3}`. There is a lot of `"` noise on the
         | latter.
        
           | formerly_proven wrote:
           | 50% less noise with ' though dict is still slightly easier to
           | read; you can also convey that the keys are supposed to be
           | valid identifiers.
        
             | tweakimp wrote:
             | I wish I could do {a=3, b=2}
        
         | locuscoeruleus wrote:
         | I think dict(name=name, age=age) can be definitely be more
         | readable and ergonomical to write than {"name": name, "age":
         | age}.
        
           | mikepurvis wrote:
           | I still miss coffeescript's ability to do dict(@name, @age)
           | though I understand that it creates an unwelcome coupling
           | between the parameter names and their names in the calling
           | scope.
           | 
           | For my part, those names are often the same anyway, though,
           | since the calling scope names are often arbitrary and might
           | as well match the parameter names.
        
         | empiko wrote:
         | I think it's more readable. It matches with the type name, it
         | is consistent when you define different structures (list, set,
         | dict vs [], set, {}), you can pass it as a function, etc. []
         | and {} is unnecessary syntax sugar, not to mention how the same
         | parentheses are overloaded when sets are defined.
        
         | munch117 wrote:
         | I do. dict() costs me less eye strain to recognise out of the
         | corner of my eye.
         | 
         | Why the triple question marks? You could have just asked the
         | question straight.
        
         | pphysch wrote:
         | One place where I prefer dict() is where dicts are used as
         | "nested kwargs", e.g. in Django's ORM for passing default
         | values.                   User.objects.get_or_create(
         | username="bob",           defaults=dict(
         | email="bob@example.com",           )
         | 
         | this makes it easy to move fields between `defaults` and
         | `kwargs`. Granted, you could achieve the same isomorphism with
         | ** operator, but less readably IMO.
        
       | trostaft wrote:
       | Unless I've misunderstood the comparison, the lists example does
       | not appear to be comparing apples to apples. Reproducing here:
       | $ python -m timeit "list((1, 2, 'a'))"       5000000 loops, best
       | of 5: 53 nsec per loop       $ python -m timeit "[(1, 2, 'a')]"
       | 10000000 loops, best of 5: 30.4 nsec per loop
       | 
       | The first command produces a list of three elements, whereas the
       | second produces a list of one element (being the tuple).
        
         | masklinn wrote:
         | You are correct.
         | 
         | To ubench this, you'd want to use `-s` to set up the base
         | object which both snippets then convert.                   >
         | python -m timeit -s "t = (1, 2, 'a')" "list(t)"
         | 10000000 loops, best of 5: 39.8 nsec per loop         > python
         | -m timeit -s "t = (1, 2, 'a')" "[*t]"         10000000 loops,
         | best of 5: 25.1 nsec per loop         > python -m timeit -s
         | "[1, 2, 'a']"         50000000 loops, best of 5: 4.28 nsec per
         | loop
         | 
         | (this is 3.11.2 on an M1 Pro)
        
           | charlieyu1 wrote:
           | The third method should be clearly the fastest one, the other
           | two we are building a tuple then convert to a list
        
             | masklinn wrote:
             | `-s` stands for "setup", the building of the tuple is only
             | done once, and it is not part of the benchmark.
             | 
             | All three versions _only_ bench the construction of the
             | list, two of them from a tuple, while the third is a list
             | literal. However if you plug it into `dis` you 'll see that
             | it compiles to loading a const tuple and creating a list
             | from that                   >>> dis.dis("[1, 2, 'a']")
             | 0           0 RESUME                   0           1
             | 2 BUILD_LIST               0                       4
             | LOAD_CONST               0 ((1, 2, 'a'))
             | 6 LIST_EXTEND              1                       8
             | RETURN_VALUE         >>> dis.dis("[*t]")           0
             | 0 RESUME                   0           1           2
             | BUILD_LIST               0                       4
             | LOAD_NAME                0 (t)                       6
             | LIST_EXTEND              1                       8
             | RETURN_VALUE
        
       | t8sr wrote:
       | While it points at interesting questions about Python's
       | internals, I hope people writing Python realize that optimizing
       | it is pointless, except for cases where you change the complexity
       | class of an algorithm.
       | 
       | The performance of pure python code is orders of magnitude worse
       | than non-interpreted languages, there's no point trying to shave
       | off 0.5% off a 5000% difference.
        
         | pyuser583 wrote:
         | It's also pointless because Python internals have no
         | guarantees.
         | 
         | Changing version or even platform can change the absolute
         | efficiency.
        
           | H8crilA wrote:
           | (I generally agree that everyone should almost always choose
           | the more readable version)
           | 
           | In this case the speed difference will likely always be
           | there, since `dict()` can be overriden (monkey-patched) -
           | hence the interpreter needs to resolve it at runtime.
        
             | pyuser583 wrote:
             | Isn't some logic necessary to determine if you're dealing
             | with a dictionary or set? Or dictionary/set comprehension?
             | 
             | You could probably override it using Python's extension
             | system.
             | 
             | Forgive me, I know Python very well, but not C/C++. I've
             | learned to make no assumptions.
        
         | Spivak wrote:
         | This logic makes no sense, you're saying that trying to boost
         | performance by small increments 0-10% isn't worthwhile on a
         | Civic because it'll never be a Bugatti? That's the least
         | helpful advice when you have a real-life application written in
         | Python and want to get some easy wins on your tight loops.
         | 
         | Also Python internals do have some guarantees but in this case
         | it's a semantic guarantee. Because builtins can be shadowed
         | you'll always have to pay the performance cost of looking them
         | up which isn't true for {}.
        
           | geysersam wrote:
           | It's unlikely to make an important difference. That's why
           | it's a bad idea to spend time on it. It's much more likely
           | there are other more impactful changes you can do to improve
           | performance, changing the algorithm or using another better
           | tool for the job.
        
           | attractivechaos wrote:
           | I think the analogy is more like: why open a can with a
           | better screwer when you can open it with a can opener. Use
           | the right tool for the right task.
        
         | willseth wrote:
         | You're off by at least an order of magnitude, and a very common
         | class of performance problems that exists in any language is
         | copying and allocating memory. Python is no different, and
         | there are plenty of facilities to address that and other normal
         | performance issues. If you're talking strictly about CPU bound
         | performance problems, that's a bit of a red herring considering
         | there is an entire ecosystem of Python tools to write
         | performant CPU bound code.
        
         | einpoklum wrote:
         | If optimization were pointless I'd be out of a job tomorrow.
         | 
         | (My current day job is doing GPU optimization on iterative
         | medical image reconstruction computation. But of course, I
         | don't do that in Python.)
        
         | shiandow wrote:
         | I know hasty optimisation is the root of all evil, but even so
         | this has got to be a bit too far in the other direction.
         | 
         | A 2x speed improvement is _very_ significant if it happens to
         | be in the critical loop of your code. You want a 5000%
         | difference? Just find 6 tweaks like these and you might just
         | get there.
         | 
         | Of course in most cases whatever you're doing to build the dict
         | is more likely taking up most of the time, not building the
         | dict itself, but understanding why one option is 2x as fast is
         | still important. Though one can only hope that it will soon be
         | irrelevant when JITed python becomes a thing.
        
           | pi-e-sigma wrote:
           | > A 2x speed improvement is very significant if it happens to
           | be in the critical loop of your code. You want a 5000%
           | difference? Just find 6 tweaks like these and you might just
           | get there.
           | 
           | Optimizations don't compound like this.
        
         | remus wrote:
         | Optimisation isn't necessarily handwritten assembly, sometimes
         | you just need to squeeze a little extra juice out of your
         | current setup whether that happens to be a python application
         | or some c code.
        
         | uses wrote:
         | Understanding how python's data structures work is actually
         | really important and can make the difference between an
         | algorithm completing in microseconds vs "basically never".
         | Performance problems in python are going to come down to these
         | crucial factors of when to use a set vs list vs dict or a
         | dataframe or something else. The fact that it's interpreted is
         | just not that significant compared to how data structures are
         | set up and operated on. People are using python to explore and
         | manipulate very large data - it's the most popular language for
         | doing so.
        
       | sweezyjeezy wrote:
       | I don't think this will refute the article, but I would have
       | found it more convincing if the benchmark had included a single
       | setitem assignment as well, to be sure that the difference wasn't
       | python doing a lazy dict-or-set type assignment on {}
        
       | ayhanfuat wrote:
       | From Alex Martelli , a prominent Python expert [1]:
       | 
       | > I'm one of those who prefers words to punctuation -- it's one
       | of the reasons I've picked Python over Perl, for example. "Life
       | is better without braces" (an old Python motto which went on a
       | T-shirt with a cartoon of a smiling teenager;-), after all
       | (originally intended to refer to braces vs indentation for
       | grouping, of course, but, hey, braces are braces!-).
       | 
       | > "Paying" some nanoseconds (for the purpose of using a clear,
       | readable short word instead of braces, brackets and whatnots) is
       | generally affordable (it's mostly the cost of lookups into the
       | built-ins' namespace, a price you pay every time you use a built-
       | in type or function, and you can mildly optimize it back by
       | hoisting some lookups out of loops).
       | 
       | > So, I'm generally the one who likes to write dict() for {},
       | list(L) in lieu of L[:] as well as list() for [], tuple() for (),
       | and so on -- just a general style preference for pronounceable
       | code. When I work on an existing codebase that uses a different
       | style, or when my teammates in a new project have strong
       | preferences the other way, I can accept that, of course (not
       | without attempting a little evangelizing in the case of the
       | teammates, though;-).
       | 
       | [1] https://stackoverflow.com/a/2745292/2285236
        
         | mgraczyk wrote:
         | I find {} much more readable then dict() because the former is
         | more common.
         | 
         | On the other hand list(L) more readable than L[:] because the
         | fact that slicing copies is more obscure, and these are not
         | equivalent when L is not a list.
         | 
         | My take would be to ignore performance and generally do what is
         | more commonly done.
        
           | nephanth wrote:
           | My problem with {} is that the syntax for dicts {x1:y1, x2:y2
           | ...} and sets {x1, x2 ...} Are very close, and {} is slightly
           | ambiguous.
           | 
           | I've run into bugs because i wrote {} for the empty set.
           | Since then i write dict() and set()
        
         | grumpyprole wrote:
         | > I'm one of those who prefers words to punctuation
         | 
         | It's not punctuation, it's notation. Notation is often
         | essential for readability, we use it for mathematics and music
         | for a very good reason. Consider why JSON is generally
         | preferred to XML. While this guy might have a preference for
         | pointless verbosity, most likely the person who needs to read
         | his code won't.
        
       | mgaunard wrote:
       | does your colleague also uses list instead of []?
       | 
       | If you're in Python, you should embrace the syntax.
        
       | justinl33 wrote:
       | Best way to optimize a python dict is to switch languages and use
       | a dict there
        
       | __loam wrote:
       | Python is filled with these kinds of traps where there's more
       | than one way to do the same thing but one way is faster for no
       | reason. It's such a bad language.
        
         | encomiast wrote:
         | Compared to what?
        
       ___________________________________________________________________
       (page generated 2024-01-22 23:00 UTC)