[HN Gopher] Pandas Illustrated: Visual Guide to Pandas
___________________________________________________________________
Pandas Illustrated: Visual Guide to Pandas
Author : nemoniac
Score : 109 points
Date : 2023-01-27 19:41 UTC (3 hours ago)
(HTM) web link (scribe.citizen4.eu)
(TXT) w3m dump (scribe.citizen4.eu)
| axi1 wrote:
| The proper (free) link is
| https://betterprogramming.pub/pandas-illustrated-the-definit...
| jcq3 wrote:
| Yet another pandas tutorial. Got chatgpt now, thx.
| r2_pilot wrote:
| Good luck with plausible hallucinated interfaces in your
| statistically-generated responses.
| timdellinger wrote:
| This seems to be getting the Hug of Death, but this looks like
| the content:
|
| https://betterprogramming.pub/pandas-illustrated-the-definit...
| neonate wrote:
| https://web.archive.org/web/20230127194856/https://scribe.ci...
| dark-star wrote:
| These are not the Pandas I was looking for _waves hand_
| matsemann wrote:
| Can recommend taking a look at Polars. Kinda a successor to
| pandas.
|
| https://www.pola.rs/
| z3c0 wrote:
| Interesting. Seems to also take quite a few leaves from
| PySpark's book.
| 89vision wrote:
| Neat. I love that there's a Rust implementation. Types make
| everything better.
| throwaway_75369 wrote:
| So, given the title and how stressful the last couple of weeks
| have been, I was sadly disappointed when this wasn't about
| drawing cute black and white bears.
|
| I mean, data analysis is useful and all, but not what the heart
| wanted at the moment.
| [deleted]
| 867-5309 wrote:
| asking DALL-E for some Python Pandas might relieve our
| disappointment
| [deleted]
| [deleted]
| irrational wrote:
| LOL. Those were not the kind of pandas I was expecting.
|
| One of my daughters is a panda bear fanatic and I thought this
| would be a resource I could share with her.
| tomcam wrote:
| Same! Although at first glance it appears to be an excellent
| example of clear, well-illustrated documentation.
| oneoff786 wrote:
| I do almost all of my day job in pandas. I consider myself very
| good at it. My number one recommendation to new data scientists
| learning the ropes is to use NumPy hardly at all. I'm
| not sure where people learn it but they do all of this
| complicated nonsense. Just map simple Python lambda funcs with
| pd.Series.map and that's most of what you need. Memorize your
| pd.DataFrame methods.
|
| If your code feels like it's dealing with a matrix and not a table,
| it's probably doing something funny.
| boppo1 wrote:
| What is your day job?
| ajoseps wrote:
| I think it really depends on the scale of data. If you're
| dealing with anything less than a GB, it probably doesn't
| matter all that much, but once you're dealing with larger
| datasets there is a pretty massive difference when using
| vectorized operations. Some of the pandas DataFrame methods map
| to underlying NumPy ones, but I don't believe that is always
| the case.
| _Wintermute wrote:
| You lose a lot of performance not using vectorised functions.
| Maybe not an issue if you're only dealing with small amounts of
| data.
| oneoff786 wrote:
| Series.map is vectorized.
|
| Pretty much everything you need in pandas is as performant as
| you ought to need for doing tabular data manipulation in
| Python. Except dataframe.apply
| _Wintermute wrote:
| It is not.
|
|     import numpy as np
|     import pandas as pd
|
|     df = pd.DataFrame({"foo": np.random.randn(100000)})
|
| pandas map:
|
|     df["foo"].map(lambda x: x * 2)
|
| 18.1 ms +- 109 us per loop (mean +- std. dev. of 7 runs,
| 100 loops each)
|
| pandas apply:
|
|     df["foo"].apply(lambda x: x * 2)
|
| 17.9 ms +- 46.6 us per loop (mean +- std. dev. of 7 runs,
| 100 loops each)
|
| Vectorised function, using underlying numpy operations:
|
|     df["foo"] * 2
|
| 267 us +- 11.8 us per loop (mean +- std. dev. of 7 runs,
| 1000 loops each)
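
The comparison in the comment above can be reproduced outside IPython with a sketch along these lines. It assumes pandas and NumPy are installed. The helper name `time_call` and the repeat counts are illustrative choices, not something from the thread, and absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the comment: 100,000 random floats in one column.
df = pd.DataFrame({"foo": np.random.randn(100_000)})


def time_call(fn, number=10):
    """Hypothetical helper: best average seconds per call over 3 repeats."""
    return min(timeit.repeat(fn, number=number, repeat=3)) / number


# Element-wise Python lambda, invoked once per value.
t_map = time_call(lambda: df["foo"].map(lambda x: x * 2))
t_apply = time_call(lambda: df["foo"].apply(lambda x: x * 2))
# Whole-column arithmetic, handled by NumPy's compiled loop.
t_vec = time_call(lambda: df["foo"] * 2)

print(f"map:        {t_map * 1e3:.3f} ms per call")
print(f"apply:      {t_apply * 1e3:.3f} ms per call")
print(f"vectorised: {t_vec * 1e3:.3f} ms per call")

# The approaches differ in speed, not in result.
mapped = df["foo"].map(lambda x: x * 2)
vectorised = df["foo"] * 2
```

Taking the minimum over repeats is one common way to reduce noise from other processes; the mean and standard deviation reported by `%timeit` in the comment are an equally reasonable choice.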
| lcvriend wrote:
| If by "vectorized" you mean: "able to delegate the task of
| performing mathematical operations on the array's contents
| to optimized, compiled C code." then I do not think you are
| correct (unless perhaps you are supplying map with a dict
| or Series).
|
| Series.map is not compiling your lambdas to C and running
| them. If there is a built-in method available it usually will
| be faster. A notable exception is the pandas str methods, which
| devolve into Python code but generally with more overhead
| than map/apply.
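
The distinction drawn in the comment above, between passing `map` a dict and passing it a lambda, can be illustrated with a small sketch. It assumes pandas is installed; the sample data and the variable names (`s`, `lookup`, and so on) are made up for illustration. As far as I understand it, the dict form is resolved as a lookup against the mapping rather than by calling a Python function per element, but this sketch only shows that the results agree, not how each is implemented.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
lookup = {"a": 1, "b": 2, "c": 3}  # illustrative mapping, not from the thread

# Mapping with a dict: pandas receives the whole mapping up front.
codes_dict = s.map(lookup)

# Mapping with a lambda: an arbitrary Python callable applied per element.
codes_lambda = s.map(lambda x: lookup[x])

# A built-in string method versus the equivalent callable passed to map.
upper_builtin = s.str.upper()
upper_map = s.map(str.upper)

print(codes_dict.tolist())
print(upper_builtin.tolist())
```

Which of the two string variants is faster depends on the pandas version and the string dtype in use, so it is worth timing both on real data before choosing one.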
| voxelghost wrote:
| Check out polars.
|
| It's vectorized, with a choice between lazy (optimized) and eager
| execution.
| rbanffy wrote:
| @dang can you replace the link with the original?
| https://betterprogramming.pub/pandas-illustrated-the-definit...
| [deleted]
___________________________________________________________________
(page generated 2023-01-27 23:00 UTC)