[HN Gopher] Python for Excel: A Modern Environment for Automatio...
___________________________________________________________________
Python for Excel: A Modern Environment for Automation and Data
Analysis
Author : teleforce
Score : 184 points
Date : 2021-07-13 05:31 UTC (1 days ago)
(HTM) web link (www.xlwings.org)
(TXT) w3m dump (www.xlwings.org)
| harperweaver wrote:
| This would probably make a great Christmas gift for some analyst
| you know. Even if they don't use Excel now, it'll make them feel
| like their professional career is coming around full circle.
| acmecorps wrote:
| What apt timing. I'm trying to do the same thing with Ruby;
| anyone knows a good Ruby gem dealing with Excel?
| gizdan wrote:
| Slightly offtopic, I'm why has python not been integrated into
| LibreOffice yet?
| Tajnymag wrote:
| Office compatibility perhaps?
| teddyh wrote:
| https://help.libreoffice.org/6.3/en-US/text/sbasic/python/py...
| gizdan wrote:
| This isn't embedded though,this is running a python script
| side by side and using python to query libreoffice as opposed
| to enhancing a workbook via python.
| peterhil wrote:
| I wrote a handy wrapper for xlrd in 2014 while making my project
| Ninhursag. I just checked and it still works with Python 2 - here
| is an example usage: pip2 install excelsior
| curl -O https://prd-wret.s3.us-west-2.amazonaws.com/assets/pallad
| ium/production/atoms/files/ds140-2017-bauxi.xlsx excelsior
| -f tsv ds140-2017-bauxi.xlsx
|
| Source code: https://github.com/peterhil/excelsior
|
| The Ninhursag project is visualisation and modeling of USGS data
| series 140, which include world production of almost all minerals
| starting from the year 1900:
|
| https://ninhursag.herokuapp.com/mineral/statistics (Heroku free
| tier - might take a few seconds to load)
| markus_zhang wrote:
| Being working on big data for a few years and I'm glad that Excel
| is mostly a place to paste some data for debugging. But I'm sure
| it's still useful for people who can leverage power pivot and all
| those tools for medium size data. I used xlwings occasionally
| some years ago and it was not a pleasant experience back then. I
| heard it improved a lot these years though.
| arthurcolle wrote:
| It's fine - pretty easy to import data into a DataFrame and do
| whatever you want. The API seems relatively simple - you can
| marshal data into a DataFrame and vice versa.
|
| Take it for a spin if you have any interesting spreadsheets in
| need of some love!
| lhuser123 wrote:
| I think Excel online with Office Script
| https://docs.microsoft.com/en-us/office/dev/scripts/overview...
| is awesome. Combine it with OneDrive & Power Automate & you can
| share & automate many boring stuff.
| harryf wrote:
| Does Excel online still have row and column limits?
|
| It's always been a mystery to me why Microsoft never addressed
| this - they could have prevented a whole host of competitors in
| the "big data" space from ever existing if they'd found a way
| of abstracting this limit away from end users.
| pjmlp wrote:
| Most likely, because for doing those use cases you should buy
| into Power BI.
| syshum wrote:
| "big Data" and excel should never be used in the same
| sentence, This like using a screwdriver to drive a nail...
| Get a Hammer
|
| Too many people believe excel should do everything... no,
| just no
| WalterGR wrote:
| Is it a limit in Excel or OOXML?
| alex-nikitin wrote:
| I used xlwings in a variety of settings over the years, and it
| was my go-to when building the first two prototypes of our Excel
| commodity price sync functionality. We hit a performance wall
| pretty quickly due to the nature of our use case so we moved to
| Excel's C API and built our own add-in. This move took a lot of
| effort to get right, the C API has a tendency to punish you hard
| (crash Excel) for the smallest of transgressions. xlwings allowed
| us to prove the concept pretty quickly and without fear of
| crashing Excel before committing to C.
|
| Don't mean for this to sound like a testimonial for xlwings (I
| guess, it kinda is) - it's been very useful for us so I had to
| share! It's a great library, has great documentation, and I can't
| recommend it enough. However, if you're after low latency* for
| real-time interactions across a large dataset, you'll need to
| look elsewhere, or roll your own.
|
| _* under 2 second response; you 're dealing with Excel after
| all!_
|
| _Shameless plug for our product:https://www.thectgrid.com/_
| Tarsul wrote:
| I guess with your tool collaboration with a few people across
| one excel sheet would work more smoothly than with an Excel
| file on OneDrive?
|
| We work in a team of only 3 people with an excel file on
| OneDrive and what annoys me the most is that it happens quite
| often that excel "says" that your changes have been saved (e.g.
| it says it's saved in the header, also the blue onedrive symbol
| shows no circle. The best indicator is the file in the
| explorer: If that shows the green checkmark then it's good)
| while in reality it's not and you only get a notification (that
| it cannot save because it cannot updates from someone else with
| yours) when you try to close the file. Annoying. Anyone with
| solutions is very welcome :)
| alex-nikitin wrote:
| Absolutely, the people using us seem think so! We target a
| niche: real time sharing of commodity pricing. The linked
| data can live in different files and layouts, can be updated
| via web or mobile, and full change history is stored as well.
|
| We explored a few options when trying to solve the sharing
| problem, including OneDrive, Google Sheets, and some 3rd
| parties. Couldn't find anything that fit the bill.
|
| FWIW- What you describe could be down to OneDrive's data
| quotas. If you're dealing with pricing data or any numerical
| data then CT Grid solves that issue. Even if it's not a
| perfect fit, I'd be happy to share the experience if it helps
| you.
| dang wrote:
| Past related threads:
|
| _The new dynamic arrays in Excel with Python and xlwings_ -
| https://news.ycombinator.com/item?id=20381083 - July 2019 (9
| comments)
|
| _Call Python functions from VBA in all Microsoft Office apps_ -
| https://news.ycombinator.com/item?id=18166421 - Oct 2018 (25
| comments)
|
| _Xlwings: Replace Excel VBA with Python_ -
| https://news.ycombinator.com/item?id=10304402 - Sept 2015 (1
| comment)
|
| _Use Python in Excel without add-ins_ -
| https://news.ycombinator.com/item?id=8372329 - Sept 2014 (70
| comments)
| fzumstein wrote:
| Book author and xlwings creator here. Thanks for posting this
| link! Funny enough, about 10 days after sending the book to the
| printer (March 11), Microsoft updated the feature request
| ("Python as en Excel scripting language") on their UserVoice page
| with a comment that looks like official support could eventually
| happen: "Thank you for the continued interest in this space.
| While we don't have specific plans to announce at this time, we
| have been researching the topic, conducting customer interviews,
| and are working with the Python team at Microsoft so we can build
| a plan that we think can address the scenarios you told us about
| and ensure it can run wherever Excel runs." Fingers crossed! See
| https://excel.uservoice.com/forums/304921-excel-for-windows-...
| ineedasername wrote:
| What's even more surprising to me is that (unless I missed
| something) MS hasn't pushed anything significant in terms of R
| integration to Excel. I just assumed that was one of their end-
| goals after they bought up Revolution Analytics.
|
| I know they've done some R integration with PowerBI, but it's
| mind boggling how they ignore the two most popular programming
| languages that heavy Excel users would also be interested in.
| RustyConsul wrote:
| Thank you soooo much! Your library saved me 100's of hours when
| i was working at Tesla. Showed the whole CapEx Team and they
| loved it. Will definitely be buying this book.
| fujidust wrote:
| Thanks for writing this book. I'm on the fence about going down
| this road or learning Power BI instead. Can you recommend a
| beginner resources on using Python in Excel? I'd rather use
| Excel and learn Python, for sure!
| markus_zhang wrote:
| I have used Power BI and let's say Python (or any other
| programming language that has a graphic lib) gives you way
| more freedom.
|
| Power BI and Tableau are favoured by analysts who do not
| necessarily want to put down a lot of efforts to learn
| programming (which is fine).
| fujidust wrote:
| That was a very helpful distinction, thanks!
| [deleted]
| markus_zhang wrote:
| Hi man I hope they can buy you out.
| Havoc wrote:
| Haven't they been saying that for years?
|
| Seem to recall a ~2017 article about it...
| pjmlp wrote:
| Yes, that is standard Microsoft speak for "thanks, we will
| think about it, just not now".
| markus_zhang wrote:
| I guess they really get a lot of pressure there.
|
| Putting Python in Excel can potentially lift more analysts
| to proper developers or at least make the code base more
| robust. A lot of analysts' first language is VBA and then
| they move on to more serious development and land on a
| developer job.
|
| I'm one of those examples. My first language was VBA and I
| dig a bit deeper into it (e.g. reading about moving
| pointers and general coding best practice, downloading
| rubberduck extension for VBA, etc.) and taught myself C++
| and Python. Eventually I landed a BI developer job.
| pjmlp wrote:
| I have seen many move to VB by following that path on
| life science research labs that tend to be Windows based,
| as per reading devices and laboratory robots.
|
| So they went Excel => .NET, and were served by .NET and
| COM APIs in Excel.
|
| In modern Excel you also have Lambda, JavaScript and
| PowerQuery as an inbox option.
| markus_zhang wrote:
| I thought about moving to C# but we didn't really need
| the power so I didn't bother.
|
| PowerQuery is also very good for data cleaning. Learned a
| few tricks here and there.
| ineedasername wrote:
| What do some of these tools do to the portability of an Excel
| doc? Will some things not work if I send the file to someone
| else who doesn't have python or xlwings?
| machinehermiter wrote:
| I still like Excel for certain things but I don't know why you
| wouldn't just use Pandas instead of this and cut out Excel.
| andrewlgood wrote:
| Excel has become the de facto report writer. Look at how many
| products have Excel plug ins. This allows them to skip the UI
| development and the report writing portion.
| rlayton2 wrote:
| Pandas doesn't do formatting very well. Using pandas to
| generate non tabular data is also a non starter. So using
| pandas to fill out an existing Excel file, say by entering data
| in specific cells, won't work.
|
| Libraries like xlread and xlwrite can and would be one way to
| automate such processes. Xlwings can too. I _think_ that
| Xlwings can do macros where the others can 't though
| romeoblade wrote:
| Excel is just xml file. What I do is design the excel file
| how I need it. Export/Save as XML then use jinja2 and
| template it.
|
| First sheet is usually raw data in standard rows/columns,
| second sheet would have the cells I need, etc a cell for sum
| of x columns from sheet 1.
|
| Then use panda's to get the data, and feed it through
| jinja2's template render function.
| MeinBlutIstBlau wrote:
| I thought this was going to be about python INSIDE Excel. If MS
| ditched VBA for python, that would be incredible. VBA needs to
| honestly die cause it is just a horrible syntax to read. It's
| like staring at a SQL Procedure.
| pjmlp wrote:
| You can use lambda, JavaScript, any .NET language, any language
| that knows COM.
|
| Those that only use VBA only have themselves to blame.
| [deleted]
| andrewlgood wrote:
| Back to permissions. Most office workers are not authorized
| to use those tools. They need the capability built into
| Excel.
| pjmlp wrote:
| While I agree permissions are an issue:
|
| - lambda and JavaScript, out of the box in latest versions
|
| - any .NET or COM aware language, PowerShell fits the bill,
| there is even a mini-IDE installed by default
| MeinBlutIstBlau wrote:
| >out of the box in latest versions
|
| Not every company uses the latest versions of everything.
| We finally switched everyone to Windows 10 this year. We
| were paying for Windows 7 support while making the
| transition.
| andrewlgood wrote:
| Thank for the clarification. I did see the lambda
| announcement, but have not looked at it yet. Will move
| that up the priority list.
| pjmlp wrote:
| While you are at it, have a look at Power Query as well,
| https://docs.microsoft.com/en-us/powerquery-m/
| occamrazor wrote:
| I don't think one can make UDF with COM.
| dragonwriter wrote:
| > I thought this was going to be about python INSIDE Excel.
|
| Yes, xlwings (one of the packagess covered by the book, and the
| one on whose site the page is hosted) absolutely allows this
| via an Excel add-in:
| https://docs.xlwings.org/en/stable/addin.html
| MeinBlutIstBlau wrote:
| Unfortunately for all the corporate jobs I've worked at, even
| getting an add-on for notepad++ has more red tape than a FOIA
| request...
| dmz73 wrote:
| Python is a nicer language to use than C or C++ but that is not
| a very high bar to pass.
|
| The main reason VBA is horrible is because it allows dynamic
| typing by default and Python would not help there at all.
|
| VB syntax is actually nicer than Python syntax in at least 3
| aspects: 1) VB is not case sensitive 2) VB allows specifying
| data types 3) VB uses If Then Else End If style syntax that
| makes it clear where things start and stop.
|
| If Excel actually had the SQL based query interface inside the
| spreadsheet, it would make most of VBA unnecessary and probably
| improve the performance of Excel macros by several orders of
| magnitude.
| omnicognate wrote:
| On Error Resume Next
| ptx wrote:
| Case insensitivity might be fine on a language level, but the
| mandatory IDE makes it annoying. It tries to normalize the
| case of identifiers, which would be a nice feature except for
| the fact that it doesn't know about scopes and can't tell
| variables from properties.
|
| The result is that if you declare a variable named "count"
| anywhere in your project, every property (and every unrelated
| variable) everywhere gets renamed from "Count" to "count".
| Makes for annoying diffs.
| IshKebab wrote:
| I don't know - C++ has a sane static type system, value
| semantics and isn't ridiculously slow.
|
| Also case insensitivity is a terrible idea. Everywhere it is
| used it causes problems.
| dragonwriter wrote:
| > VB syntax is actually nicer than Python syntax in at least
| 3 aspects: 1) VB is not case sensitive
|
| VB is not case sensitive, but case insensitivity isn't nice;
| it reduces clarity and forces circumlocution in cases where
| having case distinctions would not.
|
| > 2) VB allows specifying data types
|
| Python allows specifying data types, and the major python
| typecheckers support a more robust type system than VBA.
|
| > 3) VB uses If Then Else End If style syntax that makes it
| clear where things start and stop
|
| Python uses indentation-based syntax that makes it more
| immediately visually clear where things start and stop with
| less visible noise.
| Havoc wrote:
| Missing a key component: Ability to install it.
|
| Much of the Excel target audience - office workers - don't have
| much control over their work computer let alone ability to
| install stuff like pip allowing downloads from an unvetted repo
| fzumstein wrote:
| You're right, it's the biggest challenge. But a ton of
| companies do now have something like Anaconda in their official
| software catalogue and sometimes even an internal mirror of
| PyPI or anaconda.org for the packages.
| mch82 wrote:
| > Does the book use any commercial software like xlwings PRO?
| No, the book uses exclusively software that is open source
| and free (except for Excel, of course).
|
| This is no longer technically true of Anaconda. They've
| adopted a new licensing approach that prevents employees of
| companies from using the free edition. Please consider adding
| alternative install instructions, based on PyPI and pipenv or
| poetry, to future versions of your book. This would help a
| ton of data scientists out!
|
| I'm looking forward to reading your book via O'Reilly!
|
| References:
|
| Discussion on r/Python with comments from the Anaconda CEO
| and team, https://www.reddit.com/r/Python/comments/iqsk3y/ana
| conda_is_...
|
| Anaconda license change press release,
| https://www.anaconda.com/blog/anaconda-commercial-edition-
| fa...
|
| Anaconda Terms of Service, https://www.anaconda.com/terms-of-
| service
|
| Edit: Here's an interesting comment on the Reddit discussion
| from the Anaconda CEO suggesting that Miniconda can still be
| used with conda forge: "You can download Miniconda, and
| change your conda config to use conda forge, and the Terms of
| Service do not apply to that. The ToS only applies to
| commercial usage of the package repository of packages we
| build, at repo.anaconda.com; it does not apply to community-
| built and uploaded packages at anaconda.org.", https://www.re
| ddit.com/r/Python/comments/iqsk3y/anaconda_is_...
| mch82 wrote:
| Side note...
|
| While Anaconda Inc.'s licensing approach is currently
| confusing/frustrating, they're correct that open source
| sustainability is a challenge. Projects need financial
| support as well as volunteers.
|
| SciPy.org maintains Pandas, NumPy, Jupyter, iPython,
| Matplotlib, etc. They've buried a donation page on their
| website at... https://www.scipy.org/scipylib/donations.html
| fzumstein wrote:
| Thanks for pointing this out and I'll update the homepage
| accordingly. You're right about the sustainability
| challenge though!
| pieter_mj wrote:
| you can install python and https://pypi.org/project/pywin32/ as
| a limited user. pywin32 allows you to automate ms office (among
| which excel) from within python. if you can't disable the
| corporate proxy have a look at cntlm.
| Jolter wrote:
| Considering the inclusion of Pandas in the book, I doubt
| manager types and econ guys are the target group. IMO,
| engineers would be the ones interested in applying Python to
| Excel work (because they realize how ridiculous VBScript is),
| and they are pretty likely to have install permissions on their
| machine.
| kingkongjaffa wrote:
| In non tech companies it's super rare to have install
| permissions in my experience
| (Manufacturing/Automotive/Aerospace etc.) most companies are
| running proprietary software from a few key engineering
| software players, and very rarely there's some team writing
| legacy but hyper specialised FORTRAN / C / C++ engineering
| analysis tools.
|
| It's a big issue to build tools for others to use - an excel
| file on a shared drive somewhere might be developed by one
| person with python installed and elevated permissions but
| used by 50 with no python installation etc.
|
| IT arn't going to adjust their policies for a small group
| unless they have significant organizational sway.
| carlosf wrote:
| Installing python does not require admin permissions in
| Windows, just choose $HOME/... as install path.
|
| It also does not require admin permissions in Linux,
| although most distros do not offer a happy path for that.
| kingkongjaffa wrote:
| corporate IT in the companies I am talking about is
| significantly more locked down, python is not install-
| able by going to https://www.python.org/ and downloading
| the binary there. Downloading executables is locked down.
|
| Not really sure how this is a rebuttal, maybe what you
| say is true on a default windows installation, not on an
| IT-dept managed system.
| kitrose wrote:
| You can install Anaconda without admin.
| Jolter wrote:
| I'm sure there are corporate users who would like to use
| this tool but won't be able to install i. However, I do not
| think there is any call for posting dismissive comments
| solely based on that. It does not detract from the value of
| the tool, for those who can get it.
| kingkongjaffa wrote:
| How is my comment dismissive?
|
| I'm sharing my experience of this exact problem we have
| right now. Users cannot install python on their machines,
| in an org of 60k engineers, asking for changes to a small
| group of users comes with heavy resistance from IT.
|
| On these machines (most computers in most offices in
| every western organization) no software can be installed
| from online, and running any form of executable requires
| IT/Admin elevation.
|
| It's like you guys have never worked in an office in the
| real world that isn't dev/tech.
| patmorgan23 wrote:
| I think your vastly overstating how locked down most
| companies workstations are. Sure if you work for a
| fortune 500 they might be but there are 10,000 other
| companies where that probably isn't the case.
| CleaveIt2Beaver wrote:
| Really? I work for a <50 employee
| manufacturer/distributor, and we're pretty firmly locked
| down. If people want to _update Java_, they need an admin
| password.
| deregulateMed wrote:
| I wrote Random Forest in VBA. Programming is what you make of
| it.
|
| That said, VBA is slow.
| ptx wrote:
| Is it slower than Python, though? It seems intuitively like
| it should be faster for a lot of things, since it has non-
| boxed numeric types, early binding, etc.
| deregulateMed wrote:
| I'm not sure, selecting data from cells was slow. There
| might be ways to get around it. I did all sorts of
| optimization, but looking back on it, I probably
| shouldn't have saved anything in cells.
| andrewlgood wrote:
| I wouldn't sell the financial analysts short. Pandas was
| developed by financial analysts to assist in their routine
| analytics work.
| sterlinm wrote:
| Yeah, there is a lot of ridiculous usage of Excel in
| corporate America, but the idea that econ/finance/MBA types
| are not aware of Python/pandas these days is a farfetched.
|
| There are tons of resources like this.
| https://quantecon.org/
| wil421 wrote:
| Somehow the really large MegaCorp I work at has Python,
| Anaconda3, and Pycharm. I was surprised and have been using it
| to automate some mundane work I have to do.
| andrewlgood wrote:
| Agree. This is why there is so much MS Access stuff out there.
| It came as part of the MS Office suite so office workers had it
| on their desktop and could use it without having to IT and work
| their way up the priority queue.
|
| VBA is terrible, but it is the only programming-like option for
| most office workers outside of IT/Engineering
| TrackerFF wrote:
| Quite relatable.
|
| At work, we want to build a simple database and app to
| interface against it. Asked our boss, asked the IT guys, got
| the same answer:
|
| A : "Sure, but it has to be MS Access"
|
| Q : "Why though, PostgreSQL / SQLite/ etc. works fine"
|
| A : "Because we already have MS Access"
|
| Q : "OK - could we at least get [programming language +
| frameworks] for the interface part"
|
| A : "Use Excel and VBA. We already have that"
| Finnucane wrote:
| I've had that same conversation, but subsititute FileMaker
| for Access (no access to Access on a Mac). I can run
| anything I want on my own Mac, but there's no way to make
| it shareable with the rest of the office.
| syshum wrote:
| wait you asked IT to use something other than Access and
| they turned you down :rolleyes:
|
| I am on a mission to eliminate Access... the day i can
| distribute office with out Access will be a glorious day
| indeed
|
| No IT Dept should be telling users to use Access.....
| petepete wrote:
| I really wish Resolver One had caught on, it was such an amazing
| product and really did find a nice middle ground between
| spreadsheet and code.
|
| Here's a one-minute intro from their CTO showing off some of the
| basic features. The video is from 2009 and I'm amazed there's
| been nothing comparable released, open source or otherwise, in
| the intervening years. If there has and I've missed it, I'd love
| to know!
|
| https://www.youtube.com/watch?v=u6EV2jiKRfc
| fzumstein wrote:
| There is https://gridstudio.io/ but I am not sure what the
| current project status is...
| petepete wrote:
| Ah thanks. It definitely looks like it has some similarities,
| hoping development picks up on it again.
| andrewlgood wrote:
| Terribly misleading title. Packages to allow Python to
| create/manipulate Excel files have been around for a long time.
|
| Like MeinBlutIsBlau, I thought this was going to be Microsoft
| finally ditching VBA for Python as its macro language.
| dragonwriter wrote:
| > Terribly misleading title.
|
| No, its not.
|
| > Packages to allow Python to create/manipulate Excel files
| have been around for a long time.
|
| Xlwings (1) has been around for a while, itself, and more to
| the point (2) isn't just a "package to allow Python to
| create/manipulate Excel files". It includes an Excel add-in to
| allow embedding Python in Excel, using it for purposes similar
| to VBA.
|
| > Like MeinBlutIsBlau, I thought this was going to be Microsoft
| finally ditching VBA for Python as its macro language.
|
| Well, it allows users to do that without Microsoft doing it, so
| that's closer to accurate than the description implied by your
| dismissal.
| pjmlp wrote:
| If you don't want to use VBA, you don't have to.
|
| There is lambda and JavaScript support.
|
| https://www.microsoft.com/en-us/research/blog/lambda-the-ult...
|
| https://exceljet.net/excel-functions/excel-lambda-function
|
| https://docs.microsoft.com/en-us/office/dev/add-ins/referenc...
| WalterGR wrote:
| > Microsoft finally ditching VBA for Python as its macro
| language.
|
| Can't you use any language that 'integrates with' Microsoft
| Scripting Host?
|
| https://en.wikipedia.org/wiki/Windows_Script_Host
| ptx wrote:
| You can use any language that supports COM by creating an
| Excel add-in. But then you have to figure out how to
| distribute it to the users.
|
| WSH is something quite different. It runs VBScript, which is
| a different language from VBA. VBA is the same language as
| Visual Basic 6 and uses the same IDE.
| _1tan wrote:
| Is there something similar for Google Sheets? Perhaps as a
| browser extension?
___________________________________________________________________
(page generated 2021-07-14 23:03 UTC)