[HN Gopher] My Beancount books are 95% automatic after 3 years (...
___________________________________________________________________
My Beancount books are 95% automatic after 3 years (2024)
Author : leonry
Score : 154 points
Date : 2025-03-05 16:08 UTC (6 hours ago)
(HTM) web link (fangpenlin.com)
(TXT) w3m dump (fangpenlin.com)
| achierius wrote:
| Anyone tried this tool out for their personal accounting? My
| books are currently in a big custom Google sheet system, and
| building metrics on top of that has been a pain.
| miroljub wrote:
| It follows the same principles as ledger-cli and hledger.
|
| You won't make a mistake using any of them.
|
| I used and still use both, since they share the file format.
| Beancount is more opinionated, with slightly different file
| format, so I didn't bother adding it.
| Sphax wrote:
| Yes. I have maintained my ledger with beancount for 5/6 years
| now. I don't automate downloading from my bank, it's not worth
| the hassle especially now that every login requires 2FA with my
| phone.
|
| But I did write an importer for the csv files my bank provides
| and with smart_importer I don't even have to categorize the
| statements anymore (although there are mistakes sometimes). I
| don't gather metrics though; I use fava to have a visual view
| of my books.
|
| I usually spend half an hour per month maintaining the ledger.
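A minimal sketch of the kind of CSV importer described above, assuming a bank export with hypothetical `date`, `description`, and `amount` columns. It is not Sphax's importer and does not use Beancount's importer framework or smart_importer; it only shows the basic row-to-entry conversion, with every transaction sent to a placeholder `Expenses:Uncategorized` account.

```python
import csv
import io


def csv_to_beancount(csv_text, account, currency="EUR"):
    """Turn CSV rows into Beancount-style transaction text.

    The column names (date, description, amount) are assumptions;
    real bank exports differ and usually need date and sign cleanup.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Double quotes would end the Beancount string early.
        payee = row["description"].replace('"', "'")
        entries.append(
            f'{row["date"]} * "{payee}"\n'
            f'  {account}  {row["amount"]} {currency}\n'
            # Amount left blank so the ledger tool fills in the balance.
            f"  Expenses:Uncategorized\n"
        )
    return "\n".join(entries)


sample = (
    "date,description,amount\n"
    "2025-03-01,Coffee shop,-4.50\n"
    "2025-03-02,Bookstore,-19.99\n"
)
ledger_text = csv_to_beancount(sample, "Assets:Bank:Checking")
print(ledger_text)
```

A categorizer would replace the fixed `Expenses:Uncategorized` line with a predicted account.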
| clucas wrote:
| Yes, I have one big beancount file tracking everything back to
| 2014. I use fava for visualization and git for history/backup.
| I've occasionally gone back and split or moved accounts, based
| on what metrics I want to track at that moment, it's pretty
| easy.
|
| However, contrary to the article's automated method, my
| workflow is to manually input transactions every day (or every
| couple days, depending on what's going on) and balance my
| accounts. It's a bit of a ritual, but I like having a really
| good handle on day-to-day spending. Plus, I find ~10 minutes
| once a day way easier than (e.g.) ~3 hours once a month, even
| if it's the same amount of time overall.
| anon291 wrote:
| I have used beancount and it is significantly better than an
| Excel or Google Sheet.
|
| For me, the main benefit is tax lot matching and an auditable
| trail of sales should the IRS come knocking. It's impossible to
| do this properly in a spreadsheet (I mean... it is possible,
| but no one will complain if it's done wrong until it's too
| late). With beancount, matching is much more straightforward. I
| have a python script that does it automatically for each sale
| using normal FIFO, tax loss minimization, etc. Due to
| beancount's internal checks, I am certain that these are
| correct once they're entered into the ledger. There is no
| chance of failure, and if the IRS ever asks me to justify a
| capital gain / loss figure, it's very easy to point out that I
| keep track of all my shares and have for the past several years
| and I maintain a solid record of the entire history of each
| lot.
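The FIFO matching mentioned above can be sketched as follows. This is not anon291's script (which is not shown in the thread); the `Lot` class and `sell_fifo` function are hypothetical names, and tax-loss minimization would need a different lot-selection order.

```python
from collections import deque
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Lot:
    date: str       # acquisition date, ISO format
    units: Decimal  # shares still held in this lot
    cost: Decimal   # cost per share


def sell_fifo(lots, units, price):
    """Consume the oldest lots first; return (matches, realized_gain)."""
    matches = []
    gain = Decimal("0")
    remaining = units
    while remaining > 0:
        if not lots:
            raise ValueError("selling more units than are held")
        lot = lots[0]
        used = min(lot.units, remaining)
        matches.append((lot.date, used, lot.cost))
        gain += used * (price - lot.cost)
        lot.units -= used
        remaining -= used
        if lot.units == 0:
            lots.popleft()  # lot fully sold, drop it
    return matches, gain


lots = deque([
    Lot("2023-01-10", Decimal("10"), Decimal("100")),
    Lot("2023-06-01", Decimal("10"), Decimal("120")),
])
matches, gain = sell_fifo(lots, Decimal("15"), Decimal("130"))
print(matches, gain)
```

The `matches` list is the audit trail: each sale records which lots it drew from and at what cost basis.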
| FredPret wrote:
| Another very beneficial double-check is asserting account
| balances at certain dates.
|
| You give beancount the balances at the end of each month
| straight from your bank / broker statements, and it throws an
| error if your transactions don't match up.
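To illustrate the idea behind balance assertions, here is a small standalone sketch. It does not use Beancount itself; `check_balances` is a hypothetical helper, and it assumes the assertion applies at the start of the given date, so only earlier transactions count.

```python
from decimal import Decimal


def check_balances(transactions, assertions):
    """Return an error string for every assertion that fails.

    transactions: (iso_date, account, amount) tuples
    assertions:   (iso_date, account, expected_balance) tuples,
                  checked against transactions dated before iso_date
    """
    errors = []
    for date, account, expected in assertions:
        actual = sum(
            (amt for d, acct, amt in transactions
             if acct == account and d < date),
            Decimal("0"),
        )
        if actual != expected:
            errors.append(
                f"{date} {account}: expected {expected}, got {actual}"
            )
    return errors


txns = [
    ("2025-01-03", "Assets:Checking", Decimal("1000.00")),
    ("2025-01-15", "Assets:Checking", Decimal("-250.00")),
    ("2025-01-28", "Assets:Checking", Decimal("-40.50")),
]
ok = check_balances(
    txns, [("2025-02-01", "Assets:Checking", Decimal("709.50"))]
)
bad = check_balances(
    txns, [("2025-02-01", "Assets:Checking", Decimal("750.00"))]
)
print(ok, bad)
```

A missing or duplicated transaction shows up as a failed assertion for the first statement period it affects, which narrows down where to look.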
| rockooooo wrote:
| I don't use beancount but if you're already in sheets, Tiller
| might be easier to switch to to get lots of built-in
| dashboards/metrics
| CaptainJack wrote:
| I've used beancount extensively, spent many hours a few years
| ago. Built importers parsing bank PDFs (in UK, plaid doesn't
| work. Plus I'd rather also keep all the original statement PDFs).
|
| Probably built 10+ importers, plus some plugins to do automated
| transaction annotations.
|
| I have not made any updates for many years now, because:
|
| - Downloading statements is still a pain; I have to manually go
| through all the websites. Banks are bad at making statements
| available, and worse at making it possible to automate.
|
| - The root of the issue is actually that beancount is too slow.
| Any change/update takes ages. Python is both a blessing (makes
| it easy to add plugins/importers etc.) and a curse (way slower
| than some other languages).
|
| I believe the creator of beancount has started working on v3 with
| a mix of C++/python, relying on protobufs, a C++ core for
| parsing, etc. AFAIK, that is not production-ready yet.
| diftraku wrote:
| I'd be really curious on how hard programmatic access to your
| own, personal banking data might be in the PSD2-era.
|
| I can link my secondary bank account to my main bank's app so I
| can see the balance in one place, but the catch is that I need
| to refresh this authorization through the app every 90 days.
|
| Ideally, you'd just use your banking credentials to authorise
| the API access and pull data through that. What this requires
| in practice, I have no idea but it probably involves a bit of
| bureaucracy.
| jazzyjackson wrote:
| Ran into this annoyance recently setting up new accounting
| software, that the access my bank provides is last 6 months
| only, so I still had to go and export a csv, rejigger the
| column names and date format, to reimport the first 8 months
| of 2024.
|
| My thought for working around tracking new transactions
| without a third party is to just set up email alerts so I get
| a notification on every charge, deposit etc and set up some
| cron job to read new emails and update my books.
| mtlynch wrote:
| v3 is out now and v2 is officially deprecated:
|
| https://groups.google.com/g/beancount/c/iTdRuvZnE4E
|
| I found the migration pretty confusing and haven't found good
| documentation on how to go from v2 to v3.
|
| The best I've found is this unofficial write-up from an
| experienced Beancount user:
|
| https://sgoel.dev/posts/moving-from-beancount-2x-to-3x/
| CER10TY wrote:
| As far as I can tell this is without the planned C++ rewrite
| though, and the documentation at https://beancount.github.io/
| still says to use v2.
|
| Is there a point in migrating already?
| mtlynch wrote:
| I'm still waiting on better migration instructions.
|
| The maintainer says here that v2 is officially deprecated:
|
| > _You should not use v2 anymore._
|
| https://groups.google.com/g/beancount/c/iTdRuvZnE4E/m/o9V91
| W...
| FredPret wrote:
| I suspect Python isn't the limiting factor here - it's the file
| format. You can end up with huge interconnected text files that
| have to be fully parsed on every change.
|
| If you have 1e5 - 1e6 lines of transactions, I think a
| SQLite database would be a huge step forward. If you have much
| more than that, you probably need an ERP system.
|
| Of course the text files make it ~easy to enter transactions,
| but maybe there's an elegant way to use those for ingestion
| only; that does make the system much more complicated to use.
| That might not be a problem for the kind of person using plain-
| text accounting over the course of years though.
| chrislloyd wrote:
| I have a very similar setup but with HLedger[1]. A "do-
| nothing"[2] script helps me download statements by opening bank
| websites, waiting for manual import and finally checking balances.
| That makes it a lot less repetitive and error prone. Or at
| least, I catch the errors faster.
|
| I've found HLedger and Shake to be fast enough to process
| almost a decade of finances. Dmitry Astapov has an extremely
| well produced tutorial workflow[3].
|
| How have you managed the PDF parsing? Mine has become a bit of
| a mess dealing with slight variations in formatting as they
| change over time. I've been considering using LLMs but have
| been nervous about quality.
|
| [1]: https://hledger.org [2]:
| https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-...
| [3]: https://github.com/adept/full-fledged-hledger
| Karrot_Kream wrote:
| Why not spot check your PDF LLM outputs? I always make sure
| my accounts balance by hand anyway. Though occasionally it's
| really painful especially if it's a missing Venmo
| transaction. It's rare that I need to really comb through my
| accounts to account for some money but when I do it's really
| time-consuming.
| jxjnskkzxxhx wrote:
| Banks in the UK allow you to export transactions in many formats.
| Login, pick time range, download in ofx format. Why is this a
| pain?
| BeetleB wrote:
| Multiple bank accounts and multiple credit cards. Also,
| figuring out the time range for each bank.
| mgr86 wrote:
| I run into the same issues here with banks in the US. It is
| a real pain in the ass, and makes tracking this sort of
| information way more time consuming than it needs to be.
|
| My other issue is with stores like Costco that sell both
| household goods, groceries, clothes, and even misc kids
| stuff. I like to track each separately. Which means I then
| need to fetch and analyze the receipts.
| BeetleB wrote:
| > I like to track each separately. Which means I then
| need to fetch and analyze the receipts.
|
| That is a reality. To make my life easier, when I check
| out at a store, I put all my grocery items first on the
| belt. Then everything else. Usually "everything else" is
| only a few items. So I categorize those additional items,
| and then specify "Groceries" for the rest.
|
| Often I buy _only_ groceries, and I throw those receipts
| away. When I'm in a ledger/beancount session, if I don't
| have a receipt, that means it was just Groceries.
|
| This method alone really reduced my time dealing with
| receipts.
| cranky908canuck wrote:
| For planning purposes, could you look at a year's
| postings, then come up with "good enough" breakout
| allocations going forward?
| erikerikson wrote:
| It makes it about them not about you. I don't care which
| banks and other financial providers I use. I care about
| managing my funds in a way that is efficient and healthy for
| my life. The banks I use are simply service providers, a
| subclass of service providers across all the dimensions of my
| life. They have regulations they must abide by but in so
| doing they attempt to force me to think and act in those
| terms and I think they're poor.
| jxjnskkzxxhx wrote:
| The fact that I can download my data describing my
| transactions in a format convenient to me, makes it about
| them? Curious take.
| cranky908canuck wrote:
| "banks allowing export of transactions" is only the start.
|
| I deal with two banks for credit cards.
|
| One (call it "Blue Bank") allows me to download a statement.
| I filter out a couple of things (payments mostly), check that
| it matches the paper statement balance, and post it. About 15
| minutes start to finish.
|
| The other (call it "Orange Bank") allows me to download a
| "statement". I filter out a couple of things (payments
| mostly), check my previous month's transactions to see which
| ones at the beginning of the file actually go in the current
| billing period (not already paid), stare at the last
| transactions to see which ones actually were posted to the
| current billing period (not after the cutoff), run the script
| to check the total (nope, doesn't match) then do that a
| couple of times until it matches. The time they changed the
| meaning of the "credit" column from "just confirming this is
| a credit" to "it's a credit, you need to flip the sign" it
| was 45 minutes.
|
| But hey, it's all CSV!
| jxjnskkzxxhx wrote:
| I guess you must have a more complex life than me. I never
| filter out anything, and everything always matches.
| cranky908canuck wrote:
| Maybe. What I was trying to get at was, some banks (the
| 'orange one') don't provide sane semantics, so even if
| the input format is compatible, reconciliation can be a
| nightmare. You may not be dealing with your 'orange
| bank'; if I only dealt with the blue one I would not be
| aware of the problems of the other (and it would not have
| occurred to me that the orange one could botch it up).
| BeetleB wrote:
| > Downloading statements is still a pain, have to manually go
| through all websites.
|
| Have you considered using Playwright?
|
| I used aider[0] recently to log into my work's payslips and
| download all the relevant payslips into JSON format (with
| values encrypted). It took about 3 hours, but that's mostly
| because of my lack of knowledge of good CSS selectors.
| jazzyjackson wrote:
| As for automatically pulling transactions, it's still shocking to
| me that anyone handed over their banking username and password to
| a third party, quicken et al at least use a system now where the
| bank grants read only permission as an app integration, but still
| I'm not a fan of every bank transaction becoming someone else's
| anonymized data harvest.
|
| I'm evaluating whether self hosting accounting software is at all
| feasible to meet CMMC requirements and so far have my sights set
| on ERPNext. I configured my bank to send me an email alert for
| every transaction and intend to parse that and append it to my
| ledger, but the API to do so is fairly annoying, so hearing that
| beancount is meant to be automatable is intriguing indeed
|
| BTW the other thing that shocks me about quicken classic is lack
| of version control - the database can become corrupt and your
| only recourse is restoring from backup or sending the file to
| support and having them manually fix it!
| evrimoztamur wrote:
| The space was absolutely insane prior to PSD2 in Europe.
| Luckily most banks now have API endpoints, but there's a new
| class of middlemen wrangling the APIs: the aggregators who pipe
| all your data from the banks to apps. It's better than handing
| out the keys to the old class of scraping aggregators, I
| suppose, but the outcome is still not _that much_ better in my
| opinion.
| FredPret wrote:
| I automated the flow of transaction -> text message -> MacOS ->
| Beancount once, but it never added up.
|
| - some transactions go through without a text
|
| - some transactions generate two identical texts
|
| - for Forex transactions, the amount in the text and the amount
| on your statement will not match
|
| - some texts are ambiguous in that they could have been
| generated by two different kinds of transaction; especially
| true of deposits & credit card payments
|
| In the end I gave up, accepted that accounting will never be
| smooth & simple, and now just generate a CSV every month by
| hand.
| Wronnay wrote:
| Do you plan to open source or sell your Email to ledger /
| ERPNext solution? I actually plan the exact same thing for my
| use cases...
| Helmut10001 wrote:
| If you are in Europe, there is a very nice OSS software called
| Hibiscus [1] with API connectivity for most banks, and (local)
| web scrapers for those banks that have no APIs.
|
| I am working on a plugin to pull categories and transactions
| from the Hibiscus DB (H2 or PG) to Beancount [2]. It is not
| there yet, but the process overall looks promising.
|
| [1]: https://www.willuhn.de/
|
| [2]: https://github.com/Sieboldianus/beancount-hibiscus-importer
| amaccuish wrote:
| Another Hibiscus fan.
|
| I'm pretty sure FinTS/HBCI is mostly only used here in
| Germany sadly. Which is a shame because all this
| "OpenBanking" stuff requires registering/paying/certificates
| etc.
| human_llm wrote:
| ERPNext is fairly easy to automate. We have a number of Python
| scripts that use the ERPNext API to create sales invoices, add
| and reconcile transactions from PayPal, Stripe, etc.
|
| We originally used Quickbooks Online and I'm glad we decided to
| switch to ERPNext a few years ago.
| fangpenlin wrote:
| Hi, the author here.
|
| If you are okay with Plaid[1], many of their bank connections
| are now using OAuth-style authentication instead of password
| sharing. I actually added a new feature called Direct
| Connect[2] a while back to allow any plaintext accounting book
| users to pull CSV directly via Plaid API through BeanHub. We
| don't train AI models with our customers' transactions, and if
| we want to, we will ask for explicit consent (not just ToS) and
| anonymize customers' data.
|
| If you're okay with the above, the key to achieving a high
| automation level is the ability to pull CSV transaction files
| directly from the bank in a standard format. Maybe you can give
| it a try. We have a 30-day free trial period.
|
| I am not so familiar with the CMMC requirements you
| mentioned, but for us to access transactions from some banks,
| such as Chase, Plaid requires us to pass an audit of our
| security measures. Is CMMC compliance something your company
| needs to meet before it can consider a third-party software
| vendor?
|
| [1]: https://plaid.com
|
| [2]: https://beanhub.io/blog/2025/01/16/direct-connect-
| repository...
| ashish01 wrote:
| With a one-time setup, a low-fi solution is to receive an email
| from your bank for every transaction and extract the data into
| a ledger entry.
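A rough sketch of the email-to-ledger idea, assuming a made-up alert wording ("A charge of $X at Y was made on DATE"). Real bank alerts differ and change without notice, so `ALERT_RE` and the account names are placeholders. The entry is written with the `!` flag so it stands out for later review.

```python
import re
from decimal import Decimal

# Hypothetical alert format; adjust to whatever your bank sends.
ALERT_RE = re.compile(
    r"A charge of \$(?P<amount>[\d,]+\.\d{2}) at (?P<payee>.+?) "
    r"was made on (?P<date>\d{4}-\d{2}-\d{2})"
)


def alert_to_entry(body, card_account="Liabilities:CreditCard"):
    """Return a Beancount-style entry, or None if the email is not an alert."""
    m = ALERT_RE.search(body)
    if m is None:
        return None
    amount = Decimal(m["amount"].replace(",", ""))
    return (
        f'{m["date"]} ! "{m["payee"]}"\n'
        f"  {card_account}  -{amount} USD\n"
        f"  Expenses:Uncategorized  {amount} USD\n"
    )


entry = alert_to_entry(
    "A charge of $1,234.56 at ACME Hardware was made on 2025-03-05."
)
skipped = alert_to_entry("Your statement is ready.")
print(entry)
```

As noted elsewhere in the thread, alerts can be missing or duplicated, so entries produced this way still need reconciling against a statement.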
| walterbell wrote:
| _> not a fan of every bank transaction becoming someone else's
| anonymized data harvest_
|
| In 2024, CFPB had mandated all U.S. banks to open up data
| history to SaaS/fintech vendors, if the customer gives
| permission.
| skwee357 wrote:
| That's interesting. I, however, still spend between 20 and 30
| minutes once a week to keep my beancount updated by manually
| entering the transactions, a habit I have kept for the past 14 or
| so years.
|
| At the same time, I did automate 90% of my business beancount
| import by writing a custom stripe importer that imports
| transactions from stripe. As for the expenses, I still enter them
| manually in the aforementioned 20-30 minute session.
| BeetleB wrote:
| > I, however, still spend between 20 and 30 minutes once a week
| to keep my beancount updated by manually entering the
| transactions
|
| That sounds a lot - I spend less.
|
| Consider entering the transactions into a software like
| KMyMoney and write a script to export to beancount format.
| Entering/importing is a lot easier in a SW like KMyMoney (e.g.
| it does decent matching of the new transaction to prior
| transactions of similar amounts).
| jldugger wrote:
| Honestly, there's a sort of Jevons paradox at work. Importing
| credit card transactions isn't that hard, but now that it's
| solved, I should really be monitoring my investment portfolio
| more closely, or tracking intangible assets like unused
| vacation hours, or recording all those taxes and expenses
| listed on my paycheck.
| zefhous wrote:
| Love this! I have been using hledger for a while now and have a
| pretty automated process for importing exported CSVs. I would
| love a little more automation in terms of pulling down the data,
| but on the bright side the manual process provides a good touch
| point to keep up on accounting regularly in small doses. This is
| great for just keeping an eye on things on a monthly basis.
|
| I am starting a new business now and intend to see how far I can
| take plain text accounting in that context. I plan to use mercury
| for banking and want to automate as much as possible. I would
| also like to associate invoices that are stored in a self-hosted
| paperless-ngx instance.
| abhiyerra wrote:
| I wrote a script to download and categorize my Mercury
| Transactions and it is quite straightforward. Highly recommend
| it.
|
| I am looking to deprecate my Quickbooks usage after this year
| since it is such a pain to split payments into multiple chunks
| automatically and I don't really know what I am getting for
| $60/month.
| joshstrange wrote:
| I use YNAB and love it but I'm always interested in alternatives.
| That said I opened the main website [0] and the animation [1]
| shows a bunch of crypto logos. To my eye this seems to be a
| product aiming itself at people using crypto. I don't think
| that's the case given the blog post I just read but it's not a
| good look. Anyone catering to the crypto market is at least a
| little suspect in my book. Maybe it's just a feature of BeanHub
| and you don't have to touch it at all but to be featured so
| prominently is icky.
|
| [0] https://beanhub.io/
|
| [1] https://cs.joshstrange.com/5njjtGND
| ndegruchy wrote:
| Yeah, the site and service seem to be trying to attract a range
| of people. However, `beancount` and the other `ledger-cli` like
| tools are all just plain-text files with a semi-rigid markup
| for recording transactions.
|
| https://plaintextaccounting.org/ is the resource for most of
| them, and has good resources for making them work for you. It's
| not for everyone, though, many people just prefer spreadsheets
| or apps, and that's fine.
| shortrounddev2 wrote:
| YNAB seemed WAY too involved for me. I spent so much time in
| the fucking app and barely understood it. I had to "give each
| dollar a job" which is a total inversion of the traditional
| "set a dollar amount you want to spend on each category and
| then exercise some self control". I pay for everything on a
| credit card and then pay back that CC at the end of the month,
| and doing so complicated the UI when it automatically pulled in
| my spending from my bank account. I've found GNU Cash to be a
| bit more intuitive with a little bit of training.
|
| I feel that there are some people who legitimately enjoy
| looking at money on spreadsheets and implementing
| budgets/categorizing purchases and I think that YNAB is great
| for those people, but I personally hate even THINKING about
| money, let alone interfacing with budgeting software every
| couple days
| jimbokun wrote:
| > "set a dollar amount you want to spend on each category and
| then exercise some self control"
|
| How is that different from "give each dollar a job"? The only
| difference I see is that it forces you to make the categories
| add up to the amount of your paycheck.
| shortrounddev2 wrote:
| Yes, that's the difference. When I asked on help forums
| questions like "Can I just set the budget with the
| assumption that how much money I make this month will be
| the same as next month", they said no, because I don't know
| what will happen to my paycheck this month. I need to divvy
| up real dollars, and not expected dollars, meaning that the
| budget is a constantly moving target and I need to readjust
| and give a job to my _actual_ dollars _every two weeks_
| instead of just my expected paycheck.
| jimbokun wrote:
| Understood.
|
| I copy over all the previous month's budget amounts and
| tweak it to match the current paycheck, if it's
| different. And frankly don't care too much if it's a few
| dollars off.
|
| I do this in a spreadsheet. I had written an app for
| budgeting but it was too much hassle keeping it up to
| date. New versions of Mac OS would break it in subtle
| ways and wasn't worth the effort of all the bug fixes.
| joshstrange wrote:
| People should use what works best for them but I'd like to
| respond a bit to the YNAB issues you ran into (not to
| convince you to switch or anything).
|
| YNAB with Credit Cards was difficult for me, as was envelope-
| based budgeting (what "give every dollar a job" is called)
| because I also was used to the typical "set limits on
| categories and stay within them"-style budgeting (Like Mint,
| at least Mint way back when it first came out, I haven't
| touched it in years). YNAB is very different in that it
| doesn't let you allocate dollars that are not in your bank
| account. You can't say "$300 for eating out" unless you have
| $300 in your bank account and YNAB doesn't care that you
| might have that money available by the time you want to spend
| it, it forces you to allocate the money you actually have and
| every time you get paid you allocate it into categories with
| the long-term goal of getting a month (or months) ahead in
| your budgeting (not spending the money you made this month on
| stuff you need in this month).
|
| Credit cards were also hard to wrap my head around. In a
| debit-only world it all made sense but CC's complicated
| things for me. I really enjoyed Nick True's videos on
| Youtube, they helped me with this a lot but a simplified way
| to think about this is:
|
| * You put $200 in your "groceries" category (aka envelope)
|
| * You go to the store and spend $60 (on eggs I assume?) and
| pay with your CC
|
| * In YNAB-land you will record that transaction (or it will
| be auto-imported) and you will assign it to the groceries
| category but since YNAB knows you spent this on a CC (you
| always record which account the transaction happened on) it
| essentially takes $60 out of the "Groceries" envelope and
| moves it to the "Chase Sapphire" (or whatever you name it)
| envelope. You set aside the money for your CC purchases at
| time of purchase and then when the CC bill comes due it's
| paid out of that "envelope".
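The envelope movement in the steps above can be modelled in a few lines. This is only an illustration of the bookkeeping, not YNAB's actual implementation; the `Budget` class and its method names are made up.

```python
from decimal import Decimal


class Budget:
    """Toy envelope budget: category name -> money set aside."""

    def __init__(self):
        self.envelopes = {}

    def assign(self, category, amount):
        current = self.envelopes.get(category, Decimal("0"))
        self.envelopes[category] = current + amount

    def spend_on_card(self, category, amount, card):
        # The money leaves the spending category and waits in the
        # card's payment envelope until the bill is due.
        self.envelopes[category] -= amount
        self.assign(card, amount)

    def pay_card(self, card, amount):
        self.envelopes[card] -= amount


budget = Budget()
budget.assign("Groceries", Decimal("200"))
budget.spend_on_card("Groceries", Decimal("60"), "Chase Sapphire")
print(budget.envelopes)
```

After the purchase, "Groceries" holds 140 and "Chase Sapphire" holds 60, so the card payment is already covered.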
|
| In this way YNAB has become a layer on top of all my finances
| and I care little about how much money is in any given
| savings/checking account since YNAB tracks everything. I just
| make sure there is enough to cover CC payments (there always
| is) or any big transfers I want to make (like moving money to
| a HYSA).
|
| I've automated as much as I can with YNAB but I do spend
| 30min-1hr every few weeks (this is not what they recommend,
| but it works for me) reconciling my accounts. I totally get
| if people don't want to do that or don't see the value in it.
| Personally I love knowing where every dollar of mine is and
| tracking every purchase/transfer.
| fangpenlin wrote:
| Hi, the author here.
|
| So BeanHub is built on top of Beancount and uses double-entry
| accounting. It's one of the benefits of double-entry
| accounting. Many accounting packages are not good at dealing
| with multiple currencies or custom currencies. With Beancount, you
| can define any commodity you want, create transactions, and
| convert them with different currencies easily. For example, you
| can define a commodity TSM and create transactions[1] like
| this:
|
| 2025-01-01 commodity TSM
|
| 2025-03-05 * "Purchase TSMC"
|   Assets:US:Bank:WellsFargo:Checking  -2,000 USD
|   Assets:US:Bank:Robinhood                20 TSM @ 100 USD
|
| I think many people trade crypto, and traditional accounting
| software may not be that friendly to them. That's why I
| emphasized a bit to the crypto target audience. But you're
| right; I should make it clearer that it's not just for crypto.
|
| [1]:
| https://beancount.github.io/docs/beancount_language_syntax.h...
| egglemonsoup wrote:
| Great article! A minor correction, Steph Ango is Obsidian's CEO,
| not founder
| fangpenlin wrote:
| Hey! Thanks for pointing that out. I have already corrected it in my
| article :)
| idopmstuff wrote:
| I own a couple of small businesses, and I've tried a few things
| with my books - was on Bench for a year (thankfully not the year
| they shut down, and they were so incompetent I dropped them
| before that), tried to do them myself for a year, then upon
| realizing the P&L generated did not match the numbers I expected,
| hired bookkeepers on Upwork to fix them for me.
|
| I really feel like I ought to be able to do them myself - it's
| just following some rules, and my accounts aren't that complex.
| Still, it was just enough of a pain that it was easier to hire
| someone overseas for cheap (especially since I know what the
| business' numbers should roughly come out to, so I can validate
| their work).
|
| But as I've been using the latest AI models, I really feel like
| this is something that's going to be fully automated by AI in the
| next 1-2 years (at least for my very simple use case). It's
| simple enough that an AI agent should pretty trivially be able to
| fetch the docs from the various places that I sell, upload them,
| categorize transactions (this is already basically automated by
| rules for me anyway) and then do whatever it is that bookkeepers
| do to close the books.
|
| I can't help but think that bookkeeper is not going to be a
| profession in five years, and I'm just not sure where those
| people go. It's not like automating bookkeeping will expand the
| economic pie enough to create new jobs - I don't believe it's a
| bottleneck to anything at this point.
| fangpenlin wrote:
| Hi, the author here.
|
| Many customers have asked me about AI offerings, and I am
| considering them. While this is doable with modern LLM
| technologies, I need to consider many issues.
|
| The first is that nobody, myself included, likes their data
| being part of someone else's machine-learning training
| pipeline. That's why I promised my users that I wouldn't use
| their data for machine learning training without asking for
| explicit consent (and, of course, anonymization will be
| needed).
|
| While I know everything involved in AI sounds cool, do we
| really need LLM for a task like this? Maybe a rule-based import
| engine could kill 95% of the repeating transactions? And that's
| why I built beanhub-import[1] in the first place. Then, here
| comes another question: Should I make LLM generate the rule for
| you or generate the final transactions directly?
|
| Yet another question is, everybody/every company's book is
| different from one to another. Even if you can train a big
| model to deal with the most common approaches, the outcome may
| not be what you really need. So, I am thinking about possibly
| using your own Git history as a source of training data to
| teach machine learning models to generate transactions like you
| would do. That would be yet another interesting blog post, I
| guess, if I actually build a prototype or really make it a
| feature for BeanHub. But for now, it's still an idea.
|
| [1]: https://beanhub-import-docs.beanhub.io/
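A minimal sketch of what "rule-based" categorization means here. It is not beanhub-import's rule format or API; the `RULES` list, the patterns, and the account names are invented for illustration.

```python
import re

# Hypothetical rules: first matching pattern wins.
RULES = [
    (re.compile(r"COSTCO", re.I), "Expenses:Groceries"),
    (re.compile(r"NETFLIX|SPOTIFY", re.I), "Expenses:Subscriptions"),
    (re.compile(r"PAYROLL", re.I), "Income:Salary"),
]


def categorize(description, default="Expenses:Uncategorized"):
    """Map a bank description to an account using the rule list."""
    for pattern, account in RULES:
        if pattern.search(description):
            return account
    return default


for desc in ["Costco Whse #123", "NETFLIX.COM", "Corner Cafe"]:
    print(desc, "->", categorize(desc))
```

Anything that falls through to the default is the remainder that still needs a human (or some smarter model) to look at it.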
| asadjb wrote:
| Love this! I haven't used BeanHub, but was a user of text based
| accounting systems (beancount, hledger) for many years to track
| personal expenses. I stopped doing it when I realized I wasn't
| getting much out of it, but the knowledge of double-entry
| accounting still helps me to this day when I keep track of my
| business expenses in Xero.
|
| One thing which I disagree with in this article is the focus on
| file based data storage:
|
| > That makes it 10 times harder because you need to parse the
| text file, update it accordingly, and write it back. But I am
| glad I did. That guarantees all my accounting books are in the
| same open format.
|
| This quote captures my issues with it. It just makes things so
| much more difficult; and it makes the whole process slower as the
| file grows. I remember that when I used hledger for tracking my
| expenses over 3 years, I had to "close books" once a year and
| consolidate all the transactions for the past year into 1 entry
| in a new ledger file to keep entry/query operations fast.
|
| I get the sentiment; you want open data formats that remain even
| after your app is shut down. But you can get the same by using
| open formats; maybe a sqlite DB is good enough for that?
|
| The only thing that would be more complicated with a DB is
| versioning & reviewing commits like this app does; which does
| seem like a very exciting feature.
| packetlost wrote:
| I'm not at all familiar with text-based accounting tools, but I
| feel like the performance issues could be addressed by using
| multiple files instead of one big one.
| jimbokun wrote:
| Then you are slowly building a database engine.
|
| When do you split the files? How do you track which data
| resides in which files? Does one file represent one kind of
| data (table)? Does it reflect data within a given time range?
| Do you sometimes need to retrieve data that crosses file
| boundaries?
|
| You quickly lose the simplicity inherent in saving to just a
| single file.
|
| Which is where Sqlite shines. It's a single file. But with a
| full, user defined schema. And can update it and query it
| incrementally, without having to read and write the entire
| thing frequently. It handles all of that complexity for you.
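For comparison, a small sketch of the SQLite approach using Python's standard `sqlite3` module. The `postings` table and its columns are an assumed schema, not one taken from any existing tool; amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postings (
        id      INTEGER PRIMARY KEY,
        date    TEXT NOT NULL,     -- ISO date, sorts correctly as text
        account TEXT NOT NULL,
        cents   INTEGER NOT NULL
    );
    CREATE INDEX postings_account_date ON postings (account, date);
""")

rows = [
    ("2025-01-03", "Assets:Checking", 100000),
    ("2025-01-03", "Income:Salary", -100000),
    ("2025-01-15", "Assets:Checking", -2500),
    ("2025-01-15", "Expenses:Groceries", 2500),
]
conn.executemany(
    "INSERT INTO postings (date, account, cents) VALUES (?, ?, ?)", rows
)


def balance(account, as_of):
    """Sum one account up to a date without scanning other accounts."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(cents), 0) FROM postings "
        "WHERE account = ? AND date <= ?",
        (account, as_of),
    ).fetchone()
    return total


checking = balance("Assets:Checking", "2025-01-31")
all_total = conn.execute("SELECT SUM(cents) FROM postings").fetchone()[0]
print(checking, all_total)
```

New postings are appended with a single `INSERT`, and the double-entry invariant (all postings sum to zero) becomes a one-line query.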
| fangpenlin wrote:
| Some people, myself included, prefer text-based files as a
| user interface. Like, some Vim users never leave their Vim
| session and would like to do everything in it.
| While SQLite is immortal software and will probably be
| there forever, using it means changing the UI/UX from text
| files to SQL queries or other CLI/UI operations. I think
| it's a preference for UI/UX style instead of a technical
| decision. For that preference of UI/UX, we can push on the
| technical end to solve some challenges.
| packetlost wrote:
| > Then you are slowly building a database engine.
|
| > When do you split the files? How do you track which data
| resides in which files? Does one file represent one kind of
| data (table)? Does it reflect data within a given time
| range? Do you sometimes need to retrieve data that crosses
| file boundaries?
|
| Not really. Splitting anywhere from per-day to per-year is
| probably fine! Or split arbitrarily and merge the files at
| runtime. Make it configurable! Add tools to split or merge
| files, it's really _not_ that hard, a far cry from a
| database engine.
|
| > You quickly lose the simplicity inherent in saving to
| just a single file.
|
| No, you really don't.
|
| > Which is where Sqlite shines. It's a single file. But
| with a full, user defined schema. And can update it and
| query it incrementally, without having to read and write
| the entire thing frequently. It handles all of that
| complexity for you.
|
| That you need a particular tool or library to interact
| with.
|
| I'm not going to try and sell you on the benefits of using
| plaintext tools because you've already clearly made up your
| mind, but there are reasons even if you can't see them.
| SQLite has like 160k lines of code complexity that isn't
| necessary and is inherently less composable.
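The per-year split and runtime merge suggested above might look roughly like this. It is a sketch under simple assumptions: every entry starts with a `YYYY-MM-DD` date at the beginning of a line, entries are in chronological order, and `split_by_year` and `merge` are hypothetical helpers rather than part of any ledger tool. Beancount also has an `include` directive that can stitch separate files together.

```python
import re
from collections import defaultdict

# A dated entry starts with YYYY-MM-DD at the beginning of a line.
DATE_LINE = re.compile(r"^(\d{4})-\d{2}-\d{2}\b")


def split_by_year(ledger_text):
    """Group dated entries by year; undated lines stay with the entry above."""
    by_year = defaultdict(list)
    year = "header"  # lines before the first dated entry (options, comments)
    for line in ledger_text.splitlines():
        match = DATE_LINE.match(line)
        if match:
            year = match.group(1)
        by_year[year].append(line)
    return {key: "\n".join(lines) + "\n" for key, lines in by_year.items()}


def merge(parts):
    """Concatenate the per-year chunks into one ledger, header first."""
    keys = sorted(parts, key=lambda k: (k != "header", k))
    return "".join(parts[k] for k in keys)


sample = (
    "; my books\n"
    '2023-12-31 * "Groceries"\n'
    "  Expenses:Food   42.50 USD\n"
    "  Assets:Checking\n"
    "\n"
    '2024-01-02 * "Coffee"\n'
    "  Expenses:Food    3.00 USD\n"
    "  Assets:Checking\n"
)
parts = split_by_year(sample)
```

Each value in `parts` could be written to its own file, and `merge(parts)` rebuilds the original text for tools that expect a single ledger.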
| fangpenlin wrote:
| Hi, the author here.
|
| I get where you're coming from. My books are also growing big
| right now, and indeed, they have become slower to process. Some
| projects in the community, such as Beanpost [1], are actually
| trying to solve the problem, as you said, by using an RDBMS
| instead of plaintext.
|
| But I still like text file format more for many reasons. The
| first would be the hot topic, which is about LLM friendliness.
| While I am still thinking about using AI to make the process
| even easier, with text-based accounting books, it's much easier
| to let AI process them and generate data for you.
|
| Another reason is accessibility. Text-based accounting only
| requires an editor plus a command line. Sure, you can build a
| friendly UI for SQLite-based books, but you can do the same for
| text-based accounting books.
|
| Yet another reason is, as you said, Git or VCS (Version control
| system) friendliness. With text-based, you can easily track all
| the changes from commit to commit for free and see what's
| changed. So, if I make a mistake in the book and I want to know
| when I made the mistake and how many years I need to go back
| and revise my reports, I can easily do that with Git.
|
| Performance is a solvable technical challenge. We can break
| down the text-based books into smaller files and have a smart
| cache system to avoid parsing the same file repeatedly.
| Currently, I don't have the bandwidth to dig into this rabbit
| hole, but I already have many ideas about how to improve
| performance when the file grows really big.
|
| [1]: https://github.com/gerdemb/beanpost
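One way to picture the cache idea mentioned above is to key parsed results by a hash of each file's content, so an unchanged file is never parsed twice. This is only a sketch of the general technique, not the author's design: `ParseCache` and the stand-in parser `count_entries` are hypothetical names.

```python
import hashlib


class ParseCache:
    """Cache parsed results keyed by a SHA-256 hash of the file content."""

    def __init__(self, parse):
        self._parse = parse  # hypothetical parser: str -> parsed result
        self._cache = {}
        self.misses = 0      # how many times the real parser had to run

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._parse(text)
        return self._cache[key]


def count_entries(text):
    # Stand-in for a real parser: counts lines that start with a digit.
    return sum(1 for line in text.splitlines() if line[:1].isdigit())


cache = ParseCache(count_entries)
chunk = '2024-01-02 * "Coffee"\n  Expenses:Food  3.00 USD\n  Assets:Checking\n'
first = cache.get(chunk)   # parsed
second = cache.get(chunk)  # same content, served from the cache
```

Combined with smaller per-period files, only the files whose content changed would need to be re-parsed.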
| asadjb wrote:
| Thanks for responding and your thoughts! Generally agreed
| with all you said.
|
| However, I feel like maybe a different approach could be to
| store all the app state in the DB, and then export to this
| text only format when needed; like when interacting with LLMs
| or when someone wants an export of their data.
|
| Breaking the file into smaller blocks would necessarily need
| a cache system I guess, and then maybe you're implementing
| your own DB engine in the cache because you still want all
| the same functions of being able to query older records.
|
| There's no easy answer I guess, just different solutions with
| different tradeoffs.
|
| But what you've built is very cool! If I was still doing text
| based accounting I would have loved this.
| bks wrote:
| Is there any solution for statement downloading? Like many small
| businesses, I have credit card and bank account statements that I
| need to provide to my accountant.
|
| While my "books" are synced via Quickbooks, they (accountants)
| really seem to love having the PDF in hand. I just need the PDFs
| and they do not send them via email...
| hamiltont wrote:
| Hi - I'm working on something like this because I needed it too
| ;-)
|
| It's not yet ready for release, but I should be ready for beta-
| test within 2 months. If you're interested I would be happy to
| add you to my list of "people to notify when I am ready to beta
| test"
| bks wrote:
| yes please
| bzmrgonz wrote:
| Would using SQLite have been in line with the file-over-app
| philosophy? I'm thinking yes.
| fangpenlin wrote:
| If there's anything like immortal software, SQLite is
| definitely on the list
| conradev wrote:
| Language models completely upended my Beancount setup. To me,
| there is no point fiddling with precise parsers when a language
| model can read any PDF. I have the language model extract balance
| assertions from the PDF (beginning/end balance) so that it
| grounds its work in reality.
|
| I dream of a future where anyone can download a ZIP of all the
| PDFs they've ever received from their bank, drop it onto a local
| app, and wait while it creates an entire accounting setup for
| you.
|
| Edit: also not mentioned here is Fava, which is a really nice web
| UI for Beancount (https://beancount.github.io/fava/). I share a
| link with my accountants, and they find it convenient (for
| downloading files, at least).
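The balance check described above can be sketched as a simple reconciliation: the opening balance plus every extracted amount should equal the closing balance printed on the statement. The function name and figures below are hypothetical. In an actual ledger, Beancount's `balance` directive serves as this kind of assertion.

```python
from decimal import Decimal


def balances_reconcile(opening, closing, amounts):
    """True if the opening balance plus all amounts equals the closing balance."""
    total = sum((Decimal(a) for a in amounts), Decimal(opening))
    return total == Decimal(closing)


# Hypothetical amounts a model extracted from one statement.
extracted = ["-42.50", "-3.00", "1000.00"]
ok = balances_reconcile("500.00", "1454.50", extracted)

# A single misread amount (100.00 instead of 1000.00) breaks the check.
bad = balances_reconcile("500.00", "1454.50", ["-42.50", "-3.00", "100.00"])
```

A failed check does not say which transaction is wrong, only that the extraction cannot be trusted as-is. Errors that cancel each other out would still slip through.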
| xyst wrote:
| Lovely, now an LLM can hallucinate how much I spent or earned in
| a month, or year.
| fangpenlin wrote:
| For a use case like this, a locally running model would be
| ideal. I wouldn't like to share my personal accounting books
| with an LLM either.
| conradev wrote:
| It makes it a lot harder if you check the balances!
| daft_pink wrote:
| So can this be used for business books and records? A lot of the
| documentation I've sifted through since Intuit sh*ttified their
| product made me think that this ledger-style open source
| accounting is more of a Quicken-like personal finance product.
| zie wrote:
| Double-entry bookkeeping can be used to track any resource at
| any scale, though it's particularly good at tracking money.
|
| The upside of a "real database" like postgres is when you need
| multiple people in the books making changes at the same time.
| Until your accounting department grows into multiple people,
| Beancount, hledger or any plain text accounting system would be
| fine.
| simonmic wrote:
| If those multiple people are updating the files via VCS, or
| via UIs that enforce append-only updates, it's still
| reasonably fine.
| zdw wrote:
| A long time ago I wrote a python extension to the C ledger
| implementation that did a basic RPN calculator:
|
| https://github.com/zdw/ledgercalc
|
| And fed it with a pile of scripts that extracted bank PDFs -> Text
| -> ledger entries, and shoved it all in git.
|
| This looks like some superset of that, but in general I found
| the files + text to ledger process to scratch a great itch.
| gen220 wrote:
| FWIW I went down the path of automating as much as I could about
| this process.
|
| These days, though, my process is more manual. Around every 24 or
| 48 hours, or immediately after making a transaction, I'll record
| the transaction in my ledger, which I store in Google Sheets (!)
| instead of a .ldg file. No more CSVs, no more pure functions of
| bank output.
|
| Sometimes I miss the .ldg format. But I don't really miss
| maintaining the automated system. Google Sheets isn't as
| expressive as Ledger, but it is sufficient for my needs. Charting
| is a bit easier. YMMV!
|
| To me, the most essential pivots to get me back into personal
| accounting were to record expenses both manually and ~daily. If I
| were to return to ledger again -- which I might! -- I'd focus on
| those aspects more than the automation.
___________________________________________________________________
(page generated 2025-03-05 23:00 UTC)