[HN Gopher] Advent of Code 2024 in pure SQL
___________________________________________________________________
Advent of Code 2024 in pure SQL
Author : greghn
Score : 451 points
Date : 2025-01-02 19:20 UTC (1 days ago)
(HTM) web link (databasearchitects.blogspot.com)
(TXT) w3m dump (databasearchitects.blogspot.com)
| mbowcut2 wrote:
| Absolutely horrifying. Well done.
| teeray wrote:
| "Thanks, I hate it"
| sceadu wrote:
| for those who don't know, the author is one of the best DB
| researchers in the world
| beoberha wrote:
| Thomas Neumann doing Thomas Neumann things
| scop wrote:
| I reacted to this title the way I react a new menu item at Taco
| Bell: a strange mixture of desire, shame, and admiration for
| human ingenuity.
| qsort wrote:
| I work a lot with databases and I've seen... stuff. It's not as
| bad as you might think if you know what you are doing. Most
| RDBMSs support recursive CTEs, it feels like writing Prolog
| with a slightly sadistic syntax. For something like AoC the
| most difficult part is probably parsing the input.
| dunham wrote:
| Speaking of parsing, back around y2k we were building an app
| that used XML everywhere, which was the style at the time,
| and our DBA wanted to write an xml parser in SQL (the api
| would involve sending XML to the database). That got vetoed.
|
| IMO, this kind of thing is what AoC is good for - you get to
| play with weird/obscure stuff without affecting your day job
| code.
| jamie_ca wrote:
| I did something with JSON back before there was reasonable
| native support - it's certainly not robust, but it handled
| a few syntax variants for a use case where we had an extra
| attribute column that serialized JSON, and wanted to
| surface one of the fields as a proper column on the table.
|
| https://blog.tracefunc.com/2011/11/19/parsing-json-in-sql-
| ht...
| jmg_ wrote:
| Funnily, I'm actively working on rewriting a stored
| procedure which parses an XML snippet as one of its
| arguments
|
| Luckily it's not a handwritten XML parser though:
| https://learn.microsoft.com/en-
| us/sql/t-sql/functions/openxm...
| yazmeya wrote:
| Just around the same time I was working at a place that
| used Oracle's web app extension, with CGI endpoints written
| completely in PL/SQL. I did end up writing an XML
| parser/serializer for it.
| emmelaich wrote:
| Thanks for that comment.
|
| I laughed aloud at _" It's not as bad as you might think if
| you know what you are doing."_
|
| ... because that pretty much describes all human activity :-)
| AdieuToLogic wrote:
| > Most RDBMSs support recursive CTEs, it feels like writing
| Prolog with a slightly sadistic syntax.
|
| Which makes sense as both are declarative logic-based
| languages. IMHO, SQL and Prolog fundamentally have much in
| common.
| marxisttemp wrote:
| I did a semester at the university of Edinburgh and took
| database systems and logic programming at the same time,
| and I definitely felt the synergy between them.
| UltraSane wrote:
| A company I worked for uses Syteline ERP which heavily relies
| on SQL Server. But the DBA was constantly complaining about
| how slow the Syteline SQL was. One major issue was long
| running transactions taking 10 minutes locking rows/tables
| for too long and using a lot of memory. You would think very
| expensive ERP systems would have decent SQL.
| ulrikrasmussen wrote:
| It is closer to Datalog I think, or can you express cut? CTEs
| are fairly restricted compared to logic programmming
| languages though, at least for Postgres. In particular,
| relations cannot be mutually recursive and your rules may
| only be linearly recursive in themselves (i.e. can contain
| only one instance of themselves in the right hand side).
| Postgres is overly restrictive in the latter and requires at
| most once recursive reference over all subqueries in the
| UNION even though it would be safe to only restrict the
| number of recursive calls for each subquery (each
| corresponding to a separate Datalog rule for the same
| relation). It is possible to work around that restriction
| using a local WITH expression (a hack really), but then you
| are also on your own since it disables all checks and allows
| you to write rules which uses actual nonlinear recursion and
| will give incorrect result sets when evaluated.
|
| I really would like Postgres to have proper support for
| writing Datalog queries, and with better and more efficient
| incremental evaluation algorithms as opposed to the iterative
| semi-naive algorithm that is only supported now.
| harrall wrote:
| Haven't written SQL in a while (and I used to write a lot)
| but I think SQL Server recursive CTEs are fairly unbounded
| so it's just a Postgres limitation unfortunately.
|
| (I'm a fan of MS SQL but it's Microsoft and also hard to
| financially justify for many companies. But if you ever get
| to use it, it is a very solid RDBMS, even if the rest of
| your stack is open source.)
| bob1029 wrote:
| The cost of MSSQL is largely controlled by how the system
| is designed and the complexity of the business.
|
| The model I am most familiar with is a 10-20 employee B2B
| SaaS startup running one "big" instance on a single vm in
| the cloud somewhere. If this is approximately all you
| require, then the cost should not be a dominating factor
| in your decision.
|
| I think "because Microsoft" is also really poor
| justification if we are being serious about the
| technological capabilities and pursuing high quality
| business outcomes. If your business is fundamentally
| about open source advocacy and you are operating as a non
| profit, I totally get it. But, this is probably not your
| business model.
| larodi wrote:
| MS SQL is one fascinating piece of software and much
| closer to commercial big bro offering such as Oracle and
| DB2, yet much more user friendly and convenient.
|
| This sentiment against it really always comes only from
| people who have not used it or have never touched the
| enterprise version which is a very mature ecosystem with
| lotta features available for ages now.
|
| Most of my career I've been dealing with DBs and MSSql is
| the easiest to admin perhaps being also tightly
| integrated with all scripting in the platform. It also
| runs Linux and is doing it better than the rest can say
| for running Windows.
| youerbt wrote:
| I do AoC in SQL, I wish it was true. With Postgres, you have
| lots of regex/string manipulation functions that make it
| easy.
|
| For me, the biggest problem was memory. Recursive CTEs are
| meant to generate tables, so if you are doing some maze
| traversal, you have to keep every step in memory, until you
| are done.
| xdavidliu wrote:
| parsing is most difficult for probably the first third of the
| problems. when you get to day 19 or so, the input is still
| just a grid or a bunch of ints just like day 1, but the
| algorithms required are considerably more challenging than
| the parsing part. (I've done all 25 problems in all years)
| brettgriffin wrote:
| I'm as equally amazed by the solutions in this post's github
| repo as I am with Taco Bell's new chicken nuggets.
| upghost wrote:
| [?] *V[?]*V [?]
|
| Suspicious. Need to investigate if taco bell has interesting
| ANSI SQL flavored chicken nuggets or I've been taken for a
| fool!
| larodi wrote:
| but why? what would make you react at human ingenuity with
| shame and desire? is this something about you or something
| about them in particular? isnt the whole of HN about human
| ingenuity...?
|
| are we to feel Taco Bell menu about it all, what am I missing?
| knicholes wrote:
| This is just a guess, but if the OP's reason is similar to
| mine, DBMSs should be reserved for managing databases and not
| implementing complex logic.
| liontwist wrote:
| Why not? I think the main barrier is the programmers are
| not comfortable with recursion, rather than a technical
| limitation.
| FridgeSeal wrote:
| I'd hazard a guess that quite a few devs have at one
| point, been lumped with a db where someone pushed too
| much logic into it. Probably great for the one person who
| wrote it, and awful for everyone else.
|
| Mystery triggers, unclear or badly managed stored
| procedures, opaque invariants. It's not to say that all
| of these things are bad, but a certain level of DB
| complexity will naturally set off alarm bells for some
| devs.
| scop wrote:
| One of the most eye opening moments in my junior dev
| career was when I found a _really_ nifty way to achieve
| an operation using SQL alone and brought it to our VP
| (start up, small company size). It wasn't esoteric, but
| used some SQL features not well-trodden by web devs. I
| thought I was the bees knees and this would be a clear
| demonstration of my sizable brain. He complimented the
| thought behind it, but immediately vetoed it as too
| complex for the majority of developers who would have to
| touch the code down the road. In hindsight he was
| completely right.
| liontwist wrote:
| I don't think this is a win. Cautioning against adding
| new technology to learn and maintain is one thing. But
| not using an effective solution in your toolset because
| it's too hard is absurd.
|
| Our Industry pretends solutions must be immediately
| understandable with common sense, without study or
| explanation. But nothing that actually matters does this
| (relational databases in the first place).
|
| No other engineering discipline has this confusion. The
| world has complexity and it's our job to understand it.
| liontwist wrote:
| SQL tends to be very dense (more thinking than typing).
|
| It is tedious to specify all that information (access,
| constraints, foreign key, cascading, views), but it's all
| essential information you would end up specifying in
| another way.
| xarope wrote:
| In the past (why do I feel so old when I say this?), DBMSs
| WERE used to implement complex logic. Not just complex
| business logic, but even authentication, authorization, etc
| were implement with stored procs, embedded sql, native DB
| features.
|
| The cycle in tech will shift from one side to another; last
| decade was a lot of no-sql where all this logic and
| implementation had to reside on the application side. I'm
| seeing a shift back to DBs again (e.g. supabase), and the
| reality is that it's probably a continuum somewhere in-
| between, depending on your requirements (as always, the
| answer is "Yes, but it depends").
|
| Remember, these are all "just" tools.
| harrall wrote:
| SQL is like regex.
|
| A small fraction of people bothered to learn how they work
| and know how and when to use them properly.
|
| Everyone else just perpetuates a giant in-joke of "it's
| black magic!"
| lelanthran wrote:
| I do line of business apps right now.
|
| It seems to me that we have the abstraction inverted when
| it comes to ORMs, which is why so many experienced devs
| dont like ORMs (but can't really articulate why).
|
| Here's my take, in the context of business logic:
|
| The schema represents the business rules. The ORM papers
| over the DB so that the code doesn't have to match the
| business rules. Then the dev implements those rules _again_
| in code, but poorly.
|
| If you simply skip the programming language using something
| like Postgrest you end up with a declarative structure of
| the business rules.
|
| The first problem is that most developers would rather hunt
| a bug or perform a modification on a 30k Java or C# or
| python or ruby program than in 5k of SQL.
|
| The next problem is that tooling for SQL is crap. It's
| declarative so good luck stepping through the logic. The
| dialects are all different so editors aren't great at
| spotting errors.
|
| The last major problem is that changing it is hard - you
| can't simply change the schema and redeploy because
| migrations have to be made on the existing data.
|
| All of these are surmountable in some ways, but they
| certainly ain't easy.
|
| Doing LoB apps have opened my eyes a lot: maybe 99% of
| logic is shorter when expressed in SQL. Of course, that
| means that the only use of an ORM in this case is
| connecting to the DB, sending the query, and sending back
| results.
| CommanderData wrote:
| I'm angry but in a good way.
| SoftTalker wrote:
| Over my career I've certainly written more SQL than any other
| type of code. Not so much in the last five years so I'm sure I've
| lost some of it, but I used to really enjoy it. Once you stop
| thinking iteratively and start thinking in set operations it
| becomes quite natural and powerful.
| fifilura wrote:
| I second all of that!
|
| I wish more people would see the beauty. After a session of
| SQL, when I take a step back and think.
|
| "Hold on. What I have been doing lately is just pure logic. No
| library dependency resolution, no concurrency problems (even
| though massive concurrency is certainly under the hood). No
| mutability issues. Just logic."
|
| SQL obviously has its warts, some of them serious, like
| testability. But at the end of the day, I wish all programming
| was like that. Let the computer decide how to do stuff under
| the hood. And let the human focus on the logic.
|
| I have somewhat half-assed tried to read up on Prolog for
| trying to take it to the next level, but failed sofar
| unfortunately. (It was also a goal to try to unlearn some SQL
| to avoid getting stuck in some local optimum). Maybe somewhere
| between SQL and Prolog is the future of programming.
| Rendello wrote:
| > Maybe somewhere between SQL and Prolog is the future of
| programming.
|
| Must be Datalog then ;)
| coliveira wrote:
| Prolog is very powerful, if you see what professionals can do
| with it it's eye opening. Unfortunately, it takes a complete
| relearning of programming to achieve that proficiency level.
| And after you reach it, you probably cannot use it in your
| day job...
| arkh wrote:
| My problem with Prolog is every time I want to start using
| it I feel like I'm populating a database then doing queries
| on it. So it feels like SQL with another syntax and less
| power.
|
| I'm sure I'm wrong and missing something but that's where I
| stop.
| diggan wrote:
| > I feel like I'm populating a database then doing
| queries on it. So it feels like SQL
|
| I'm not sure what database technology/data storage that
| doesn't involve doing those two things to get started.
|
| I haven't done any Prolog, but Datalog which is similar
| and focused on querying data, and the benefits of Datalog
| for me is that you can write complex/medium-complex
| queries a lot easier than in SQL, at least for me.
| Simpler queries are just less characters, but pretty much
| the same beyond that.
|
| Probably helps that the Clojure ecosystem embraced
| Datalog so it mostly feels like writing Clojure code
| although it's really Datalog. You don't get that same
| feeling regardless of what SQL library/DSL you use.
| coliveira wrote:
| Prolog is in fact a kind of database. However it is
| untrue that it has less power than SQL, just the
| opposite. Probably it feels harder to use because you're
| already so used to how SQL works.
| larodi wrote:
| > Maybe somewhere between SQL and Prolog is the future of
| programming.
|
| it was, it most probably is
| bob1029 wrote:
| Over the years I have been pushing more and more
| responsibilities into the RDBMS. I now see things mostly in
| terms of ETL, SQL and schema. Virtually every conversation I've
| ever had about the application of technology to the business
| could be expressed in these terms. Business logic defined as
| SQL queries can be quite intuitive when the schema is
| structured well and aligned to the business stakeholders'
| perspectives.
|
| Code, frameworks, ORMs, "best practices", patterns, et. al. are
| ultimately a distraction. There are a million ways to get the
| data in & out of the database. Moving the bits around is _not_
| valuable. There are many overblown software solutions out there
| that could have been a simple merge statement or CSV import
| job.
|
| I think that a lot of the misconceptions and bad moods about
| SQL come out of being forced to work with nasty schemas. The
| language itself is really domain specific. Consider that one
| wouldn't complain as much about a super fucked up nested query
| (and resulting SQL syntax woes) if it wasn't necessary to write
| such a query in the first place. Aligning tuples and relations
| to the way the business typically talks about them means you
| will be less likely to be fighting these things over time.
| Often, it isn't possible to refactor the schema from zero, but
| you can put replicas/views _around_ a "bad" schema and target
| it with your new development & refactors.
| fifilura wrote:
| > Consider that one wouldn't complain as much about a super
| fucked up nested query (and resulting SQL syntax woes) if it
| wasn't necessary to write such a query in the first place.
|
| And in "modern" SQL this is solved with CTEs. Use them to
| unwrap the schema the way you want it first, before
| implementing the business logic.
| larodi wrote:
| this is poor man's SQL to unwrap for business logic. SQL is
| such more more about everything else and so litte about the
| trouble with mapping business logic into storage.
| bbkane wrote:
| Could you please add some examples? I'm still living in
| poor man's SQL land
| solumunus wrote:
| Please elaborate.
| larodi wrote:
| CTEs can indeed align mismatched and denormalised storage
| to some E/R which follows some business logic. That much
| you can say. But CTEs allow for recursion and save time
| when you need complexity but you want to follow a precise
| step-by-step reasoning.
|
| This dis not arise as a need to solve business mess, but
| as a need to skip DDL for views/mviews.
|
| You can have arbitrary dimensions sliced in CTEs which
| does not immediately imply a goal of business schema
| alignment.
|
| Besides the top SQL devs I've met don't lose time to
| align schema at all, but write the DB table names and
| columns as they are, because well... because they were
| usually the people who created the mess this way.
|
| Many reports such as month-to-month increase of sales or
| other statistical stuff is much more readable when
| implemented with CTes. Besides - older versions of DB
| software didn't always have window functions or proper
| ranking, so problems such as top-n were not (and in some
| occasions still are not) trivial to write unless view/cte
| is used. We talking pages of single query here, not
| textbook examples.
|
| So is really about dice and slice in a convenient way,
| but less about business schema or the ideal E/R
| fifilura wrote:
| Maybe I misunderstood, and if I did it makes for an
| interesting discussion, but your explanation is mainly
| about what CTEs can be used for not what they can not be
| used for.
|
| In my experience, normalizing tables is mostly for
| simplifying inserts. When you write business logic it is
| rather the opposite, you want non-normalized data because
| this is where the data is all in one place. And this de-
| normalization is a great use of CTEs. As well as a
| general cleanup of the input data, which was my original
| point.
| crazygringo wrote:
| > _Once you stop thinking iteratively and start thinking in set
| operations it becomes quite natural and powerful._
|
| I dunno... I've written a tremendous amount of SQL, and I still
| have to think imperatively (iteratively) in order to write
| queries that are actually performant, and to know which indexes
| need to exist.
|
| It would be wonderful if I could just think in terms of set
| operations, but that tends to result in queries that take 5
| minutes to execute rather than 5 milliseconds.
|
| My entire thought process is basically -- what table do I start
| with, what rows in what order, joining to what, under what
| conditions, aggregating how, rinse and repeat... It's entirely
| a mental model of loops and aggregation, never of set
| operations.
| fifilura wrote:
| It may be true, until you do your ETL in an index-less
| database such as BigQuery or Trino. Postgres will always be
| faster for optimized, end user serving, queries.
|
| But BigQuery allows you to scale it to 100s of CPUs without
| having to worry about indexes.
| crazygringo wrote:
| Yes, I'm talking about end user queries. Not reports that
| take 2 hours to run.
|
| But even with BigQuery, you've still got to worry about
| partioning and clustering, and yes they've even added
| indexes now.
|
| The only time you really just get to think in sets, is when
| performance doesn't matter at all and you don't mind if
| your query takes hours. Which maybe is your case.
|
| But also -- the issue isn't generally CPU, but rather
| communications/bandwidth. If you're joining 10 million rows
| to 10 million rows, the two biggest things that matter are
| whether those 10 million rows are on the same machine, and
| whether you're joining on an index. The problem isn't CPU-
| bound, and more CPU's isn't going to help much.
| fifilura wrote:
| Of course there are optimizations to be made, such as not
| joining on the raw data or saving the order by to last.
| And avoid outer joins between two large sized partitioned
| tables.
|
| But to me those optimizations are not imperative in
| nature.
|
| (And BQ will probable eat the 10 million to 10 million
| join for breakfast...)
| liontwist wrote:
| This sounds awful.
|
| I would do almost any amount of iteration and index tuning
| to keep the query on a single machine rather than deal with
| a networked distributed system.
|
| When you get slow queries the real problem is algorithmic
| complexity and linear workers only can do so much,
| fifilura wrote:
| You are rightfully proud of your skills!
| RobinL wrote:
| You can use duckdb on a single machine. It's also
| indexless (or more accurately, you don't have to
| explicitly create indexes)
| maCDzP wrote:
| SQL clicked once I read the original paper and it explained it
| terms of sets.
| griomnib wrote:
| Being able to master the theoretical, practical, and skill-
| based components of designing a good database schema is the
| absolute truest test of understanding any systems design.
|
| People skip ahead to all kinds of nonsense; but most of
| software engineering is putting the right data into the right
| format, and moving it around reliably.
|
| I just did a major refactor of a complex distributed code base.
| I pretty much only count the work I did on schema re-design as
| the actual "job", the rest was many hours of coding, but that's
| really just implementation.
|
| There are other ways to define schema than SQL of course, but
| it's really the perfect way to learn true systems engineering.
| SoftTalker wrote:
| Very true. My manager at one of my first jobs liked to say
| "get the data model right and everything else will be easy"
| and that has largely been proven true in my experience (and
| it even applies if you're not using an RDBMS).
| pjmlp wrote:
| Yeah, I also kind of like coding in SQL, with PL/SQL being my
| favourite extension language, which is kind of heresy in HN,
| but whatever.
| datadrivenangel wrote:
| The data parsing must have been painful. High wizardry.
| fifilura wrote:
| The data parsing is just one recursive query to read one line
| at a time, and it is done pretty much the same way in each
| problem.
| taberiand wrote:
| I feel like it'd be entirely acceptable to load the inputs into
| tables first and still qualify as pure sql, because string
| parsing in sql is so blergh
| fifilura wrote:
| Input in AoC is always well-formed. And you can always use a
| regexp? Seems like the smallest problem to me. As soon as you
| get past that recursive one row per line trick in the posted
| solutions
| taberiand wrote:
| Obviously it's possible, particularly in this context, but
| generally speaking I'd rather avoid it.
| ikiris wrote:
| This is impressive. Well done. I hate it
| drewbitt wrote:
| Insanity. I wonder how much cleaner it could look in EdgeQL.
| 1st1 wrote:
| Somewhat, depending on what you appreciate in languages. See my
| other comment in this thread:
| https://news.ycombinator.com/item?id=42578562
| RaftPeople wrote:
| I wrote a cubing/containerization system in SQL, but it did use
| the additional capabilities in a sproc for looping etc., so not
| just a single SQL stmt.
|
| But the problem actually mapped well to SQL's capabilities in
| some ways. The basic algorithm was:
|
| 1-Loop through items in descending size order
|
| 2-Test X different positions and orientations such that one
| corner is butted up against an existing corner (of container, or
| previously placed items, etc.)
|
| 3-Choose the winner based on some sort of point systems that
| encapsulates some set of rules (e.g. preferred orientation, or
| minimizes small gaps between sides of items, etc.)
|
| These were some aspects that lined up well with SQL:
|
| A-Testing of X number of positions and orientations all with a
| relative simple statement, using as input: existing corner points
| from previously placed items, some control table with the
| variations in X,Y,Z orientations.
|
| B-The ability to maintain a set of previous options for each
| iteration (e.g. item #2 was tested in these positions with these
| stats and all are temporarily being considered reasonable
| options), add new item to each one of those previous tests and
| accumulate stats. It was pretty easy to configure a "window" of
| how many different open options were being tracked to optimize
| the result without overwhelming memory. The SQL to include the
| options was almost the same as if there was only one previous
| chosen option.
|
| Some aspects were a bit painful to shift the model mentally to
| fit in with SQL.
| trjordan wrote:
| If you like this kind of degeneracy, I tried AoC in Google Sheets
| this year.
|
| I only made it to Day 6, and not even both stars every day. I'm
| pretty confident my Day 7 solution is correct, but I hit per-cell
| character limits on longer inputs.
|
| Enjoy :) oh but don't open it on mobile, some sheets crash the
| app
|
| https://docs.google.com/spreadsheets/d/10FY-89y19tnRM_EAAnCd...
| tnvmadhav wrote:
| holy sh*t, I kneel.
| hu3 wrote:
| I'm on my phone so I can't open it now.
|
| Does it use Google App Script? Would be a way to gain extra
| power I think.
| diggan wrote:
| > Does it use Google App Script? Would be a way to gain extra
| power I think.
|
| Isn't that just straight up JavaScript at that point? Feels
| kind of like cheating if the goal is to complete AoC with a
| spreadsheet.
| 1st1 wrote:
| FWIW I tried doing AoC for a few days with EdgeQL, and the
| experience proved to be quite interesting.
|
| Some of my tweets (I should write a blog post):
|
| - https://x.com/1st1/status/1864069589245858083
|
| - vs SQL https://x.com/1st1/status/1864412869108092997
| hobs wrote:
| When you realize recursive SQL allows iterative calculation then
| you can pretty much do whatever you want, no matter how bad.
| pearjuice wrote:
| Doing it in pure SQL is really impressive but I think the real
| tell-tale sign of peak "cracked engineer energy" is the
| maintained, decade-old blogspot site. Can't exactly put my finger
| on it, but really gives off "niche mastery". I don't even know
| the authors but I'm sure in the right circles a few dudes
| maintaining a blogspot site called "database architects" for a
| decade probably don't need an introduction.
| Moosturm wrote:
| Those authors are the elite of DB mastery.
| forgot_old_user wrote:
| haha I couldn't help but silently mouth "BUT WHHYY" before I
| clicked this link
| pavel_lishin wrote:
| I've been slowly working through AoC2024 in Google Sheets. I
| should really write up what I've been up to.
| linsomniac wrote:
| Long ago I was interviewing for an operations job, and their
| "leetcode" interview question was to create an invoice report of
| some fairly large public data-set. Because of the size of the
| dataset and the way they wanted it sliced and diced, it wasn't
| just a straightforward set of JOINS, and they clearly wanted you
| to do some simple SQL queries and then have some code loops that
| sliced and diced it all.
|
| That wasn't a solution the data scientist in me (which I'm not)
| was ok with. I ran my high level thoughts about the problem past
| a data scientist buddy of mine and he said "sounds like you're on
| the right track", but not being a data scientist it was hard for
| me to put together a soludion.
|
| I told them that if I was asked in my work capacity to do
| something like this, I'd probably be reaching for a reporting
| package like Crystal Reports, but I haven't touched it in ~30
| years. "Sure, I get that, just write some code to generate the
| report".
|
| I had written all the Ansible and related goodies to spin up an
| instance, get MySQL set up and configured, and figured the
| "right" solution was there in the SQL arcana. I played with it
| and played with it, mostly writing off the job because what kind
| of company judges a sys admin based on building a reporting
| package? They had set the expectation that the homework should
| take 2-4 hours, and I chewed on it for longer than that and
| finally said "thanks but no thanks, this is outside my
| wheelhouse".
|
| But I kept chewing on the problem. A couple weeks later I had the
| right set of materialized views and queries and joins that not
| only solved the problem entirely in SQL, but solved it QUICK.
|
| SQL is amazing, but there are a log of tricks of the trade you've
| got to have at your fingertips to make it fly. I'm still
| convinced that a lot of it is just throwing a bunch of things at
| the wall until something sticks.
| pawptart wrote:
| I was crazy enough to try this as well this year. It would be an
| extreme stretch to consider me a SQL expert, but I did make it to
| Day 14.
|
| I agree with all the points in the post. It's really not _that_
| bad. If I had more time to devote to it, I think I could have
| reasonably completed more.
|
| Here's my writeup, contains a link to the repo if you want to see
| some of the soutions.
|
| https://github.com/ty-porter/advent-of-code/tree/master/2024...
| nostradumbasp wrote:
| It takes an incredible human to do something like this. Pure art.
| Not enough of that in the programming world.
| apavlo wrote:
| Thomas is the greatest database systems researcher in the
| world. He _is_ an incredible human.
| yakshaving_jgt wrote:
| A former colleague did the same with most of AoC 2020 a few years
| ago.
|
| https://github.com/Jell/advent2020
| tveita wrote:
| I found Clickhouse easier to handle than pure SQL for this - as
| an analytics database it has conveniences like parsing input
| files with regexp, user defined functions and array functions.
| It's often possible to treat it more like an array programming
| language. But I still sometimes couldn't avoid the tuple
| explosion problem.
|
| https://clickhouse.com/docs/en/integrations/data-formats/tem...
| zX41ZdbW wrote:
| I'd like to check what is the tuple explosion problem.
| neriymus wrote:
| Slightly related, here's one I saw being solved using Cypher:
| https://www.linkedin.com/posts/halfterpierre_adventofcode-di...
| thehours wrote:
| Does 'pure SQL' have a specific definition (or reference spec)?
| The author doesn't mention it, except to say he tested it across
| Umbra, Postgres, and DuckDB. Even then, some days weren't
| 'supported by DuckDB due to a missing xor operator'.
| grumple wrote:
| There is an ISO spec (you have to pay for access):
| https://www.iso.org/standard/76583.html
| spiralshape wrote:
| Great work showcasing SQL's versatility!
| phartenfeller wrote:
| Nicely done. I know this seems crazy at first but in my opinion
| big SQLs are one of the best ways to store complexity.
|
| The problem being complex is the issue. SQL is a standard,
| condensed, extremely performant, actually testable, and logical
| language. Sure, not anybody can instantly maintain it but that
| would be the same as if it was a lot of lines and functions in
| Java. The more lines the more risk for bugs.
|
| I also love how deep SQL goes. And that makes totally sense. It
| powers the world of data since 40+ years so of course people
| asked for niche features. One of my favorite is the model clause
| in Oracle with which you can implement multidimensional arrays. A
| friend implemented Conway's game of life with it in way less
| lines you expect.
| haolez wrote:
| I get this feeling as well, but then again debugging large SQL
| queries can be very opaque. Things like pl/pgsql help, but then
| it starts becoming more and more like a normal programming
| language.
| NDizzle wrote:
| > in my opinion big SQLs are one of the best ways to store
| complexity.
|
| Only if, ONLY IF, you have a lot of people that are well versed
| in SQL. It's very easy to write bad SQL. It's difficult to
| unravel thousands of lines of bad SQL spread across hundreds of
| procedures / views / functions. Ask me how I know...
| diggan wrote:
| > but in my opinion big SQLs are one of the best ways to store
| complexity.
|
| It seems crazy at first, and then I continue thinking about it
| and it still seems crazy for wanting complexity to be in SQL.
|
| Personally, I want anything complex to ideally easily testable,
| both manually and automatically. SQLs is easy to test manually,
| but harder to test automatically, especially compared to lines
| of code in a programming language. At least you can somewhat
| untangle balls of spaghetti code into less dense, then attack
| parts of it. How would you deal with a tangled ball of SQL
| spaghetti?
|
| > The more lines the more risk for bugs.
|
| I don't fully agree with this either, not all lines are equal.
| One big fat line of a 400 character long SQL query has higher
| chance of containing issues not easy to glance compared to 400
| lines of Java code, and I say this even as a person who
| despises Java for various reasons.
| sharpy wrote:
| Once upon a time, as an intern, I had the 'fun' task of
| optimizing the performance of a stored procedure written by
| someone with a math phd. It was more than 6 pages when printed,
| and took more than 30 minutes to run (it was used in billing),
| and had no tests.
|
| Ended up rewriting it in native code, and it run in less than a
| second. Most of the work was proving it produced same
| results... and writing and documenting test cases, so next
| person wouldn't have to go through that.
|
| After that experience, I have generally avoided putting a lot
| of business logic in SQL...
| dagss wrote:
| I have been writing a ton of SQL -- implementing a lot of the
| business logic of a (stream processing) application in it. I
| really really like it, especially that I bring the computation to
| the data instead of the data to the computation.
|
| I often talk to developers who hate that idea though. They want
| me to instead move all the data to the backend, for a massive IO
| hit, just so that the computations can be expressed in a "real"
| programming language.
|
| I think SQL the concept is good but SQL the language is the
| problem. There are so many awkward things in it -- not strange as
| it has had no competition in 40(?) years!! The mental model of
| the program I write is fine but I really need to overlook the
| syntax and to the program I am really writing to see any
| elegance...
|
| What we need I think is a properly designed programming language,
| designed for existing databases (Postgres, MSSQL) compiling to
| SQL dialects. I see some contenders but they all have some
| attachment to a special domain, such as not allowing data
| modifications (PreQL) or being associated with other databases.
|
| Itching to do it myself, but it's a lot of work and a long long
| road to any adoption with no guarantee of success, and no revenue
| stream that I can think of.
|
| The most popular backend languages were made by large companies,
| but I think coding in SQL is in a catch-22 where it will be
| frowned upon until there is a better language and no better
| language until it is more popular..
| willvarfar wrote:
| Me too me too :D
|
| There's a lot that is very right about SQL, but a few clunky
| bits around the edges.
|
| CTEs made a lot of difference, and window functions - which can
| be a bit head bending - made difficult things a tiny bit
| easier.
|
| I'm using BigQuery, which supports structs and arrays, but only
| recently allowed arrays to be grouped by, although there is
| still no equality check etc.
|
| BigQuery is slowly adding more sugar, like aggregate UDFs and
| polymorphic UDFs using ANY TYPE parameters etc, and I find
| myself putting more reused logic into tidy functions, but my
| pet want is for temporary functions to be declared and scoped
| like CTEs so they integrate a lot better into tooling like DBT
| that wants everything to be in one statement.
|
| And the one most productive thing they could add? Allowing you
| to specify null behaviour on JOIN USING. (Having to spell out
| foo.bar IS NOT DISTINCT FROM bar.bar on a join is unobvious and
| ugly. Something like USING (bar RESPECT NULLS) would be so much
| nicer.)
| wink wrote:
| I can't put my finger on it but I think many people see this as
| two operating modes, for lack of a better term. The more
| monolithic and enterprisey your solution (and with bespoke
| DBMS), the more it leans towards anything more complex than an
| index and maybe a couple triggers - and the more
| microservices-y where every small service owns its own database
| (and only half of them are RDBMS) the less complex code is
| desired in the DB itself, because you're also migrating off of
| single instances/clusters a lot (just taking a relatively dumb
| data dump with you, or even just adding new replicas, ship-of-
| theseus-like).
| superdogsniffer wrote:
| Day 19 fails the actual test input. I guess nobody checked his
| work.
| forgotpwd16 wrote:
| Tried both test & full input and worked (using duckdb v0.10.2)
| fine.
| ilya_m wrote:
| I recommend Frank McSherry's real-time log of working through AoC
| 2023 using a SQL dialect::
| https://github.com/frankmcsherry/blog/blob/master/posts/2024...
|
| The most powerful primitive that the dialect has is (mutual)
| recursion, which is effectively a mechanism for outputting a
| fixed-point of a query.
___________________________________________________________________
(page generated 2025-01-03 23:01 UTC)