[HN Gopher] Goal: Pass all 4259065 tests in sqllogictest in 1 week
___________________________________________________________________
Goal: Pass all 4259065 tests in sqllogictest in 1 week
Author : luu
Score : 158 points
Date : 2022-10-03 16:56 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| iLoveOncall wrote:
| No dependencies but the very first step relies on a dependency?
| Copy-pasting it doesn't make it any less of a dependency...
| eatonphil wrote:
| He also used a compiler, and a standard library, and a
| computer.
| iLoveOncall wrote:
| There's a bit of a difference between taking away the only
| truly hard part of the project and using the standard
| library.
| eatonphil wrote:
| The SQL grammar is already defined.
|
| He didn't invent SQL. He had to take the grammar from
| somewhere.
|
| You wanted him to handwrite it copied from the ANSI SQL
| paper?
|
| Or to just think up all the possible grammars and ignore
| the real paper?
|
| How would that be different or better than this?
| lelandbatey wrote:
| The first dependency is a "dependency", but in the same way
| that an RFC you have to read with your eyes is a dependency.
| Except in this case, the dependency is a definition, in formal
| language (extended-backus-naur form, a.k.a. "e-bnf"), of the
| SQL language, as defined in ISO/IEC 9075:2016[0] (the 2016
| version of the SQL language specification). They then use this
| language definition as the input to a program which generates
| parsers (called a parser-generator[1]), so that they can
| quickly get to where they have a library which they can use to
| parse/validate/inspect the SQL being given to them.
|
| Copying the BNF "from an external source" is a smart move,
| since it means they don't have to do lots of busy work slowly
| reading and transcribing the specification; someone's already
| done that step, so why would anyone expect the author to waste
| time?
|
| Using a parser generator is also a smart move since they exist
| already and are used all over the place (nobody hand-writes
| parsers for large languages; that's just a needless source of
| tedium and bugs). The code that's spit out of the parser
| generator is novel; that's newly created code which isn't taken
| from someone else's Github/other repo.
|
| Ultimately, I don't see how any of what the author's done
| constitutes "[relying] on a dependency" given that they're not
| using anyone else's Zig source code in their compiled binary,
| they're writing lots of code for themselves to use, just very
| quickly, and with powerful tools.
|
| [0] - https://en.wikipedia.org/wiki/SQL:2016
|
| [1] - https://en.wikipedia.org/wiki/Compiler-compiler
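To make the "grammar in, parser out" idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three-rule toy grammar is made up (it is not taken from the SQL spec), and the code interprets the grammar directly with a backtracking recognizer rather than emitting parser source the way a real parser generator would.

```python
import re

# Toy grammar in BNF-like notation (hypothetical, not from ISO/IEC 9075).
GRAMMAR = """
<query>   ::= "SELECT" <columns> "FROM" <name>
<columns> ::= <name> "," <columns> | <name> | "*"
<name>    ::= IDENT
"""

KEYWORDS = {"SELECT", "FROM"}


def parse_bnf(text):
    """Turn grammar text into {rule: [alternative, ...]}; each alternative
    is a list of symbols (rule names, quoted literals, or IDENT)."""
    rules = {}
    for line in text.strip().splitlines():
        head, body = line.split("::=")
        rules[head.strip()] = [alt.split() for alt in body.split("|")]
    return rules


def tokenize(sql):
    # Identifiers/keywords plus the two punctuation marks the grammar uses.
    # Anything else is silently dropped, which a real lexer would not do.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|[,*]", sql)


def match(rules, symbol, tokens, pos):
    """Yield every token position reachable after matching symbol at pos."""
    if symbol.startswith("<"):
        # Non-terminal: try each alternative in turn (backtracking).
        for alt in rules[symbol]:
            yield from match_seq(rules, alt, tokens, pos)
    elif pos < len(tokens):
        tok = tokens[pos]
        if symbol == "IDENT":
            if tok[0].isalpha() or tok[0] == "_":
                if tok.upper() not in KEYWORDS:
                    yield pos + 1
        elif tok.upper() == symbol.strip('"'):
            yield pos + 1


def match_seq(rules, symbols, tokens, pos):
    """Match a sequence of symbols one after another."""
    if not symbols:
        yield pos
        return
    for mid in match(rules, symbols[0], tokens, pos):
        yield from match_seq(rules, symbols[1:], tokens, mid)


def accepts(rules, sql, start="<query>"):
    """True if the whole token stream can be derived from the start rule."""
    tokens = tokenize(sql)
    return any(end == len(tokens) for end in match(rules, start, tokens, 0))


rules = parse_bnf(GRAMMAR)
print(accepts(rules, "SELECT a, b FROM t"))
print(accepts(rules, "SELECT FROM t"))
```

The point of the design is that supporting more of the language means editing the grammar text, not the matching code. This naive approach would loop forever on left-recursive rules, which is one of the problems real parser generators have to handle.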
| eatonphil wrote:
| > nobody hand-writes parsers for large languages; that's just
| a needless source of tedium and bugs
|
| That is absolutely not true! In fact, most major programming
| language implementations use handwritten parsers [0].
|
| That said:
|
| > Using a parser generator is also a smart move since they
| exist already
|
| Jamie _wrote_ the parser generator here too. So it's all the
| more "from scratch".
|
| [0] https://notes.eatonphil.com/parser-generators-vs-
| handwritten...
| spullara wrote:
| So does anyone else agree that there are actually 10x (or more)
| developers? This is a pretty good example of how one works.
| sangnoir wrote:
| 10x developers exist, but their prevalence is greatly
| overstated. Also, churning out code at 10x doesn't make one a
| 10x developer (this could even hamper everyone else).
|
| I also propose that the moniker be banned from being self-applied;
| it is, in fact, a smell test: if you encounter a colleague
| calling themselves a 10x developer, start interviewing
| immediately. No good will come out of that.
| Jochim wrote:
| My last boss was a 10x developer. Fantastically smart guy,
| socially awkward but nice, really looked out for his team but
| an absolute nightmare to work with technically.
|
| The majority of the team couldn't understand his code. We'd
| have newly hired senior developers just leave rather than
| deal with it.
|
| He'd rolled his own code generator for our data model that
| did everything from model generation to the web controllers.
|
| The result was that while he could pump out work quickly,
| what would've otherwise been a quick fix for a graduate
| developer now required a deep understanding of a complex
| system.
|
| This had the effect of turning what would have otherwise been
| a team of 1-2x developers into a team of 0.2-0.5x devs with a
| retention problem.
| tempxyz wrote:
| How is he 10x if no one understands his code? An expert at
| quick and dirty?
| Jochim wrote:
| wolf550e summed it up pretty much perfectly.
|
| There was a great deal of thought put into it and he
| could extend and modify the output really quickly.
|
| The complexity of the system basically made it so that
| what would otherwise have been a simple task achievable
| by a graduate required a deep understanding to carry out.
| wolf550e wrote:
| Imagine that instead of developing an app in a popular
| programming language, someone implements an idiosyncratic
| domain specific language suitable for the kind of app
| they need to build, and then builds the app using that.
| The result would work and maybe even let them be very
| productive churning out more features of the kind that
| were envisioned when the DSL was developed. If they need
| to extend or fix the DSL, as the original author, they
| can. Someone else will need to learn the DSL before they
| can do any work on the app.
| [deleted]
| kuroguro wrote:
| > He'd rolled his own code generator for our data model
| that did everything from model generation to the web
| controllers.
|
| Heh, I've actually done that... twice. Luckily it was a
| team of 1 and I wouldn't expect anyone else to understand
| my mess. The code generation was extra ugly since I planned
| to get rid of it eventually to craft out smaller details.
| It was great at doing repetitive work in bulk. Not sure if
| it was actually faster but at least it was less boring
| doing things that way.
| sangnoir wrote:
| As a junior, I worked with a senior who was considered
| "10x" and working with him was a pain: he decided the rules
| didn't apply to him, and management tolerated his repeated
| violations of conventions instituted to make our dynamic-
| language codebase manageable.
|
| Anytime he "improved" a module, no one else could maintain
| it as that would entail additional rule-breaking, which was
| _verboten_ for mortals, so only he could maintain code he
| touched. Combined with the fact that he didn't add any
| tests, the net result was that he was slowly and surely
| subverting the codebase into his personal, brittle domain
| that no one else could change. He was slowing everyone else
| down, but all management was looking at was his velocity at
| closing bugs or rolling out new features while creating
| tech-debt. His boastful personality was just the icing on
| the cake.
| moritonal wrote:
| This is kind of an unhealthy attitude?
|
| @jamii seems super talented, but his bio says "in the past I've
| built database engines, query planners, compilers, developer
| tools and interfaces for [a...] myriad [of] consulting and
| personal research projects.", and his repos are mostly
| related to SQL parsers, or literal text editors working purely
| on string manipulation. He is also sponsored to spend 100% of
| his time doing exactly this.
|
| What I mean is that he is almost definitely a 10x dev at writing
| SQL parsers. But ask him to write a shader that renders a neat
| waterbed material and he'd likely be a 0.8x dev? The overlap
| between experience and context is key.
| blowski wrote:
| I agree that I think this can be an unhealthy attitude. A lot
| of us are working on projects where the biggest challenge is
| convincing the Product Manager to provide more than one
| sentence in the brief.
|
| That said, I can't think of any technical domain where I
| could do this, even if provided with all the tests up front.
| vsareto wrote:
| "10x" mostly functions as a reputation badge. It's not a
| realistic metric for performance.
|
| It's something you get from other people. There's not a good
| test to figure out if you're 10x better than some randomly
| picked average developer.
| spullara wrote:
| When people say they don't exist, they mean that they don't
| exist in general, not just when testing someone on a new
| domain.
| bcrosby95 wrote:
| I don't know anyone that would say that 10x developers
| don't exist at all. It's too easy to bring up someone like
| John Carmack or many foundational people in computer
| science that invented algorithms us layfolk could never
| imagine.
|
| My experience with the phrase is people mean finding a
| "diamond in the rough" who can code circles around anyone
| else. It's not about finding a Norvig or Carmack, it's
| about finding a fresh graduate that you can stick on a
| problem and they will be bountifully productive.
|
| It's basically a manager's wet dream: extremely productive
| but cheap. In my experience real 10x people appear to be
| the opposite: seemingly slow but incredibly expensive.
| Everyone I actually consider 10x makes millions. And of
| those that are friends, they didn't really reach that 10x
| stage until their 30s or 40s.
| xhrpost wrote:
| Of course it's impressive but this isn't the sort of work the
| average dev ends up doing day to day. We spend our time digging
| into dependencies (this had zero) both internal and external,
| interfacing with stakeholders and looking up business logic for
| a change. All those async tasks ultimately add up to
| significant headwinds even for the best "10x" dev.
| stocknoob wrote:
| It's amusing that people even question the existence of 10x
| developers.
|
| What fraction of devs could even complete this, let alone in
| merely 10x the time?
| viraptor wrote:
| The fraction of devs that regularly deal with databases and
| parsing. There are no 10x devs. There are devs with long
| experience in a specific category. The 10x idea is kind of
| stupid in terms of companies looking for them - it just means
| "we want people successfully trained somewhere else".
| subroutine wrote:
| I agree there are probably no 10x devs. This person took 7
| days to (almost) complete this task (which they
| cherry-picked for themselves), suggesting the average 1x dev with
| experience in this domain would take 70 days.
|
| I think it would be more reasonable to call someone a
| 3-sigma dev (someone 3 standard deviations above the mean).
| These would exist because that's how stats work.
| ZephyrBlu wrote:
| I've always read 10x as an order of magnitude better, not
| necessarily 10x faster. 3 sigma is probably better
| terminology though.
| rcxdude wrote:
| It's worth pointing out that in the origin of the "10x
| developer" term, it's relative to the worst performing
| devs, not the average.
|
| Also, you're not guaranteed to have an example 3 standard
| deviations above the mean. It strongly depends on your
| distribution and sample size.
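The sample-size point above can be checked with a few lines of arithmetic. This is a sketch under two assumptions that nobody in the thread states: that "developer ability" is normally distributed, and that developers are independent draws from that distribution. Neither is established; the function names are made up for illustration.

```python
import math


def tail_probability(k):
    """P(X > mean + k*sigma) for a normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def chance_of_at_least_one(k, n):
    """Probability that at least one of n independent draws
    lands more than k standard deviations above the mean."""
    p = tail_probability(k)
    return 1 - (1 - p) ** n


p3 = tail_probability(3)  # roughly 0.00135, i.e. about 1 in 740

for n in (10, 100, 1000, 10000):
    print(n, round(chance_of_at_least_one(3, n), 3))
```

Under these assumptions a team of 10 almost never contains a 3-sigma member, a pool of 100 contains one only about an eighth of the time, and it takes a pool of around a thousand before one is more likely than not. With a heavier-tailed distribution the numbers would look quite different.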
| subroutine wrote:
| I think the prevailing current definition is wrt. the
| average dev. The worst dev could be arbitrarily bad
| (suggesting the average dev could also be 10x).
|
| You're right about the sufficient sample size.
| cowmoo728 wrote:
| It's like questioning the existence of 10x NBA players or 10x
| chess players. The top super GMs are basically 10x better
| than most other GMs, who are themselves 10x better than most
| IMs. It seems strange that programming would be one of the
| fields that doesn't have a similar distribution of skill.
|
| I think the actual pushback of the 10x programmer idea is
| that it's more often used to bully regular programmers into
| working longer hours, rather than actually identifying top
| performing programmers.
| sophacles wrote:
| There's also the part where "developer" is more akin to
| "athlete" than "nba player". There's lots of different
| types of developer, just like there's lots of different
| types of athlete. A 10x NBA player will certainly not be a
| 10x Olympic Swimmer also, more likely .1X. Part of the
| problem is 10x developer gets talked about like they are
| going to be 10x athlete at NBA and Swimming and Golf and,
| and, and... that's what doesn't exist.
| samsquire wrote:
| I handrolled a very very basic SQL parser in my toy database
| hash-db
|
| https://GitHub.com/samsquire/hash-db
|
| It's a distributed DynamoDB-style key-value, SQL and Cypher graph
| database.
|
| I feel if you want to get a project moving forward for something
| as large as a database, you can get something rudimentary working
| and extend the parser when you need those features.
|
| SQL-wise it supports joins, WHERE clauses and rudimentary full-text
| search. It uses Rockset's converged indexes for ease of query
| generation.
|
| If you're interested in queries then you should read this blog
| post. https://rockset.com/blog/converged-indexing-the-secret-
| sauce...
|
| The database is partly multimodel with document storage and SQL
| and graph Cypher querying but I am yet to get all the models to
| be mutually queryable. The document storage is queryable by SQL
| but graphs aren't queryable by SQL or as a document.
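As an illustration of the "start rudimentary, extend when needed" approach described above, here is a minimal hand-rolled recursive-descent parser in Python. It is not hash-db's code; the `Select` and `Parser` names and the supported subset (`SELECT cols FROM table [WHERE col = literal]`, no joins) are assumptions chosen to keep the sketch short.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Numbers, single-quoted strings, identifiers/keywords, and three symbols.
# Characters matching none of these are silently skipped (a shortcut).
TOKEN = re.compile(
    r"(?P<num>\d+)|'(?P<str>[^']*)'|(?P<word>[A-Za-z_]\w*)|(?P<sym>[,=*])"
)


def tokenize(sql):
    return [(m.lastgroup, m.group(m.lastgroup)) for m in TOKEN.finditer(sql)]


@dataclass
class Select:
    columns: list
    table: str
    where: Optional[tuple] = None  # (column, literal) or None


class Parser:
    def __init__(self, sql):
        self.tokens = tokenize(sql)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def at_keyword(self, keyword):
        kind, text = self.peek()
        return kind == "word" and text.upper() == keyword

    def expect_keyword(self, keyword):
        if not self.at_keyword(keyword):
            raise SyntaxError(f"expected {keyword}, got {self.peek()[1]!r}")
        self.advance()

    def expect(self, kind):
        got, text = self.advance()
        if got != kind:
            raise SyntaxError(f"expected {kind}, got {text!r}")
        return text

    def parse_select(self):
        self.expect_keyword("SELECT")
        columns = [self.parse_column()]
        while self.peek() == ("sym", ","):
            self.advance()
            columns.append(self.parse_column())
        self.expect_keyword("FROM")
        table = self.expect("word")
        where = None
        if self.at_keyword("WHERE"):
            self.advance()
            where = self.parse_equality()
        if self.peek()[0] != "eof":
            raise SyntaxError(f"unexpected {self.peek()[1]!r}")
        return Select(columns, table, where)

    def parse_column(self):
        kind, text = self.advance()
        if kind == "word" or (kind, text) == ("sym", "*"):
            return text
        raise SyntaxError(f"expected column, got {text!r}")

    def parse_equality(self):
        column = self.expect("word")
        if self.advance() != ("sym", "="):
            raise SyntaxError("expected '='")
        kind, text = self.advance()
        if kind == "num":
            return (column, int(text))
        if kind == "str":
            return (column, text)
        raise SyntaxError(f"expected literal, got {text!r}")


print(Parser("SELECT a, b FROM t WHERE a = 1").parse_select())
```

Adding a feature such as `JOIN` or `AND` would mean adding one more method and one more branch in `parse_select`, which is the incremental growth the comment describes.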
| apetresc wrote:
| > So parsing the bnf is kind of a mess, but I only have to parse
| this one bnf and not bnfs in general so I just mashed in a bunch
| of special cases.
|
| Surely at that point it would've been a lot cleaner and more
| practical to just _edit_ the one file you need to parse, to
| remove the weird line breaks, etc., rather than building special
| cases into your parser to work around those lines? What am I
| missing?
| ruuda wrote:
| The input is 9139 lines long, each anomaly probably occurs
| dozens of times.
| Forge36 wrote:
| It's possible the files can't be edited within this project. I
| had a similar experience writing a code parsing engine.
| Sometimes it's best documented as code debt and the rewrite can
| be done at a later time.
| kris-s wrote:
| What a cool project, I bet they learned a ton doing this.
| ok_dad wrote:
| What's the `scc` tool I see used there?
| ok_dad wrote:
| Here's a new relevant post for anyone looking:
|
| "Processing 40 TB of code from ~10M projects with a server and
| Go for $100 (2019)"
|
| <https://news.ycombinator.com/item?id=33072846>
| shadycuz wrote:
| I'm interested as well.
| lifthrasiir wrote:
| Most likely: https://github.com/boyter/scc
| okasaki wrote:
| 7 ways of installing it, but no deb or rpm. Is this the
| wonderful future of FOSS where developers don't bother
| working with distribution maintainers anymore?
| ok_dad wrote:
| Someday someone will make a hyper-package-manager to be
| able to manage their packages installed via the thousands
| of package-managers out there today. Then, several other
| hyper-package-managers will be developed to cover the cases
| the first didn't cover. Then comes the hyper-hyper-package-
| managers...
| duped wrote:
| This has been the norm for a while. No one wants to work
| with distro maintainers because their model is incompatible
| with how people build and distribute their software.
|
| You either get a curl sh, a tarball, or a wrapper around
| either of those that pretends to be a .deb or .rpm.
| dec0dedab0de wrote:
| Hasn't it always been rare for developers to maintain
| distro-specific packages? That's why distros have package
| maintainers, they also modify the layout and default
| configurations and whatnot to be consistent with the rest
| of the distro.
| rcxdude wrote:
| Yeah, official debs/rpms are a thing but often completely
| independent of any distribution's packaging efforts (and
| they often have very different priorities).
| mperham wrote:
| I've distributed a lot of software and DEB/RPM has to be
| the worst. I'd suggest those distros improve on their
| developer ergonomics if they want to stay relevant. 100% of
| my customers use Docker images these days as it is much
| much easier to use.
| okasaki wrote:
| I guess that's for web stuff? You wouldn't distribute
| 'cloc' and similar in a docker image.
|
| One hopes.
| ArchOversight wrote:
| It's the easiest way to distribute software in a way that
| is controlled by the author of the software and where the
| author can reasonably control all of the dependencies
| installed.
|
| This way you are providing a one-stop shop that can
| easily be run. I have all kinds of tools that are docker
| containers because it's simpler to not have to worry about
| all kinds of library mismatches or locations of shared
| libraries, and ship a minimal docker container
| instead.
| ok_dad wrote:
| I will refer you to the first tool I thought to Google
| with "docker image for X":
|
| https://hub.docker.com/r/stedolan/jq
|
| Yes, it's despicable.
| okasaki wrote:
| 10M+ pulls >:O
|
| https://www.youtube.com/watch?v=umDr0mPuyQc
| mdaniel wrote:
| > I lost a lot of time in the morning to segfaults in the zig
| compiler. (https://github.com/jamii/hytradboi-jam-2022#day-4)
|
| I bet the zig project would be interested in the sha of the tree
| that blows up their compiler
| puffoflogic wrote:
| I bet they wouldn't. They're well aware their compiler often
| runs code inside if(false) blocks (in certain positions) and
| they just don't see this as important. Moving fast is more
| important. (Where exactly they're moving to is not quite
| clear.)
| an_ko wrote:
| > their compiler often runs code inside if(false) blocks (in
| certain positions)
|
| Do you have an example? Sounds like a catastrophic edge case.
| bsima wrote:
| And I would have done it too, if it weren't for that meddling
| parser
| chubot wrote:
| I would have liked to have learned more about how the query
| planner and evaluator work! There was almost nothing about that?
| Just the tests magically moving from 0% to 95%.
|
| e.g. What table and value representation was used?
|
| FWIW I suspect using a LALR(1) parser in Zig on the sqlite
| grammar would have saved some time and gotten past the parsing
| headache.
|
| The sqllogictest comes directly from sqlite, so it seems like the
| parsing problem is mostly "port from C to Zig" (which are very
| similar metalanguages, or I guess meta- meta- languages in this
| case :) )
|
| Lemon is apparently a mini-yacc, just for sqlite's grammar, and
| is about 7K lines of C code, with no deps:
| https://sqlite.org/src/doc/trunk/doc/lemon.html
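For readers wondering what "passing a test" means for the percentages mentioned above, here is a minimal sketch of a runner for sqllogictest-style records, written in Python against the standard `sqlite3` module. It is not the author's Zig code. It assumes a simplified reading of the record format (`statement ok`, `statement error`, and `query` records with expected values listed one per line after `----`) and ignores hashed results, conditional records and most sort modes. The `format_value` and `run_script` helpers and the sample script are made up for illustration.

```python
import sqlite3

# A tiny hand-written script in the sqllogictest record style.
SCRIPT = """\
statement ok
CREATE TABLE t1(a INTEGER, b INTEGER)

statement ok
INSERT INTO t1 VALUES(1, 2), (3, 4)

statement error
SELECT * FROM no_such_table

query II rowsort
SELECT a, b FROM t1
----
1
2
3
4
"""


def format_value(value):
    """Render a result value as text. Simplified: the real tool formats
    according to the query's type string rather than the Python type."""
    if value is None:
        return "NULL"
    if isinstance(value, float):
        return "%.3f" % value
    if value == "":
        return "(empty)"
    return str(value)


def run_script(script, conn):
    """Run each blank-line-separated record; return (passed, failed)."""
    passed = failed = 0
    for record in script.strip().split("\n\n"):
        lines = record.splitlines()
        header = lines[0].split()
        if header[0] == "statement":
            sql = "\n".join(lines[1:])
            try:
                conn.execute(sql)
                ok = header[1] == "ok"
            except sqlite3.Error:
                ok = header[1] == "error"
        elif header[0] == "query":
            sep = lines.index("----")
            sql = "\n".join(lines[1:sep])
            expected = lines[sep + 1:]
            try:
                rows = [[format_value(v) for v in row]
                        for row in conn.execute(sql)]
                if len(header) > 2 and header[2] == "rowsort":
                    rows.sort()
                # Expected output lists one value per line, row by row.
                ok = [v for row in rows for v in row] == expected
            except sqlite3.Error:
                ok = False
        else:
            continue  # record kinds this sketch does not understand
        if ok:
            passed += 1
        else:
            failed += 1
    return passed, failed


result = run_script(SCRIPT, sqlite3.connect(":memory:"))
print(result)
```

Swapping the `sqlite3` connection for a different engine is what turns this into a conformance test: the pass rate climbs as the engine under test handles more of the SQL that appears in the records.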
___________________________________________________________________
(page generated 2022-10-03 23:00 UTC)