[HN Gopher] Challenges students face when learning to work with ...
___________________________________________________________________
Challenges students face when learning to work with relational
databases and SQL
Author : gousiosg
Score : 70 points
Date : 2021-08-29 17:13 UTC (5 hours ago)
(HTM) web link (www.growkudos.com)
(TXT) w3m dump (www.growkudos.com)
| sdevonoes wrote:
| I takes days/weeks to pick up the core of SQL:
|
| - create tables, update the schema, insert rows, add an index
|
| - select, filters, joins, order by, limit, inner queries
|
| It takes forever to be comfortable with:
|
| - anything that involves summarizing, grouping, having, min, max,
| windows
| myspy wrote:
| Something other which is hard is writing performant queries.
| Using statements with subqueries/in syntax for example.
|
| And I always forget which join does what.
| AdrianB1 wrote:
| Remembering which join does what is easy: inner joins
| strictly joins the tables, left takes all on left (first
| table), right takes all on right (second table), outer (or
| cross) join are so rarely used you don't need to memorize.
| MeinBlutIstBlau wrote:
| same with the joins. on paper it makes sense, in practice it
| does not. if its more than a "select * from dbo.whatever
| where column abc = 'thing'" i have to refer back to notes and
| play with it.
| [deleted]
| AdrianB1 wrote:
| Not really. I used to teach SQL not a long time ago and about
| 1/4 of the trainees were getting up to speed fast, about half
| in a reasonable time, the rest were there only because they
| were sent there by their managers.
|
| I found that the most important success factors in learning SQL
| is the analytical thinking of the trainee and the way the
| trainer is explaining the concepts, in what order and what
| examples are used (the best examples are the ones the trainees
| meet in their regular work).
|
| The functions are simple, the only difficulty is to remember
| the ones that are not used often enough (ex: some window
| functions). Even in that case, a quick check in the
| documentation is enough to get up to speed. The major
| difficulty with SQL is to write efficient queries on large data
| volumes, covered by the right indexes. This is very specific to
| each RDBMS, especially because of the tools helping with the
| work are specific (ex: SSMS, SQL Sentry Plan Explorer,
| statistics parser etc).
| MeinBlutIstBlau wrote:
| If all you do is SQL, anybody can learn it quick. If you do
| full stack, you're only gonna care about what gives you the
| data you need at that time.
| AdrianB1 wrote:
| No offence, but a full stack's job is not to write good
| SQL, it's to write enough SQL to get what is needed, then
| the development DBA's job is to make it fast and efficient.
| You don't need to be a great car mechanic to drive to the
| office and back.
| 542458 wrote:
| Personally I feel that it took me a while to get really
| comfortable with more complex joins. There's an problem they
| used in the study that required joining a table with itself,
| and honestly I would probably take a while to come up with that
| answer, if at all.
| weaksauce wrote:
| A basic approach(probably what they are going for in a basics
| study) would be something like this off the top of my head
| select c.cid, c2.cid from customer as c inner
| join customer as c2 on c.street = c2.street where
| c.city <> c2.city
|
| though that has reflective duplicates say (1, 5) would also
| have (5, 1) in the output. So I'm not sure if that's
| "allowed"
| fifilura wrote:
| There are lots of cases where a join with yourself is
| applicable, although they are mostly superseded by window
| functions these days.
|
| For example normalisation (join with a groupby/sum of
| yourself) or rank (join each row with all rows that have
| lower value than yourself and count those rows).
|
| But as I mentioned above. A good start is to sketch that
| out in excel. You will realize that what you need is
| another column (e.g. total sum for this id). And from that
| you can work yourself backwards to figure out what is the
| table you need to join with to create that column.
| weaksauce wrote:
| I'm not sure if you are replying to the wrong person but
| the question has nothing to do with a total sum of ids...
|
| the question was: "List all pairs of customer IDs who
| live on a street with the same name but in a different
| city." listed under self-join
|
| that said i haven't wrangled with raw sql in a spell so
| the reading on window functions is interesting.
| fifilura wrote:
| I think one tip here is to always work with CTEs "WITH", and
| not nest queries. That way you can always go back and check
| "what is it I join with what", by querying the individual steps
| with LIMIT 10.
|
| The other tip is to sketch the problem in excel/google sheets
| when it gets hairy. Not the actual code (I don't have a clue
| how to do that, others have), just the values in the different
| steps. In the end it is only about rows and columns.
|
| But that said, these days a lot of it happens intuitively for
| me, I pretty much know the solution before I can spell it out.
| It certainly was not like that when I started.
|
| When you begin, "programming without for loops" feels like
| programming with your right hand tied behind your back. But in
| hindsight you get a lot of exercise in the immutable paradigms
| of functional programming, working with comprehensions, sets,
| maps folds comes very natural.
| odipar wrote:
| Yep, CTEs are a huge boon to structure your SQL - use them
| where you can.
| btilly wrote:
| Meh, summarizing, grouping, etc aren't that hard.
|
| However WINDOW queries definitely have a learning curve. Not
| the least because useful examples almost always require you to
| use a nested query.
| magicalhippo wrote:
| I learned SQL on a need to know basis. For me, recursive
| queries were the ones that needed the most time to click.
|
| Another one that caught me by surprise was NULL vs
| unknown[1]. That bit me in a couple of queries.
|
| [1]: https://learnsql.com/blog/understanding-use-null-sql/
| pge wrote:
| and worst of all, anything that involves vendor-specific
| keywords...
| ipaddr wrote:
| One of the best ways to learn advanced oracle specifically is
| through the ask tom q/a. The question are difficult and the
| answers teach more than any course.
|
| https://asktom.oracle.com/pls/apex/f?p=100:1000::::::
| agumonkey wrote:
| To me the most difficult part is learning sql before knowing what
| can be done with a computer, both on the complexity and the
| language design part.
|
| Before learning interpreters/compilers/prolog, I'd spend a lot of
| time trying to figure out about naming/namespaces in queries,
| while after doing some PLT, it all becomes very very obvious, you
| can now focus on the operators and since you'd know how far can
| programming go, you'd see faster how nested queries could make
| sense, what aggregating functions meants etc
| TrackerFF wrote:
| I see they mentioned previous course knowledge - this is
| something you see in many (programming) classes.
|
| Students that have zero prior knowledge in programming, are able
| to pick up functional programming pretty easy. Students that have
| studied and used paradigms like OOP, seem to have a hard time
| grokking functional programming - as they see everything through
| the lens of OOP (and the languages they've used).
| btilly wrote:
| Functional and OOP techniques do not seem to be best friends.
|
| http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...
| uses Java to discuss what this can look like in an extreme
| case.
| gnat wrote:
| https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 has the actual
| paper.
| hahamrfunnyguy wrote:
| In my experience, inexperienced database developers pick up SQL
| fairly quickly under the guidance of an experienced mentor.
| [deleted]
| Spooky23 wrote:
| This.
|
| Usually I see people struggling to formulate questions. They
| know what they want, but don't understand how to get there.
| Left to their own devices, they hack up some nightmare in
| Excel.
|
| I worked with a summer intern on creating reports and learning
| SQL. She was a really smart business major who ended up with
| the wrong work assignment. I was getting 5-7 questions a day
| from her in June, 1-2 a week in July and by the time I got back
| from vacation in August, she had basically done about 90% of a
| project that was going to be hired out and was showing me some
| features of the database we were using that I didn't know!
|
| It inspired her to switch majors and she is a fancy data
| scientist somewhere! Awesome mentor experience.
| patrakov wrote:
| Direct link to the research paper, instead of the summary:
| https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 (pdf)
| tracyhenry wrote:
| SQL has a steep learning curve. It expose almost zero insights
| into the underlying query execution. As a result, increasingly
| amount of inefficient queries are being written by ML engineers,
| who in general care little about query efficiency. The solution
| right now seems to have a team of data engineers to optimize the
| queries.
|
| Should we think about an alternative, at least for ML ETL
| workloads?
| tester756 wrote:
| >SQL has a steep learning curve.
|
| Does it? I think SQL just sucks and its tooling sucks too
|
| Even SQL Management Studio which felt way better than PGAdmin
| is miles behind IntelliSense that's offered by Visual Studio
| for C# (when it comes to reliability)
|
| SQL would benefit a lot from being like C#'s LINQ (Query
| syntax) e.g:
|
| var result = from s in stringList
| where s.Contains("Tutorials") select s;
|
| some SQLv2 is something we need
| keithnz wrote:
| try DataGrip, it's got really nice intellisense and
| autocomplete. I'm not sure how your example from linq is any
| better than SQL select s from stringList
| where s like '%Tutorials%'
| tester756 wrote:
| The difference is when you type "select (here)" then your
| tool cannot really give you hints about column names in
| table
|
| because you haven't specified the table yet.
|
| Meanwhile LINQ starts with table name.
|
| Ofc you can always go ahead and write queries in "different
| order"
| da39a3ee wrote:
| https://opensource.google/projects/logica
| dspillett wrote:
| _> SQL has a steep learning curve._
|
| Overall I don't think it is that steep, though maybe I'm
| blinded by having worked with various implementations of it for
| more than two decades. The key sticking point is jumping to
| thinking in a set based manner to get best results. The rest of
| the difficult parts are when you need to think about
| implementation details because the query planners are no
| perfect (index hints and such) or being aware of limitations
| (like postgres before the latest major version having
| optimisation fences around CTEs).
|
| _> It expose almost zero insights into the underlying query
| execution._
|
| That is pretty much by design. It is intended that you say what
| you want and let the query planner worry about implementation
| details. Of course how you tell it what you want involves
| learning to express those intentions in SQL. It does fall apart
| a bit when implementation limitations become an issue, at which
| point you are forced to think about the underlying
| implementation and how you might prod this more imperative code
| so that it interprets and process your relational descriptions
| most efficiently.
|
| _> As a result, increasingly amount of inefficient queries are
| being written by ML engineers_
|
| That isn't specific to ML. I see a lot of inefficient data
| interaction from code written by other devs. This seems to be
| for two reasons:
|
| 1. People seem to have taken to heart "make it work, make it
| work correctly, only then worry about making it work fast" to
| heart but tend to skip that last part and assume because all is
| well with their test sets of data at hundreds or thousands of
| rows (or sometimes tens and singles) that it'll scale just find
| to the hundreds of thousands or more that the clients datasets
| will eventually contain.
|
| 2. People using further abstractions without much care for how
| they implement their directives (again, in an ideal world they
| shouldn't have to), resulting in massively overcomplex queries
| as the framework tries to be clever and helpful and preempt
| what might be needed, getting _everything_ whether needed or
| not (effectively `SELECT _`) meaning the query planner can 't
| apply families of its internal tricks for better performance,
| or getting many rows individually instead of as a set which
| sometimes means a lot of extra work for each row.
|
| There is a definite "we'll worry about that when it happens
| attitude in both cases which is dangerous. While a live system
| has practically ground to a halt and the client needs their
| report by EOP or someone will get it in the neck (and be sure:
| they will pass that on to you!) is not a good time to be
| optimising data access, or worse finding out the structure just
| doesn't support efficient generation of the required data.
| Another common failing is applying what would idealy be UI or
| BLL concerns (timezone conversions etc) in the SQL statements
| in a way that blocks index use.
|
| _> Should we think about an alternative, at least for ML ETL
| workloads?*
|
| I don't work with ML so that is a little outside my day-to-day
| wexpertise, but I'd wager ETL there has the same problem as
| everywhere: the basics are all well known and very well
| optimised for already. The rest differ so much between
| applications that no one abstraction would be optimal for more
| than a small portion of real world needs.
|
| I'd be wary of a separate team for optimising queries. I
| suggest a reasonable understanding in the whole dev team with a
| data expert embedded who is involved in design work and code
| reviews so issues are caught early and junior devs can be
| tutored as needed so by the time they are seniors they don't
| need the data expert except for really gnarly problems or long-
| term planning.
| ttfkam wrote:
| Step 1: "SQL sucks!"
|
| Step 2: Let's make a database engine that doesn't use SQL.
|
| Step 3: "This is hard!"
|
| Step 4: Make SQL access layer.
|
| Wash. Rinse. Repeat.
|
| See: PartiQL
|
| Those who ignore the lessons of SQL are doomed to reimplement
| them...poorly.
| ttfkam wrote:
| Corollary:
|
| 1. "SQL doesn't scale!"
|
| 2. We made this database engine that's "web scale"!
|
| 3. "This is hard to use!"
|
| 4. Make SQL access layer.
|
| See: Spanner
|
| Those who blame SQL for their performance problems are doomed
| to repeat them using a proprietary syntax.
| odipar wrote:
| My first encounter with 'SQL' was a course on relational algebra
| that was taught at my university.
|
| It started out with defining relations as a mathematical
| construct, and continued with various operators on such
| relations. Then they continued explaining the various normal
| forms up the fifth normal form. I was completely out of my depth,
| but at least it was good and solid theory that could be learned.
|
| What really messed with my head is they then introduced SQL as a
| 'practical' implementation of relational algebra. I'm still
| having nightmares where I try to understand nested HAVING
| statements that where asked at the exams.
|
| Hey relations don't contain duplicates! But that's OK. We should
| call (modern) SQL 'BAG ALGEBRA'.
| melony wrote:
| Don't forget loops
| DaiPlusPlus wrote:
| I noticed that the article doesn't mention relational-calculus at
| all, only relational-algebra. That's a huge oversight, imo - as I
| feel one needs to understand both RA and RC in order to grok SQL
| and other RC-like systems, like Linq in C#/.NET and List-
| comprehensions in Python (or even use those _before_ RC /RA and
| SQL).
|
| -------
|
| Rather than improve how SQL is taught (which seems to be the
| paper's objective), why not improve SQL so it isn't as horrible
| to try to learn in the first place?
|
| The barriers to grokking SQL could be lowered considerably if SQL
| made minor adjustments like moving the projection part of a
| SELECT query to being below or syntactically after the WHERE
| clause instead of being at the top, and making SQL more "natural"
| to write-in without needing excessively verbose inner-derived-
| table expressions when all you want is to do perform some
| repetitive calculation which will be reused in later query steps.
|
| Also, the GROUP BY clause really needs to be renamed to
| "AGGREGATE BY" or similar, because when normal people think
| "group" they're probably thinking of sorting/ORDER BY or
| PARTITION BY and they certainly don't imagine "don't display
| these rows at all, lol".
|
| I just don't understand what drives the ISO SQL language design
| committee - I'd have thought that the newer revisions (e.g.
| SQL-2003) would have improved the language's ergonomics - on the
| contrary: the language's grammar and verbosity gets worse every
| release, and the team has strange priorities: apparently they
| feel needing to generate in-memory XML is more important than
| deferrable constraints - and I only ever see ISO SQL's XML
| features being abused to make-up for a lack of decent string-
| aggregation functions.
|
| (...I could talk for hours about everything wrong with SQL.)
| de6u99er wrote:
| The trick of becoming really good at SQL (applies to all areas
| of IT) is having a certain ambition to produce high performance
| beautiful (readable) code.
|
| This requires experience which can only be gained by rolling up
| your sleves and working on stuff until the high ambition has
| been satisfied. Sometimes when I see old code from myself, and
| I can follow what I have been doing I get really proud of
| myself. Many times I end up slightly improving it based on new
| knowledge I have acquired since I initially wrote it.
| minism wrote:
| High performance and readable certainly. Not sure why
| beautiful would be something to strive for though
| AdrianB1 wrote:
| Readable code can be easily done via good formatting, but
| performance requires a combination of writing the query in
| the right way and the indexes to support it. The second part
| is not even visible from the query and most of the time is
| not self-explanatory, but the best part is that indexing is
| not even universally valid, statistics decide execution plans
| and the same query with the same indexes can result in very
| different performance on 2 different instances.
| jimbob45 wrote:
| If the general computing community can agree on anything over
| the last 20 years, it's that Python 2->3 was a disaster. Even
| though Python 3 made several highly necessary (and irreversibly
| transformative) changes to Python 2, no one liked it because it
| fundamentally changed the language to something unfamiliar.
|
| I'm guessing the SQL and C++ committees looked at that
| transition and decided that such transformative changes really
| need to be done in new languages (like the Perl -> Raku change)
| rather than in a new version which risks alienating your
| existing base.
| btilly wrote:
| The Python 2 to Python 3 migration was such a disaster that
| Python 3 is now used by both more programmers, and a higher
| proportion of programmers, than Python 2 ever managed.
|
| This is not to minimize the pain of switching. But it does
| not seem to ahve limited the success of the language.
| darksaints wrote:
| Python 2->3 was a disaster, but I'd refrain from
| extrapolation because a lot of that difficulty was very
| specific to dynamic typing or python itself. There are tons
| of languages that have gone through far more transformative
| changes in the core semantics of the language, and have gone
| a lot smoother.
| DaiPlusPlus wrote:
| Oh of course - I have no doubt the ISO SQL committee is so
| conservative (no... they're _regressive_ ) is because of the
| sheer collective industry investment in not-only SQL tooling
| and SQL-compatible databases, but just energy-spent in
| teaching non-CS/SE/programmer types in businesses how to
| express their data-queries in SQL. It's very, very difficult
| to get the kind of industry cohesiveness around any technical
| standard, so the fact that SQL is so widely supported is a
| miracle (though it probably has something to do with US
| federal government requirements for information systems to
| support it, just like how POSIX is a thing because of the fed
| pushing for it).
|
| To be clear: I am not advocating for a brand new query-
| language syntax or any kind of Python3-style overhaul, but
| I'd like to see SQL start to take small steps towards
| integrating the lessons learned from the past 60+ years of
| language design rather than doing the complete opposite.
| monkeydust wrote:
| Have been using openai codex for a week and it's shockingly good
| at SQL with well defined prompts.
| pcblues wrote:
| I have been developing software that includes SQL for twenty
| years, and watched my own mental progress from misunderstanding
| to understanding. I found the biggest initial problem is that I
| used to imagine SQL queries as an imperative language rather than
| as expressions of data. Maybe in the teaching of SQL, this should
| be highlighted so absolute beginners can have that mental model
| when they are formulating solutions and grappling with the
| syntax.
| k__ wrote:
| I don't know if that's enough.
|
| Understanding the difference between declarative and imperative
| programming is rather hard with all the abstractions we have
| today.
|
| People always say, declarative programming is defining what you
| want, not doing the steps needed to get it. But today no
| imperative interface requires you to do all the steps either,
| plus, most programming languages use both paradigms at the same
| time.
| pcblues wrote:
| I guess my point was that if you are trying to achieve
| results in a language paradigm that isn't the one the
| language was designed for, the learning curve is _really_
| steep, and to use the declarative features of any language
| still requires you to understand the paradigms' differences.
| When I was at uni it wasn't until the final year that
| Programming Paradigms was a course, but even a rough
| introduction to them in any of the languages I studied
| earlier would have helped. Something like, "This language is
| used like this. It is not used like this, for example."
| k__ wrote:
| I'm not even sure, I totally understood the difference now,
| 10 years after I studied CS.
| btilly wrote:
| _Understanding the difference between declarative and
| imperative programming is rather hard with all the
| abstractions we have today._
|
| The distinction is are you telling the computer *how* to do
| it, or telling it *what* steps to take.
|
| If, even with access to all of the code, you'd have to ask
| the computer how it chose to do it to figure out what it did,
| you have a declarative system. If the code reads like
| instructions for a recipe, it is imperative.
|
| The complications come with the fact that these two paradigms
| do not describe all of the possibilities. Notably object
| oriented and functional designs are neither imperative or
| declarative. (But may share some features with both.)
| k__ wrote:
| Yes, that's probably what was always my problem.
|
| Theory is one thing, but actual programming languages are
| something different. An "impure" mix.
|
| So, when people told my language X is imperative and
| language Y is declarative, I got confused, because they
| often had parts of both.
| da39a3ee wrote:
| I've done backend web development with a relational DB via an ORM
| for 10 years. I'm OK at that, but I'm fucking hopeless at SQL. I
| know that my opinions are thus undermined, but I really wish we
| could get rid of SQL and replace it with something like logica
| [1] like today.
|
| SQL's pseudo-natural language syntax is an embarrassment and its
| lack of composability is even more of an embarrassment.
|
| [1] https://opensource.google/projects/logica
| simonw wrote:
| Have you used CTEs much (aka the WITH statement)?
|
| I find them to be a huge step forwards in terms of adding
| composability to complex queries.
| odipar wrote:
| Yes I concur: CTEs is closer to the spirit of relational
| algebra: every step/expression should yield a table/relation.
|
| As data munging is about
| combining/correlating/sorting/grouping data, why not have a
| sound (bag) algebra to do that? Such algebra would give us
| equational reasoning, proofs, etc.
|
| And consequently: students would be learning an algebra which
| is easier to learn IMO.
| da39a3ee wrote:
| Thanks yes I have learned to use the WITH statement, and I
| agree it gives more composability. But still, a half way
| house wouldn't you say?
| dehrmann wrote:
| > For example, some students wrote queries containing ,[?],
| instead of != or <>.
|
| Was this done on paper? Typing [?] takes some doing.
| Wevah wrote:
| Option-= on a Mac with the US layout, fwiw.
| JadeNB wrote:
| Man, Mac's keyboard shortcuts for special characters irritate
| me so much.
|
| First, they're there, and it's absolutely wonderful! I use
| far more semantically accurate Unicode rather than lossy
| ASCII approximations than I did back in my old Windows days.
| (If you don't know the special characters you can get, turn
| on Keyboard Viewer and whack your keyboard, especially
| modifier keys, a bit.)
|
| But ... I can't customise them. Even back in the days when
| macOS was OS X and believed in user customisation, these
| specific shortcuts were frozen and un-customizable. (Like the
| folder shortcuts in Finder. Maybe it makes sense to you for
| CMD-SHIFT-D to open the Downloads folder, not the Desktop.
| Too bad!)
|
| (Boy, I hope I'm wrong and someone will come along and
| explain my stupidity to me.)
| lelandfe wrote:
| > these specific shortcuts were frozen and un-customizable
|
| I think Karabiner should allow you to do this:
| https://karabiner-elements.pqrs.org/
|
| It's a utility that "remaps" keys - you set up key/key
| combinations that fire the original key/key combinations
| (it does not remove the original combination). E.g. you
| could bind Cmd-Q to Caps Lock if you wanted a really fast
| way to quit stuff.
| JadeNB wrote:
| The memory of what happened with kext's always makes me
| leery of relying on anything that reaches too deeply into
| macOS's guts, so I've always shied away from Karabiner,
| probably unreasonably.
|
| Old-style OS X believed in customisation--even now, you
| can set per-app keyboard shortcuts; it doesn't seem
| possible to bind Cmd-Q to Caps Lock, but, after slipping
| from Cmd-W to Cmd-Q and so quitting rather than closing a
| tab one too many times, I do have Cmd-Q bound to Cmd-
| Opt-Q for Safari only--and yet there's no Apple-blessed
| way of changing _those_ shortcuts, when it clearly had
| the architecture in place to allow it. That always
| irritated me.
| turnerc wrote:
| From the study:
|
| > Participants wrote their notes and answers on paper, which
| they showed in front of the webcam.
|
| Yes it seems they did
| dehrmann wrote:
| That's pretty flawed methodology since you'd want to know
| what problems people encounter in the real world and how
| quickly they solve them.
| nightpool wrote:
| Yeah, IMO counting this as a syntax error is a pretty low blow.
| It's completely clear what the person intended, and they would
| probably have no problems clarifying if the researchers asked
| how they would type that query in.
| [deleted]
| pcblues wrote:
| Easy if you have an APL keyboard :) (Hint, it's on the 8)
| https://www.dyalog.com/uploads/images/Business/products/us_r...
___________________________________________________________________
(page generated 2021-08-29 23:00 UTC)