[HN Gopher] SQL Tips and Tricks
___________________________________________________________________
SQL Tips and Tricks
Author : regexman1
Score : 199 points
Date : 2024-09-25 04:20 UTC (18 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| dooer wrote:
| I am bad at SQL so this is great
| isoprophlex wrote:
| Wow, that EXCEPT trick is neat! ~10 years of using SQL almost
| daily, and I never knew...
| the_gorilla wrote:
| Leading comma is nice in SELECT statements because you can
| comment toggle individual lines. Indenting your code to make it
| more readable is basically what anyone with room temperature IQ
| does automatically. A lot of these other tips look like they're
| designed to deal with SQL design flaws, like how handling nulls
| isn't well defined in the spec so it's up to each implementation
| to do whatever it wants.
| willvarfar wrote:
| A lot of databases support trailing commas in select clauses.
|
| Which is just as well. I want to scratch my eyes out every time
| I see someone formatting with comma starting the lines. It's
| the kind of foolish consistency that is a big part of
| performative engineering.
| silveraxe93 wrote:
| > I want to scratch my eyes out every time I see someone
| formatting with comma starting the lines
|
| Right!? I _physically_ recoil every time I see that. I think
| that's the clearest example of normalisation of deviance [1]
| I know. Seems like anyone that enters the industry straight
| from data instead of moving from a more (software)
| engineering background gets used to this.
|
| And the arguments in favour are always so weak! - It's easier
| to comment out lines - Easier to not miss a comma
|
| Those are picked up in seconds by the compiler. And are a
| tiny help in writing code vs violating a core writing
| convention from basically every other language.
|
| [1]- https://danluu.com/wat/
| disgruntledphd2 wrote:
| I'm a data person and despite seeing this for years, still
| despise that approach to commas. Seriously, it's not that
| hard to comment out the damn comma.
| yen223 wrote:
| Postgres and postgres-likes (e.g. Redshift) notably don't
| support trailing commas in select clauses.
| hans_castorp wrote:
| > A lot of databases support trailing commas in select
| clauses.
|
| Which ones?
|
| Postgres, Oracle, SQL Server, MySQL, MariaDB and SQLite do
| not allow that.
| willvarfar wrote:
| I guess I'm being spoiled by BigQuery :)
|
| To be fair, BigQuery SQL is improving at quite a pace. If
| you follow their RSS, they are often announcing small but
| solid affordances like trailing commas, the new RANGE
| datatype, BigLake, some limited grouping and equality for
| arrays and structs, etc.
|
| It is also probable that they expose Google's new query
| pipe syntax. Currently there are some hints from the error
| messages in the console that it's behind a feature flag or
| something.
| the_gorilla wrote:
| The only ones that matter don't. If you care that much about
| seeing a comma somewhere you shouldn't, you have no real
| problems and shouldn't be complaining.
| GrumpyNl wrote:
| When you comment the first statement, that doesnt have the ",",
| it will break and you still have to remove the "," from the
| second line, so your comment is not valid.
| mulmen wrote:
| This is only true if you comment out the last line in the
| SELECT clause. It's ugly code and the justification doesn't
| pass the sniff test.
| sgarland wrote:
| Not shown: stop using SELECT *. You almost certainly do not need
| the entire width of the table, and by doing so, you add more data
| to filter and transmit, and also prevent semijoins, which are
| awesome.
| yen223 wrote:
| There are broadly two kinds of people who write SQL: analysts,
| and developers
|
| For developers, yeah. SELECT * has pitfalls, and you should
| almost always specify your columns or use a query builder that
| does that for you.
|
| For analysts though, life is short and sometimes you really
| don't want to type all the columns out. SELECT * is fine.
| higeorge13 wrote:
| Analysts usually query data warehouses, which are columnar,
| so * is a query/warehouse killer. Everybody should just
| select the columns they need.
| egeozcan wrote:
| I remember doing the "WHERE 1=1" trick in my last job and it
| causing a... let's say "unproductive", discussion in the pull-
| request.
| hot_gril wrote:
| What about `WHERE true`?
| abrookewood wrote:
| I don't understand the point at all. If you need to add some
| condition later on, why not just add it then? What benefit is
| there to just marking out the spot where you might add the
| condition at some point in the future?
| DH61AG wrote:
| It is a bit silly but I think it just helps with code
| readability some people.
| azthecx wrote:
| I personally don't use it too, but I think it's origins are
| not just readability, but from developing queries in a REPL
| like environment.
|
| As you develop and are constantly creating / debugging
| queries where you often add new _and_ or _or_ clauses as a
| whole line, that becomes much faster to add and remove
| those same lines as they 're a single shortcut away in
| nearly all text editors.
| paperplatter wrote:
| Yeah, so often do I have an EXPLAIN ANALYZE query.txt
| file I'm repeatedly editing in one window and piping into
| psql in another to try and make something faster. So I
| put WHERE true at the top.
| Izkata wrote:
| For their one here it's just the ability to rapidly
| comment/uncomment conditions in a query editor while
| exploring the data or debugging the query, and not having
| to worry about the leading AND or OR.
|
| I've also seen it in code with iterative adds, for example:
| for crit in criteria: sql += " AND " + crit
|
| No needed to add a sentinel or other logic to skip the
| first AND. I saw it a lot before people got used to " AND
| ".join(criteria).
| regexman1 wrote:
| That's right - it's just a quicker way of being able to
| comment/uncomment conditions when doing EDA or debugging.
| hot_gril wrote:
| Yeah. Especially when I'm trying to see what makes some
| query slow.
| Izkata wrote:
| I believe some database engines can be configured to error if
| you do that, and 1=1 doesn't trigger the safeguard.
| yen223 wrote:
| I've only recently learned that SQL Server doesn't have
| "true" or "false" (!)
| youdeet wrote:
| One more point in the "Anti Join". Use EXISTS instead of IN and
| LEFT JOIN if you only want to check existence of a row in another
| large table / subquery based on the conditions. EXISTS returns
| true as soon as it has found a hit. In case of LEFT JOIN and IN
| engine collects all results before evaluating.
| Semaphor wrote:
| Yeah, I was a bit confused there. In all my testing, (NOT)
| EXISTS was generating either a better plan or the same one as
| (LEFT) JOIN/(NOT) IN. In addition, it's also clearer what the
| intent is.
| alex5207 wrote:
| Never knew about QUALIFY. That's great
| leosanchez wrote:
| Looks like neither Postgres not SQL Server support QUALIFY
| jabagawee wrote:
| As I understand it, it's not part of the SQL standard.
| FoeNyx wrote:
| It was sadly hinted as a recent or non standard feature by
| Github's syntax highlighting not recognizing it either
| magicalhippo wrote:
| I'll add some of mine:
|
| Learn your DB server. Check the query plans often. You might get
| surprised. Tweak and recheck.
|
| Usually EXISTS is faster than IN. Beware that NOT EXISTS behaves
| differently than EXCEPT in regards to NULL values.
|
| Instead of joining tables and using distinct or similar to filter
| rows, consiser using subquery "columns", ie in SELECT list. This
| can be much faster even if you're pulling 10+ values from the
| same table, even if your database server supports lateral joins.
| Just make sure the subqueries return at most one row.
|
| Any query that's not a one-off should not perform any table
| scans. A table scan today can mean an outage tomorrow. Add
| indexes. Keep in mind GROUP BY clause usually dictates index use.
|
| If you need to filter on expressions, say where a substring is
| equal something, you can add a computed column and index on that.
| Alternatively some db's support indexing expressions directly.
|
| Often using UNION ALL can be much faster than using OR, even for
| non-trivial queries and/or multiple OR clauses.
|
| edit: You can JOIN subqueries. This can be useful to force the
| filtering order if the DB isn't being clever about the order.
| Semaphor wrote:
| > Any query that's not a one-off should not perform any table
| scans. A table scan today can mean an outage tomorrow.
|
| That very much depends on your data.
| magicalhippo wrote:
| I should have noted that I was talking about application
| workloads. I don't have much experience with analytics
| workloads.
|
| If you have something else in mind, do feel free to
| elaborate.
| Semaphor wrote:
| Relevant for applications as well, when a table only has a
| few thousand entries, a scan is not the end of the world
| and not even an outage in waiting.
|
| I agree with you that one should seek when possible as part
| of normal query optimization, but depending on your data,
| it could also just easily be something you can live with
| forever.
| hinkley wrote:
| You can't control the growth rate of your tables, you can
| only estimate it. When we design for reliability we want
| to remove single cause failures and make a best effort to
| reduce dual cause failures. We definitely don't want two
| failures from a single cause.
|
| What reason might the lack of indexes suddenly become a
| critical issue? And what other things might you be
| scrambling to deal with at the same time? Tables might
| fill quickly when a favorable review comes in, or some
| world even results in churn in your system.
|
| Just make the damned index. You Are Going to Need It. And
| what's the harm in making it?
| Semaphor wrote:
| Depending on your application, you can _very_ accurately
| estimate it. And in a case I had yesterday, it involved
| stringy numbers because of a third party system, so I
| could indeed add a computed persisted column that
| converts our number to a VARCHAR, add a 9th index with a
| lot of fields on that computed column and then save...
| almost nothing compared to just scanning 6k rows.
| magicalhippo wrote:
| We had one such table for years. The one day I get an
| emergency call from support, big customer don't get their
| responses and it's critical for their workflow.
|
| After some digging I found the service generating the
| responses got killed due to being unresponsive.
|
| Turns out our customer got a new client which caused them
| to suddenly generate 100x as much data as others in this
| module. And that caused a _lot_ more data in a table that
| joined this non-indexed table.
|
| So everything was working, it was just the performance
| went over a cliff in a matter of days due to the missing
| index.
|
| Added the required index and it's been humming ever
| since.
|
| I've had similar experiences, and so these days I'm very
| liberal with indexes.
|
| We have read-heavy workloads, if you mostly insert then
| sure be conservative.
| ahoka wrote:
| "And what's the harm in making it?"
|
| Increased storage and slower inserts?
| hinkley wrote:
| In a table you think isn't growing?? No.
| magicalhippo wrote:
| As usual, there are well-qualified exceptions. If you are
| very certain the table scan can't hurt, sure. But in my
| experience, an index wouldn't hurt any in those cases.
| idiocrat wrote:
| > Learn your DB server. Check the query plans often. You might
| get surprised. Tweak and recheck.
|
| Oftentimes the well-designed queries behave unexpectedly,
| because the column statistics are not updated or when the data
| is fragmented for big tables (e.g. random PK insertion).
| paperplatter wrote:
| Sounds like that DBMS would work better with serial int PKs
| hot_gril wrote:
| The most useful thing is learning your DBMS. There's no
| escaping the performance and isolation quirks of each one, and
| there are different bonus features in each.
|
| One interesting thing I found about Postgres that's probably
| true of others too, often you can manually shard INSERT (SELECT
| ...) operations to speed them up linearly with the number of
| CPU cores, even when you have like 10 joins. EXPLAIN first,
| find the innermost or outermost join, and kick off a separate
| parallel query operating on each range of rows (id >= start AND
| id < end). For weird reasons, I relied on this a lot for one
| job 6 years ago. Postgres has added parallelism in versions
| 10+, but it's still not this advanced afaik.
| arkh wrote:
| > Just make sure the subqueries return at most one row.
|
| The JSON functions most RDBMS offer are awesome for that. One
| subquery to get a JSON if you have multiple results for the
| field then you only have to decode it on the app side.
| code_biologist wrote:
| _Instead of joining tables and using distinct or similar to
| filter rows, consiser using subquery "columns", ie in SELECT
| list._
|
| What does this mean? Running SELECT
| column1, ( SELECT column2, column3, ...
| FROM table_b WHERE table_a.id = table_b.a_id
| ) FROM table_a
|
| Results in "subquery must return only one column" as I
| expected. You mean returning the multiple columns as a record /
| composite type?
|
| _Keep in mind GROUP BY clause usually dictates index use._
|
| The reason for this wasn't immediately apparent to me. For
| those who were curious, this blog post walks through it step by
| step: https://www.brentozar.com/archive/2015/06/indexing-for-
| group...
| magicalhippo wrote:
| Sorry, was on mobile so hadn't patience to type examples.
| SELECT column1, ( SELECT
| column2 FROM table_b WHERE table_a.id
| = table_b.a_id ) as b_column2, (
| SELECT column3 FROM table_b WHERE
| table_a.id = table_b.a_id ) as b_column3
| FROM table_a
|
| It might look like a lot more work, but in my experience it's
| usually a lot faster. YMMV but check it.
| dspillett wrote:
| How well that performs compared to a JOIN can vary
| massively depending on the data sizes of table_a & tale_b,
| how table_b is indexed, and what else is going on in the
| query.
|
| If table_b has an index on id,column2,column3 (or on id
| INLUDEing column2,column3) I would expect the equivalent
| JOIN to usually be faster. If you have a clustered index on
| Id (which is the case more often than not in MS SQL Server
| and MySQL/InnoDB) then that would count for this _unless_
| the table is much wider than those three columns (so the
| index with its selective data would get many rows per page
| more than the base data).
|
| Worst (and fairly common) case with sub-selects like that
| is the query planner deciding to run each subquery one per
| row from table_a. This is not an issue if you are only
| returning a few rows, or just one, from table_a, but in
| more complex examples (perhaps if this fragment is a CTE or
| view that is joined in a non-sargable manner so filtering
| predicates can't push down) you might find a lot more rows
| are processed this way even if few are eventually returned
| due to other filters.
|
| There are times when the method is definitely faster but be
| very careful with it (test with realistic data sizes and
| patterns) because often when it isn't, it _really_ isn 't.
| magicalhippo wrote:
| > perhaps if this fragment is a CTE or view
|
| Yeah I guess I should have specified that this technique
| usually works best when done in the outer query, not
| buried deep inside.
|
| It can be particularly effective if you fetch partial
| results, ie due to pagination or similar.
|
| That said, these things aren't set in stone. I shared my
| experience, but my first tip goes first :)
| dspillett wrote:
| _> > Keep in mind GROUP BY clause usually dictates index use.
|
| > The reason for this wasn't immediately apparent to me._
|
| The key thing to remember is that grouping _is_ essentially a
| sorting operation, and it happens before your other sorts
| (that last part isn 't necessarily as obvious).
| hans_castorp wrote:
| > Any query that's not a one-off should not perform any table
| scans. A table scan today can mean an outage tomorrow
|
| I disagree.
|
| There are queries where a table scan is the most efficient
| access strategy. These are typically analytical/aggregation
| queries that usually query the whole table. And sometimes
| getting only 50% of all rows is better done using a table scan
| as well.
|
| I also don't see how a (read only) "table scan" could leave to
| an outage. It won't block concurrent access. The only drawback
| is that it results in a higher I/O load - but if the server
| can't handle that, it would assume it's massively undersized.
| magicalhippo wrote:
| I mentioned in a different reply that I did not have analytic
| queries in mind. I don't work with that so forgot to specify.
|
| Outage might "just" mean slow enough that customer can't get
| their work done in time. For the customer it's the same.
| silveraxe93 wrote:
| The "readability" section has 3 examples. The first 2 are
| literally sacrificing readability so it's easier to write, and
| the last has an unreadable abomination that indenting is really
| not doing much.
| yen223 wrote:
| I'm not the biggest fan of how the first two conventions look,
| but they are real conventions used by real SQL people. And I
| can understand why they exist.
|
| I've seen them enough to not be bothered by them any more.
| silveraxe93 wrote:
| Yeah, unfortunately you're right that they are real
| conventions. Quite common too.
|
| I also _understand_ why they exist. It's simple: It makes
| code marginally easier to write.
|
| But writing confusing, unintuitive and honestly plain ugly
| code. Just so you can save a second after clicking run and
| the compiler tells you the mistake is a bad reason.
| arp242 wrote:
| A lot of "readability" depends on what you're used to and
| what you expect. I don't think these conventions are
| inherently "ugly" or "confusing", but they are different to
| what I've been doing for a long time, and thus unexpected,
| and thus "ugly". But that's extremely subjective.
|
| I've done plenty of SQL, and I've regularly run in to the
| "fuck about with fucking trailing commas until it's valid
| syntax"-problem. It's a very reasonable convention to have.
|
| What should really happen is that the SQL standard should
| allow trailing commas: select a,
| b, from t;
| regexman1 wrote:
| That's a totally valid point haha.
| silveraxe93 wrote:
| I'll try to give some constructive criticism instead of a
| drive by pot shot. I'm sorry, it's just that the leading
| commas make my eyes bleed and I really hope the industry
| moves away from it.
|
| On point 3: What I do is use CTEs to create intermediate
| columns (with good names) and then a final one creating the
| final column. It's way more readable.
|
| ```sql
|
| with intermediate as (
|
| select DATEDIFF(DAY, timeslot_date,
| CURRENT_DATE()) > 7 as days_7_difference,
| DATEDIFF(DAY, timeslot_date, CURRENT_DATE()) >= 29 as
| days_29_difference, LAG(overnight_fta_share, 1)
| OVER (PARTITION BY timeslot_date, timeslot_channel ORDER BY
| timeslot_activity) as overnight_fta_share_1_lag,
| LAG(overnight_fta_share, 2) OVER (PARTITION BY timeslot_date,
| timeslot_channel ORDER BY timeslot_activity)as
| overnight_fta_share_2_lag
|
| from timeslot_data)
|
| select iff(days_7_difference,
| overnight_fta_share_1_lag, null) as C7_fta_share,
| iff(days_29_difference, overnight_fta_share_2_lag, null) as
| C28_fta_share
|
| from intermediate ```
| regexman1 wrote:
| I appreciate the feedback, no offence taken. I'm an analyst
| so I often find the leading comma useful when I'm testing
| something and want to quickly comment a column out but I
| take your point.
|
| And I agree, I should have used CTEs for this query, I was
| just trying to save lines of code which had the unintended
| consequence of quite an ugly query. However I did want to
| use it as an example of indentation being useful to make it
| slightly easier to read. Although perhaps I'm the only one
| who thinks so.
|
| I greatly appreciate the constructive criticism.
| swarnie wrote:
| Alternatively, write a mess of SQL like a three year old child
| that just discovered MSPaint then push the "beautifier" button
| and knock off for an early lunch.
| chipdart wrote:
| > The first 2 are literally sacrificing readability so it's
| easier to write, (...)
|
| The leading comma format brings benefits beyond readability.
| For example, in version control systems the single-argument-
| per-line-with-leading-comma format turns any change to those
| arguments as a one-line diff.
|
| I think developers spend as much time looking at commit
| historyas they do to the actual source code.
| hinkley wrote:
| If you're still using a diff tool that can't do sub-line
| diffs it's time to join the 20's. I haven't been forced to
| use one of those in over ten years.
| chipdart wrote:
| > If you're still using a diff tool that can't do sub-line
| diffs it's time to join the 20's.
|
| I think you failed to understand what I wrote.
|
| Leading comma ensures one line diffs, but trailing comma
| forces two-line diffs when you add a trailing argument.
| With trailing comma, you need to touch the last line to add
| a comma, and then add the new argument in the line below.
|
| We are not discussing bundling all arguments in a single
| line. I don't know where you got that idea from.
| hinkley wrote:
| What's the value in doing this unless it makes the diff
| clearer?
|
| It only makes the diff clearer if you don't have single
| character highlighting in your diff tool. Which most have
| now. Have had for a decade.
|
| Also it's not going to be a single line anyway. You add a
| line to the query and one to the caller. At a minimum. So
| you're really arguing for three versus four. Which is
| false economy.
| chipdart wrote:
| > What's the value in doing this unless it makes the diff
| clearer?
|
| Because it makes the diff clearer.
|
| Are you even reading the posts you're replying to?
| orbital223 wrote:
| > Leading comma ensures one line diffs
|
| It does not. It just moves the edge case to a different
| position: trailing comma has the "issue" when adding an
| argument to the end of the list while leading comma has
| it when adding an argument to the beginning.
|
| Also, as pointed out by the other commenter, any decent
| modern diff tool will make it obvious that the change to
| the existing line is just the addition of a comma, which
| makes the difference basically moot.
| hinkley wrote:
| Who splits column per line in the SELECT block and still leave
| 150 character wide lines? This is a fucked up definition of
| legibility. I can't even get started on the commas.
|
| NOBODY CHECKS LONG LINES IN CODE REVIEWS. That was the biggest
| problem with AngularJS. People mishandling merges and breaking
| everything because the eyes start to glaze over at column 90.
| I've been on more than half a dozen teams with CRs and it's
| always the same. I'm exquisitely aware of this and try not to
| do it, and I still fuck it up half as often as the next person.
|
| Split your shit up. Especially when trying to set an example
| for others.
| hyperman1 wrote:
| This could be a great comment if the tone was different. I'll
| try to give my perspective.
|
| SQL, unfortunately, is very verbose and has a strange mix of
| super-high and very low abstraction. There is also no SQL
| formatter out there that does a decent job, and no real
| consensus about how good SQL is supposed to look.
|
| If I look at the 'indent' guideline, it contains e.g.:
| , IFF(DATEDIFF(DAY, timeslot_date, CURRENT_DATE()) >= 29,
| LAG(overnight_fta_share, 2) OVER (PARTITION BY timeslot_date,
| timeslot_channel ORDER BY timeslot_activity), NULL)
| AS C28_fta_share
|
| Immediate SQL failures: 1) it has no easy facility to pull
| that DATEDIFF clause in a different variable/field. 2) The
| LAG line is verbose, especially if your DB doesn't allow to
| pull out the WINDOW clause.
| ahoka wrote:
| Why do you think its worse
|
| ? I don't see any problems
|
| , or anything wrong with it
|
| .
| Semaphor wrote:
| Regarding "Comment your code!": At least for MSSQL, it's often
| recommended not to use -- for comments but instead /**/, because
| many features like the query store save queries without line
| breaks, so if you get the query from there, you need to manually
| fix everything instead of simply using your IDEs formatter.
| regexman1 wrote:
| I didn't realise that, great to know. Thanks!
| petters wrote:
| That sounds like a bug in the query store
| AtNightWeCode wrote:
| Never use WHERE 1=1. It is both a security risk and a performance
| risk to run dynamic ad-hoc queries.
| Gunax wrote:
| What is a dynamic, adhoc query? Why does adding 1=1 support
| that?
| abrookewood wrote:
| I'm wondering that as well. I don't get that suggestion at
| all.
| egormakarov wrote:
| Lets say its 2001 and you are writing some hot e-commerce
| stuff in plain php. You want to filter data depending on
| multiple fields in the submitted form. If some field is
| there, you add one more "AND" clause to the "WHERE", like
| this: if (isset($_POST['product'])) { $query .= "AND product
| = " . $_POST['product']; }. So in order not to check every
| time if the added clause is the first one you start with
| "WHERE 1=1 ", as "WHERE AND ..." would not work.
| freilanzer wrote:
| Php has nothing like this?
|
| In [1]: "... WHERE " + " AND ".join(str(i) for i in
| range(4))
|
| Out[1]: '... WHERE 0 AND 1 AND 2 AND 3'
|
| Very strange.
| egormakarov wrote:
| This will produce broken SQL on empty clauses list. Very
| strange.
| paperplatter wrote:
| I get how this isn't good. But how else would you handle
| multi-field filtering, keep all the ANDs and use
| (product_id = $1 OR $1 IS NULL) so the unset filters are
| no-op? That's ok as long as the query planner is smart
| enough.
| AtNightWeCode wrote:
| In this case. A query that you build by adding different
| strings. 1=1 is for adding AND statements to the WHERE clause
| dynamically. In your code. I never seen it used for anything
| else. Adhoc is just the practice of running raw SQL queries.
|
| So you end up with things like this.
|
| "SELECT * FROM Music WHERE 1=1" + "AND category='rock'"
|
| The risk is now that you by mistake allow for SQL-injections
| but also every genre will generate a different query plan.
| Depending on what SQL engine you use this may hurt
| performance.
|
| And one would think that this is a thing of the past. But it
| is not.
| adamzochowski wrote:
| Can you expand on this? How is having WHERE
| 1=1 AND ...[usual-where-clause]...
|
| A performance and security compared to doing
| WHERE ...[usual-where-clause]...
| AtNightWeCode wrote:
| If you use it in the same way you use trailing commas. Fair.
| But the site says to make it easier to add dynamic
| conditions. Which is a terrible idea in maybe not all but
| many SQL engines.
| jcz_nz wrote:
| Can you elaborate on security issues here?
| thestepafter wrote:
| I think that it means the reason for doing where 1 = 1 is
| sometimes to allow for easy insertion of dynamic queries
| which can be a security and performance issue. The actual
| usage of where 1 = 1 doesn't cause the security or
| performance issue.
| AtNightWeCode wrote:
| Which is exactly what the site says. To insert dynamic
| conditions. I know that you can use 1=1 for the same
| reasons as trailing commas. But kinda obvious that this is
| not the case here.
| regexman1 wrote:
| I'll add this as a caveat. I'm an analyst so my SQL isn't
| really exposed to anyone other than myself and so I wasn't
| aware of this, thanks for flagging.
| halayli wrote:
| A random person claims adding 1=1 is a security risk and you
| are going to add it as caveat without verifying if the claim
| is true nor knowing why? That's how misinformation spreads
| around.
|
| OP doesn't know what they are talking about because adding
| 1=1 is not a security risk. 1=1 is related to sql injections
| where a malicious attacker injects 'OR 1=1' into the end of
| the where clause to disable the where clause completely. OP
| probably saw '1=1' and threw that into the comment.
| regexman1 wrote:
| Fair point!
| AtNightWeCode wrote:
| Read my other comments. I worked with SQL on and off since
| the last century. It has nothing to do with your poor
| assumptions.
| the_gorilla wrote:
| Duration of working with SQL doesn't matter. The better
| SQL programmers don't do it specifically, and have
| experience in real languages that they bring over to
| database queries.
| paperplatter wrote:
| 1=1 is not a security risk
| ricardo81 wrote:
| Presumably you are thinking about queries in code that add
| WHERE clauses dynamically that aren't escaped correctly- which
| doesn't have to be the case.
|
| 1 = 1 is at least handy for simply joining a variadic amount of
| other clauses with ' AND ' rather than counting if there's any
| to add at all.
| AtNightWeCode wrote:
| Yes. "Use a dummy value in the WHERE clause so you can
| _dynamically_ add and remove conditions with ease: " I don't
| know how to read this in another way.
| DH61AG wrote:
| This doesn't make any sense at all.
| philippta wrote:
| I really like the formatting presented in this article:
|
| https://www.sqlstyle.guide/#spaces
| elchief wrote:
| use sqlfluff linter and do what it says
| l5870uoo9y wrote:
| And I take it CTEs are implicitly being discouraged.
| regexman1 wrote:
| Not at all actually, I just hadn't really planned to add this
| as a tip. Additionally I thought an in-line view was fine for
| the examples included. But maybe I will!
| higeorge13 wrote:
| It used to be like this (i remember in past postgres versions
| CTEs had worse performance than subqueries), but not anymore.
| wodenokoto wrote:
| Everybody is up in arms about the comma suggestion but everyone
| thinks the 1=1 is a good idea in the where clause? If I saw that
| in a code review I don't know what I'd think of the author.
| AtNightWeCode wrote:
| You can motivate it with the same reasons as trailing commas.
| Making code reviews easier since changes to WHERE statements
| does not effect other lines. But if the reason is, as in this
| case to be able to add dynamic conditions. You will for sure be
| fired where I work.
| dspillett wrote:
| On readability, I often find aligning things in two columns is
| more readable. To modify the two examples in TFA:
| SELECT e.employee_id , e.employee_name
| , e.job , e.salary FROM employees e
| WHERE 1=1 -- Dummy value. AND e.job IN ('Clerk',
| 'Manager') AND e.dept_no != 5 ;
|
| and with a JOIN: SELECT e.employee_id
| , e.employee_name , e.job , e.salary
| , d.name , d.location FROM employees e
| JOIN departments d ON d.dept_no = e.dept_no
| WHERE 1=1 -- Dummy value. AND e.job IN ('Clerk',
| 'Manager') AND e.dept_no != 5 ;
|
| In the join example, for a simple ON clause like that I'll
| usually just have JOIN ... ON in the one line, but if there are
| multiple conditions they are usually clearer on separate lines
| IMO.
|
| In more complicated queries I might further indent the joins too,
| like: SELECT * FROM employees e
| JOIN departments d ON d.dept_no = e.dept_no
| WHERE 1=1 -- Dummy value. AND e.job IN ('Clerk',
| 'Manager') AND e.dept_no != 5 ;
|
| YMMV. Some people strongly agree with me here, others vehemently
| hate the way I align such code...
|
| WRT "Always specify which column belongs to which table": this is
| particularly important for correlated sub-queries, because if you
| put the wrong column name in and it happens to match a name in an
| object in the outer query you have a potentially hard to find
| error. Also, if the table in the inner query is updated to
| include a column of the same name as the one you are filtering on
| in the outer, the meaning of your sub-query suddenly changes
| quite drastically without it having changed itself.
|
| A few other things off the top of my head:
|
| 1. Remember that as well as UNION [ALL], EXCEPT and INTERSECT
| exist. I've seen (and even written myself) some horrendous SQL
| that badly implements these behaviours. TFA covers EXCEPT, but I
| find people who know about that don't always know about
| INTERSECT. It is rarely useful IME, but when it is useful it is
| really useful.
|
| 2. UPDATEs that change nothing still do everything else: create
| entries in your transaction log (could be an issue if using log-
| shipping for backups or read-only replicas etc.), fire triggers,
| create history rows if using system-versioned tables, and so
| forth. UPDATE a_table SET a_column = 'a value' WHERE a_column <>
| 'a value' can be a lot faster than without the WHERE.
|
| 3. Though of course be very careful with NULLable columns and/or
| setting a value NULL with point 2. "WHERE a_column IS DISTINCT
| FROM 'a value'" is much more maintainable if your DB supports
| that syntax (added in MS SQL Server 2022 and Azure SQL DB a
| little earlier, supported by Postgres years before, I don't know
| about other DBs without checking) than the more verbose
| alternatives.
|
| 4. Trying to force the sort order of NULLs with something like
| "ORDER BY ISNULL(a_column, 0)", or doing similar with GROUP BY,
| can be very inefficient in some cases. If you expect few rows to
| be returned and there are relatively few NULLs in the sort target
| column it can be more performant to SELECT the non-NULL case and
| the NULL case then UNION ALL the two and then sort. Though if you
| do expect many rows this can backfire badly and you and up with
| excess spooling to disk, so test, test, and test again, when
| hacking around like this.
| AtNightWeCode wrote:
| A common mistake I see is that people think foreign keys will
| automatically create indexes. Missing indexes is a general
| problem in SQL. Missing indexes on columns that are in foreign
| keys are even worse.
| mergisi wrote:
| Great post! If you're looking to speed up your SQL queries, you
| might want to check out AI2sql https://ai2sql.io/. It can
| generate SQL queries quickly from plain English prompts, which
| can be super helpful when you're in a rush or dealing with
| complex queries. Definitely worth giving a try for anyone looking
| to streamline their workflow!
| rawgabbit wrote:
| My tips for working with complex Stored Procedures.
|
| 1. At the beginning of the proc, immediately copy any permanent
| tables into temporary tables and specify/limit/filter only for
| the rows you need.
|
| 2. In the middle of the proc, manipulate the temporary tables as
| needed.
|
| 3. At the end of the proc, update the permanent tables enclosed
| within a transaction. Immediately rollback transaction/exit the
| proc, if an error is detected. (By following all three steps,
| this will improve concurrency and lets you restart the proc
| without manually cleaning up any data messes).
|
| 4. Use extreme caution when working with remote tables. Remote
| tables do not reside in your RDBMS and most likely will not
| utilize any statistics/indexes your RDBMS has. In many cases, it
| is more performant to dump/copy the entire remote table into a
| temporary table and then work with that. The most you can expect
| from a remote table is to execute a Where clause. If you attempt
| Joins or something complicated, it will likely timeout.
|
| 5. The Query Plan is easily confused. In some cases, the Query
| Plan will resort to perform row by row processing which will
| bring performance to a halt. In many cases, it is better to break
| up a complex stored procedure into smaller steps using temporary
| tables.
|
| 6. Always check the Query Plan to see what the RDBMS is actually
| doing.
| remus wrote:
| 1-3 are nice if you can guarantee your data is reasonably
| sized, but if it gets too big for your hardware taking copies
| of large datasets and then doing updates on large datasets can
| add a lot of overhead.
| wvenable wrote:
| I've significantly improved the performance of queries by
| undoing someone who did #5 when it wasn't strictly needed.
| Sometimes breaking a query into many smaller queries is
| significantly less efficient than giving the query optimizer
| the entire query and letting it find the best route to the
| data.
|
| If you've done #5 without doing #6 then you'll likely not see
| that you're doing something not optimal. My advice is avoid
| premature optimization and do things the most straight forward
| way first and then only optimize if needed. Most importantly,
| don't code in SQL procedurally -- you're describing the data
| you want not giving the engine instructions on how to get it.
| the_gorilla wrote:
| I hate having to use a bunch of temp tables, but I regularly
| run into queries that would never finish if you let the query
| planner do its thing. Like compilers their ability is highly
| overrated. Meanwhile, microsoft places constant warnings
| against trying to even tune their query planner because it
| supposedly knows best.
| shmerl wrote:
| I don't get the point of the dummy value. How does it help doing
| anything? I can add conditions with ease without it.
___________________________________________________________________
(page generated 2024-09-25 23:00 UTC)