[HN Gopher] SQL Tips and Tricks
       ___________________________________________________________________
        
       SQL Tips and Tricks
        
       Author : regexman1
       Score  : 199 points
       Date   : 2024-09-25 04:20 UTC (18 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | dooer wrote:
       | I am bad at SQL so this is great
        
       | isoprophlex wrote:
       | Wow, that EXCEPT trick is neat! ~10 years of using SQL almost
       | daily, and I never knew...
        
       | the_gorilla wrote:
       | Leading comma is nice in SELECT statements because you can
       | comment toggle individual lines. Indenting your code to make it
       | more readable is basically what anyone with room temperature IQ
       | does automatically. A lot of these other tips look like they're
       | designed to deal with SQL design flaws, like how handling nulls
       | isn't well defined in the spec so it's up to each implementation
       | to do whatever it wants.
        
         | willvarfar wrote:
         | A lot of databases support trailing commas in select clauses.
         | 
         | Which is just as well. I want to scratch my eyes out every time
         | I see someone formatting with comma starting the lines. It's
         | the kind of foolish consistency that is a big part of
         | performative engineering.
        
           | silveraxe93 wrote:
           | > I want to scratch my eyes out every time I see someone
           | formatting with comma starting the lines
           | 
           | Right!? I _physically_ recoil every time I see that. I think
           | that's the clearest example of normalisation of deviance [1]
           | I know. Seems like anyone that enters the industry straight
           | from data instead of moving from a more (software)
           | engineering background gets used to this.
           | 
           | And the arguments in favour are always so weak! - It's easier
           | to comment out lines - Easier to not miss a comma
           | 
           | Those are picked up in seconds by the compiler. And are a
           | tiny help in writing code vs violating a core writing
           | convention from basically every other language.
           | 
           | [1]- https://danluu.com/wat/
        
             | disgruntledphd2 wrote:
             | I'm a data person and despite seeing this for years, still
             | despise that approach to commas. Seriously, it's not that
             | hard to comment out the damn comma.
        
           | yen223 wrote:
           | Postgres and postgres-likes (e.g. Redshift) notably don't
           | support trailing commas in select clauses.
        
           | hans_castorp wrote:
           | > A lot of databases support trailing commas in select
           | clauses.
           | 
           | Which ones?
           | 
           | Postgres, Oracle, SQL Server, MySQL, MariaDB and SQLite do
           | not allow that.
        
             | willvarfar wrote:
             | I guess I'm being spoiled by BigQuery :)
             | 
             | To be fair, BigQuery SQL is improving at quite a pace. If
             | you follow their RSS, they are often announcing small but
             | solid affordances like trailing commas, the new RANGE
             | datatype, BigLake, some limited grouping and equality for
             | arrays and structs, etc.
             | 
             | It is also probable that they expose Google's new query
             | pipe syntax. Currently there are some hints from the error
             | messages in the console that it's behind a feature flag or
             | something.
        
           | the_gorilla wrote:
           | The only ones that matter don't. If you care that much about
           | seeing a comma somewhere you shouldn't, you have no real
           | problems and shouldn't be complaining.
        
         | GrumpyNl wrote:
         | When you comment the first statement, that doesnt have the ",",
         | it will break and you still have to remove the "," from the
         | second line, so your comment is not valid.
        
         | mulmen wrote:
         | This is only true if you comment out the last line in the
         | SELECT clause. It's ugly code and the justification doesn't
         | pass the sniff test.
        
       | sgarland wrote:
       | Not shown: stop using SELECT *. You almost certainly do not need
       | the entire width of the table, and by doing so, you add more data
       | to filter and transmit, and also prevent semijoins, which are
       | awesome.
        
         | yen223 wrote:
         | There are broadly two kinds of people who write SQL: analysts,
         | and developers
         | 
         | For developers, yeah. SELECT * has pitfalls, and you should
         | almost always specify your columns or use a query builder that
         | does that for you.
         | 
         | For analysts though, life is short and sometimes you really
         | don't want to type all the columns out. SELECT * is fine.
        
           | higeorge13 wrote:
           | Analysts usually query data warehouses, which are columnar,
           | so * is a query/warehouse killer. Everybody should just
           | select the columns they need.
        
       | egeozcan wrote:
       | I remember doing the "WHERE 1=1" trick in my last job and it
       | causing a... let's say "unproductive", discussion in the pull-
       | request.
        
         | hot_gril wrote:
         | What about `WHERE true`?
        
           | abrookewood wrote:
           | I don't understand the point at all. If you need to add some
           | condition later on, why not just add it then? What benefit is
           | there to just marking out the spot where you might add the
           | condition at some point in the future?
        
             | DH61AG wrote:
             | It is a bit silly but I think it just helps with code
             | readability some people.
        
             | azthecx wrote:
             | I personally don't use it too, but I think it's origins are
             | not just readability, but from developing queries in a REPL
             | like environment.
             | 
             | As you develop and are constantly creating / debugging
             | queries where you often add new _and_ or _or_ clauses as a
             | whole line, that becomes much faster to add and remove
             | those same lines as they 're a single shortcut away in
             | nearly all text editors.
        
               | paperplatter wrote:
               | Yeah, so often do I have an EXPLAIN ANALYZE query.txt
               | file I'm repeatedly editing in one window and piping into
               | psql in another to try and make something faster. So I
               | put WHERE true at the top.
        
             | Izkata wrote:
             | For their one here it's just the ability to rapidly
             | comment/uncomment conditions in a query editor while
             | exploring the data or debugging the query, and not having
             | to worry about the leading AND or OR.
             | 
             | I've also seen it in code with iterative adds, for example:
             | for crit in criteria:           sql += " AND " + crit
             | 
             | No needed to add a sentinel or other logic to skip the
             | first AND. I saw it a lot before people got used to " AND
             | ".join(criteria).
        
               | regexman1 wrote:
               | That's right - it's just a quicker way of being able to
               | comment/uncomment conditions when doing EDA or debugging.
        
               | hot_gril wrote:
               | Yeah. Especially when I'm trying to see what makes some
               | query slow.
        
           | Izkata wrote:
           | I believe some database engines can be configured to error if
           | you do that, and 1=1 doesn't trigger the safeguard.
        
           | yen223 wrote:
           | I've only recently learned that SQL Server doesn't have
           | "true" or "false" (!)
        
       | youdeet wrote:
       | One more point in the "Anti Join". Use EXISTS instead of IN and
       | LEFT JOIN if you only want to check existence of a row in another
       | large table / subquery based on the conditions. EXISTS returns
       | true as soon as it has found a hit. In case of LEFT JOIN and IN
       | engine collects all results before evaluating.
        
         | Semaphor wrote:
         | Yeah, I was a bit confused there. In all my testing, (NOT)
         | EXISTS was generating either a better plan or the same one as
         | (LEFT) JOIN/(NOT) IN. In addition, it's also clearer what the
         | intent is.
        
       | alex5207 wrote:
       | Never knew about QUALIFY. That's great
        
         | leosanchez wrote:
         | Looks like neither Postgres not SQL Server support QUALIFY
        
           | jabagawee wrote:
           | As I understand it, it's not part of the SQL standard.
        
           | FoeNyx wrote:
           | It was sadly hinted as a recent or non standard feature by
           | Github's syntax highlighting not recognizing it either
        
       | magicalhippo wrote:
       | I'll add some of mine:
       | 
       | Learn your DB server. Check the query plans often. You might get
       | surprised. Tweak and recheck.
       | 
       | Usually EXISTS is faster than IN. Beware that NOT EXISTS behaves
       | differently than EXCEPT in regards to NULL values.
       | 
       | Instead of joining tables and using distinct or similar to filter
       | rows, consiser using subquery "columns", ie in SELECT list. This
       | can be much faster even if you're pulling 10+ values from the
       | same table, even if your database server supports lateral joins.
       | Just make sure the subqueries return at most one row.
       | 
       | Any query that's not a one-off should not perform any table
       | scans. A table scan today can mean an outage tomorrow. Add
       | indexes. Keep in mind GROUP BY clause usually dictates index use.
       | 
       | If you need to filter on expressions, say where a substring is
       | equal something, you can add a computed column and index on that.
       | Alternatively some db's support indexing expressions directly.
       | 
       | Often using UNION ALL can be much faster than using OR, even for
       | non-trivial queries and/or multiple OR clauses.
       | 
       | edit: You can JOIN subqueries. This can be useful to force the
       | filtering order if the DB isn't being clever about the order.
        
         | Semaphor wrote:
         | > Any query that's not a one-off should not perform any table
         | scans. A table scan today can mean an outage tomorrow.
         | 
         | That very much depends on your data.
        
           | magicalhippo wrote:
           | I should have noted that I was talking about application
           | workloads. I don't have much experience with analytics
           | workloads.
           | 
           | If you have something else in mind, do feel free to
           | elaborate.
        
             | Semaphor wrote:
             | Relevant for applications as well, when a table only has a
             | few thousand entries, a scan is not the end of the world
             | and not even an outage in waiting.
             | 
             | I agree with you that one should seek when possible as part
             | of normal query optimization, but depending on your data,
             | it could also just easily be something you can live with
             | forever.
        
               | hinkley wrote:
               | You can't control the growth rate of your tables, you can
               | only estimate it. When we design for reliability we want
               | to remove single cause failures and make a best effort to
               | reduce dual cause failures. We definitely don't want two
               | failures from a single cause.
               | 
               | What reason might the lack of indexes suddenly become a
               | critical issue? And what other things might you be
               | scrambling to deal with at the same time? Tables might
               | fill quickly when a favorable review comes in, or some
               | world even results in churn in your system.
               | 
               | Just make the damned index. You Are Going to Need It. And
               | what's the harm in making it?
        
               | Semaphor wrote:
               | Depending on your application, you can _very_ accurately
               | estimate it. And in a case I had yesterday, it involved
               | stringy numbers because of a third party system, so I
               | could indeed add a computed persisted column that
               | converts our number to a VARCHAR, add a 9th index with a
               | lot of fields on that computed column and then save...
               | almost nothing compared to just scanning 6k rows.
        
               | magicalhippo wrote:
               | We had one such table for years. The one day I get an
               | emergency call from support, big customer don't get their
               | responses and it's critical for their workflow.
               | 
               | After some digging I found the service generating the
               | responses got killed due to being unresponsive.
               | 
               | Turns out our customer got a new client which caused them
               | to suddenly generate 100x as much data as others in this
               | module. And that caused a _lot_ more data in a table that
               | joined this non-indexed table.
               | 
               | So everything was working, it was just the performance
               | went over a cliff in a matter of days due to the missing
               | index.
               | 
               | Added the required index and it's been humming ever
               | since.
               | 
               | I've had similar experiences, and so these days I'm very
               | liberal with indexes.
               | 
               | We have read-heavy workloads, if you mostly insert then
               | sure be conservative.
        
               | ahoka wrote:
               | "And what's the harm in making it?"
               | 
               | Increased storage and slower inserts?
        
               | hinkley wrote:
               | In a table you think isn't growing?? No.
        
               | magicalhippo wrote:
               | As usual, there are well-qualified exceptions. If you are
               | very certain the table scan can't hurt, sure. But in my
               | experience, an index wouldn't hurt any in those cases.
        
         | idiocrat wrote:
         | > Learn your DB server. Check the query plans often. You might
         | get surprised. Tweak and recheck.
         | 
         | Oftentimes the well-designed queries behave unexpectedly,
         | because the column statistics are not updated or when the data
         | is fragmented for big tables (e.g. random PK insertion).
        
           | paperplatter wrote:
           | Sounds like that DBMS would work better with serial int PKs
        
         | hot_gril wrote:
         | The most useful thing is learning your DBMS. There's no
         | escaping the performance and isolation quirks of each one, and
         | there are different bonus features in each.
         | 
         | One interesting thing I found about Postgres that's probably
         | true of others too, often you can manually shard INSERT (SELECT
         | ...) operations to speed them up linearly with the number of
         | CPU cores, even when you have like 10 joins. EXPLAIN first,
         | find the innermost or outermost join, and kick off a separate
         | parallel query operating on each range of rows (id >= start AND
         | id < end). For weird reasons, I relied on this a lot for one
         | job 6 years ago. Postgres has added parallelism in versions
         | 10+, but it's still not this advanced afaik.
        
         | arkh wrote:
         | > Just make sure the subqueries return at most one row.
         | 
         | The JSON functions most RDBMS offer are awesome for that. One
         | subquery to get a JSON if you have multiple results for the
         | field then you only have to decode it on the app side.
        
         | code_biologist wrote:
         | _Instead of joining tables and using distinct or similar to
         | filter rows, consiser using subquery "columns", ie in SELECT
         | list._
         | 
         | What does this mean? Running                   SELECT
         | column1,           (             SELECT column2, column3, ...
         | FROM table_b             WHERE table_a.id = table_b.a_id
         | )         FROM table_a
         | 
         | Results in "subquery must return only one column" as I
         | expected. You mean returning the multiple columns as a record /
         | composite type?
         | 
         |  _Keep in mind GROUP BY clause usually dictates index use._
         | 
         | The reason for this wasn't immediately apparent to me. For
         | those who were curious, this blog post walks through it step by
         | step: https://www.brentozar.com/archive/2015/06/indexing-for-
         | group...
        
           | magicalhippo wrote:
           | Sorry, was on mobile so hadn't patience to type examples.
           | SELECT           column1,           (             SELECT
           | column2             FROM table_b             WHERE table_a.id
           | = table_b.a_id           ) as b_column2,           (
           | SELECT column3             FROM table_b             WHERE
           | table_a.id = table_b.a_id           ) as b_column3
           | FROM table_a
           | 
           | It might look like a lot more work, but in my experience it's
           | usually a lot faster. YMMV but check it.
        
             | dspillett wrote:
             | How well that performs compared to a JOIN can vary
             | massively depending on the data sizes of table_a & tale_b,
             | how table_b is indexed, and what else is going on in the
             | query.
             | 
             | If table_b has an index on id,column2,column3 (or on id
             | INLUDEing column2,column3) I would expect the equivalent
             | JOIN to usually be faster. If you have a clustered index on
             | Id (which is the case more often than not in MS SQL Server
             | and MySQL/InnoDB) then that would count for this _unless_
             | the table is much wider than those three columns (so the
             | index with its selective data would get many rows per page
             | more than the base data).
             | 
             | Worst (and fairly common) case with sub-selects like that
             | is the query planner deciding to run each subquery one per
             | row from table_a. This is not an issue if you are only
             | returning a few rows, or just one, from table_a, but in
             | more complex examples (perhaps if this fragment is a CTE or
             | view that is joined in a non-sargable manner so filtering
             | predicates can't push down) you might find a lot more rows
             | are processed this way even if few are eventually returned
             | due to other filters.
             | 
             | There are times when the method is definitely faster but be
             | very careful with it (test with realistic data sizes and
             | patterns) because often when it isn't, it _really_ isn 't.
        
               | magicalhippo wrote:
               | > perhaps if this fragment is a CTE or view
               | 
               | Yeah I guess I should have specified that this technique
               | usually works best when done in the outer query, not
               | buried deep inside.
               | 
               | It can be particularly effective if you fetch partial
               | results, ie due to pagination or similar.
               | 
               | That said, these things aren't set in stone. I shared my
               | experience, but my first tip goes first :)
        
           | dspillett wrote:
           | _> > Keep in mind GROUP BY clause usually dictates index use.
           | 
           | > The reason for this wasn't immediately apparent to me._
           | 
           | The key thing to remember is that grouping _is_ essentially a
           | sorting operation, and it happens before your other sorts
           | (that last part isn 't necessarily as obvious).
        
         | hans_castorp wrote:
         | > Any query that's not a one-off should not perform any table
         | scans. A table scan today can mean an outage tomorrow
         | 
         | I disagree.
         | 
         | There are queries where a table scan is the most efficient
         | access strategy. These are typically analytical/aggregation
         | queries that usually query the whole table. And sometimes
         | getting only 50% of all rows is better done using a table scan
         | as well.
         | 
         | I also don't see how a (read only) "table scan" could leave to
         | an outage. It won't block concurrent access. The only drawback
         | is that it results in a higher I/O load - but if the server
         | can't handle that, it would assume it's massively undersized.
        
           | magicalhippo wrote:
           | I mentioned in a different reply that I did not have analytic
           | queries in mind. I don't work with that so forgot to specify.
           | 
           | Outage might "just" mean slow enough that customer can't get
           | their work done in time. For the customer it's the same.
        
       | silveraxe93 wrote:
       | The "readability" section has 3 examples. The first 2 are
       | literally sacrificing readability so it's easier to write, and
       | the last has an unreadable abomination that indenting is really
       | not doing much.
        
         | yen223 wrote:
         | I'm not the biggest fan of how the first two conventions look,
         | but they are real conventions used by real SQL people. And I
         | can understand why they exist.
         | 
         | I've seen them enough to not be bothered by them any more.
        
           | silveraxe93 wrote:
           | Yeah, unfortunately you're right that they are real
           | conventions. Quite common too.
           | 
           | I also _understand_ why they exist. It's simple: It makes
           | code marginally easier to write.
           | 
           | But writing confusing, unintuitive and honestly plain ugly
           | code. Just so you can save a second after clicking run and
           | the compiler tells you the mistake is a bad reason.
        
             | arp242 wrote:
             | A lot of "readability" depends on what you're used to and
             | what you expect. I don't think these conventions are
             | inherently "ugly" or "confusing", but they are different to
             | what I've been doing for a long time, and thus unexpected,
             | and thus "ugly". But that's extremely subjective.
             | 
             | I've done plenty of SQL, and I've regularly run in to the
             | "fuck about with fucking trailing commas until it's valid
             | syntax"-problem. It's a very reasonable convention to have.
             | 
             | What should really happen is that the SQL standard should
             | allow trailing commas:                 select          a,
             | b,       from t;
        
         | regexman1 wrote:
         | That's a totally valid point haha.
        
           | silveraxe93 wrote:
           | I'll try to give some constructive criticism instead of a
           | drive by pot shot. I'm sorry, it's just that the leading
           | commas make my eyes bleed and I really hope the industry
           | moves away from it.
           | 
           | On point 3: What I do is use CTEs to create intermediate
           | columns (with good names) and then a final one creating the
           | final column. It's way more readable.
           | 
           | ```sql
           | 
           | with intermediate as (
           | 
           | select                 DATEDIFF(DAY, timeslot_date,
           | CURRENT_DATE()) > 7 as days_7_difference,
           | DATEDIFF(DAY, timeslot_date, CURRENT_DATE()) >= 29 as
           | days_29_difference,            LAG(overnight_fta_share, 1)
           | OVER (PARTITION BY timeslot_date, timeslot_channel ORDER BY
           | timeslot_activity) as overnight_fta_share_1_lag,
           | LAG(overnight_fta_share, 2) OVER (PARTITION BY timeslot_date,
           | timeslot_channel ORDER BY timeslot_activity)as
           | overnight_fta_share_2_lag
           | 
           | from timeslot_data)
           | 
           | select                 iff(days_7_difference,
           | overnight_fta_share_1_lag, null) as C7_fta_share,
           | iff(days_29_difference, overnight_fta_share_2_lag, null) as
           | C28_fta_share
           | 
           | from intermediate ```
        
             | regexman1 wrote:
             | I appreciate the feedback, no offence taken. I'm an analyst
             | so I often find the leading comma useful when I'm testing
             | something and want to quickly comment a column out but I
             | take your point.
             | 
             | And I agree, I should have used CTEs for this query, I was
             | just trying to save lines of code which had the unintended
             | consequence of quite an ugly query. However I did want to
             | use it as an example of indentation being useful to make it
             | slightly easier to read. Although perhaps I'm the only one
             | who thinks so.
             | 
             | I greatly appreciate the constructive criticism.
        
         | swarnie wrote:
         | Alternatively, write a mess of SQL like a three year old child
         | that just discovered MSPaint then push the "beautifier" button
         | and knock off for an early lunch.
        
         | chipdart wrote:
         | > The first 2 are literally sacrificing readability so it's
         | easier to write, (...)
         | 
         | The leading comma format brings benefits beyond readability.
         | For example, in version control systems the single-argument-
         | per-line-with-leading-comma format turns any change to those
         | arguments as a one-line diff.
         | 
         | I think developers spend as much time looking at commit
         | historyas they do to the actual source code.
        
           | hinkley wrote:
           | If you're still using a diff tool that can't do sub-line
           | diffs it's time to join the 20's. I haven't been forced to
           | use one of those in over ten years.
        
             | chipdart wrote:
             | > If you're still using a diff tool that can't do sub-line
             | diffs it's time to join the 20's.
             | 
             | I think you failed to understand what I wrote.
             | 
             | Leading comma ensures one line diffs, but trailing comma
             | forces two-line diffs when you add a trailing argument.
             | With trailing comma, you need to touch the last line to add
             | a comma, and then add the new argument in the line below.
             | 
             | We are not discussing bundling all arguments in a single
             | line. I don't know where you got that idea from.
        
               | hinkley wrote:
               | What's the value in doing this unless it makes the diff
               | clearer?
               | 
               | It only makes the diff clearer if you don't have single
               | character highlighting in your diff tool. Which most have
               | now. Have had for a decade.
               | 
               | Also it's not going to be a single line anyway. You add a
               | line to the query and one to the caller. At a minimum. So
               | you're really arguing for three versus four. Which is
               | false economy.
        
               | chipdart wrote:
               | > What's the value in doing this unless it makes the diff
               | clearer?
               | 
               | Because it makes the diff clearer.
               | 
               | Are you even reading the posts you're replying to?
        
               | orbital223 wrote:
               | > Leading comma ensures one line diffs
               | 
               | It does not. It just moves the edge case to a different
               | position: trailing comma has the "issue" when adding an
               | argument to the end of the list while leading comma has
               | it when adding an argument to the beginning.
               | 
               | Also, as pointed out by the other commenter, any decent
               | modern diff tool will make it obvious that the change to
               | the existing line is just the addition of a comma, which
               | makes the difference basically moot.
        
         | hinkley wrote:
         | Who splits column per line in the SELECT block and still leave
         | 150 character wide lines? This is a fucked up definition of
         | legibility. I can't even get started on the commas.
         | 
         | NOBODY CHECKS LONG LINES IN CODE REVIEWS. That was the biggest
         | problem with AngularJS. People mishandling merges and breaking
         | everything because the eyes start to glaze over at column 90.
         | I've been on more than half a dozen teams with CRs and it's
         | always the same. I'm exquisitely aware of this and try not to
         | do it, and I still fuck it up half as often as the next person.
         | 
         | Split your shit up. Especially when trying to set an example
         | for others.
        
           | hyperman1 wrote:
           | This could be a great comment if the tone was different. I'll
           | try to give my perspective.
           | 
           | SQL, unfortunately, is very verbose and has a strange mix of
           | super-high and very low abstraction. There is also no SQL
           | formatter out there that does a decent job, and no real
           | consensus about how good SQL is supposed to look.
           | 
           | If I look at the 'indent' guideline, it contains e.g.:
           | , IFF(DATEDIFF(DAY, timeslot_date, CURRENT_DATE()) >= 29,
           | LAG(overnight_fta_share, 2) OVER (PARTITION BY timeslot_date,
           | timeslot_channel ORDER BY timeslot_activity),         NULL)
           | AS C28_fta_share
           | 
           | Immediate SQL failures: 1) it has no easy facility to pull
           | that DATEDIFF clause in a different variable/field. 2) The
           | LAG line is verbose, especially if your DB doesn't allow to
           | pull out the WINDOW clause.
        
         | ahoka wrote:
         | Why do you think its worse
         | 
         | ? I don't see any problems
         | 
         | , or anything wrong with it
         | 
         | .
        
       | Semaphor wrote:
       | Regarding "Comment your code!": At least for MSSQL, it's often
       | recommended not to use -- for comments but instead /**/, because
       | many features like the query store save queries without line
       | breaks, so if you get the query from there, you need to manually
       | fix everything instead of simply using your IDEs formatter.
        
         | regexman1 wrote:
         | I didn't realise that, great to know. Thanks!
        
         | petters wrote:
         | That sounds like a bug in the query store
        
       | AtNightWeCode wrote:
       | Never use WHERE 1=1. It is both a security risk and a performance
       | risk to run dynamic ad-hoc queries.
        
         | Gunax wrote:
         | What is a dynamic, adhoc query? Why does adding 1=1 support
         | that?
        
           | abrookewood wrote:
           | I'm wondering that as well. I don't get that suggestion at
           | all.
        
           | egormakarov wrote:
           | Lets say its 2001 and you are writing some hot e-commerce
           | stuff in plain php. You want to filter data depending on
           | multiple fields in the submitted form. If some field is
           | there, you add one more "AND" clause to the "WHERE", like
           | this: if (isset($_POST['product'])) { $query .= "AND product
           | = " . $_POST['product']; }. So in order not to check every
           | time if the added clause is the first one you start with
           | "WHERE 1=1 ", as "WHERE AND ..." would not work.
        
             | freilanzer wrote:
             | Php has nothing like this?
             | 
             | In [1]: "... WHERE " + " AND ".join(str(i) for i in
             | range(4))
             | 
             | Out[1]: '... WHERE 0 AND 1 AND 2 AND 3'
             | 
             | Very strange.
        
               | egormakarov wrote:
               | This will produce broken SQL on empty clauses list. Very
               | strange.
        
             | paperplatter wrote:
             | I get how this isn't good. But how else would you handle
             | multi-field filtering, keep all the ANDs and use
             | (product_id = $1 OR $1 IS NULL) so the unset filters are
             | no-op? That's ok as long as the query planner is smart
             | enough.
        
           | AtNightWeCode wrote:
           | In this case. A query that you build by adding different
           | strings. 1=1 is for adding AND statements to the WHERE clause
           | dynamically. In your code. I never seen it used for anything
           | else. Adhoc is just the practice of running raw SQL queries.
           | 
           | So you end up with things like this.
           | 
           | "SELECT * FROM Music WHERE 1=1" + "AND category='rock'"
           | 
           | The risk is now that you by mistake allow for SQL-injections
           | but also every genre will generate a different query plan.
           | Depending on what SQL engine you use this may hurt
           | performance.
           | 
           | And one would think that this is a thing of the past. But it
           | is not.
        
         | adamzochowski wrote:
         | Can you expand on this? How is having                   WHERE
         | 1=1         AND ...[usual-where-clause]...
         | 
         | A performance and security compared to doing
         | WHERE ...[usual-where-clause]...
        
           | AtNightWeCode wrote:
           | If you use it in the same way you use trailing commas. Fair.
           | But the site says to make it easier to add dynamic
           | conditions. Which is a terrible idea in maybe not all but
           | many SQL engines.
        
         | jcz_nz wrote:
         | Can you elaborate on security issues here?
        
           | thestepafter wrote:
           | I think that it means the reason for doing where 1 = 1 is
           | sometimes to allow for easy insertion of dynamic queries
           | which can be a security and performance issue. The actual
           | usage of where 1 = 1 doesn't cause the security or
           | performance issue.
        
             | AtNightWeCode wrote:
             | Which is exactly what the site says. To insert dynamic
             | conditions. I know that you can use 1=1 for the same
             | reasons as trailing commas. But kinda obvious that this is
             | not the case here.
        
         | regexman1 wrote:
         | I'll add this as a caveat. I'm an analyst so my SQL isn't
         | really exposed to anyone other than myself and so I wasn't
         | aware of this, thanks for flagging.
        
           | halayli wrote:
           | A random person claims adding 1=1 is a security risk and you
           | are going to add it as caveat without verifying if the claim
           | is true nor knowing why? That's how misinformation spreads
           | around.
           | 
           | OP doesn't know what they are talking about because adding
           | 1=1 is not a security risk. 1=1 is related to sql injections
           | where a malicious attacker injects 'OR 1=1' into the end of
           | the where clause to disable the where clause completely. OP
           | probably saw '1=1' and threw that into the comment.
        
             | regexman1 wrote:
             | Fair point!
        
             | AtNightWeCode wrote:
             | Read my other comments. I worked with SQL on and off since
             | the last century. It has nothing to do with your poor
             | assumptions.
        
               | the_gorilla wrote:
               | Duration of working with SQL doesn't matter. The better
               | SQL programmers don't do it specifically, and have
               | experience in real languages that they bring over to
               | database queries.
        
           | paperplatter wrote:
           | 1=1 is not a security risk
        
         | ricardo81 wrote:
         | Presumably you are thinking about queries in code that add
         | WHERE clauses dynamically that aren't escaped correctly- which
         | doesn't have to be the case.
         | 
         | 1 = 1 is at least handy for simply joining a variadic amount of
         | other clauses with ' AND ' rather than counting if there's any
         | to add at all.
        
           | AtNightWeCode wrote:
           | Yes. "Use a dummy value in the WHERE clause so you can
           | _dynamically_ add and remove conditions with ease: " I don't
           | know how to read this in another way.
        
         | DH61AG wrote:
         | This doesn't make any sense at all.
        
       | philippta wrote:
       | I really like the formatting presented in this article:
       | 
       | https://www.sqlstyle.guide/#spaces
        
       | elchief wrote:
       | use sqlfluff linter and do what it says
        
       | l5870uoo9y wrote:
       | And I take it CTEs are implicitly being discouraged.
        
         | regexman1 wrote:
         | Not at all actually, I just hadn't really planned to add this
         | as a tip. Additionally I thought an in-line view was fine for
         | the examples included. But maybe I will!
        
         | higeorge13 wrote:
         | It used to be like this (i remember in past postgres versions
         | CTEs had worse performance than subqueries), but not anymore.
        
       | wodenokoto wrote:
       | Everybody is up in arms about the comma suggestion but everyone
       | thinks the 1=1 is a good idea in the where clause? If I saw that
       | in a code review I don't know what I'd think of the author.
        
         | AtNightWeCode wrote:
         | You can motivate it with the same reasons as trailing commas.
         | Making code reviews easier since changes to WHERE statements
         | does not effect other lines. But if the reason is, as in this
         | case to be able to add dynamic conditions. You will for sure be
         | fired where I work.
        
       | dspillett wrote:
       | On readability, I often find aligning things in two columns is
       | more readable. To modify the two examples in TFA:
       | SELECT e.employee_id              , e.employee_name
       | , e.job              , e.salary           FROM employees e
       | WHERE 1=1 -- Dummy value.            AND e.job IN ('Clerk',
       | 'Manager')            AND e.dept_no != 5              ;
       | 
       | and with a JOIN:                   SELECT e.employee_id
       | , e.employee_name              , e.job              , e.salary
       | , d.name              , d.location           FROM employees e
       | JOIN departments d                ON d.dept_no = e.dept_no
       | WHERE 1=1 -- Dummy value.            AND e.job IN ('Clerk',
       | 'Manager')            AND e.dept_no != 5              ;
       | 
       | In the join example, for a simple ON clause like that I'll
       | usually just have JOIN ... ON in the one line, but if there are
       | multiple conditions they are usually clearer on separate lines
       | IMO.
       | 
       | In more complicated queries I might further indent the joins too,
       | like:                   SELECT *           FROM employees e
       | JOIN departments d                  ON d.dept_no = e.dept_no
       | WHERE 1=1 -- Dummy value.            AND e.job IN ('Clerk',
       | 'Manager')            AND e.dept_no != 5              ;
       | 
       | YMMV. Some people strongly agree with me here, others vehemently
       | hate the way I align such code...
       | 
       | WRT "Always specify which column belongs to which table": this is
       | particularly important for correlated sub-queries, because if you
       | put the wrong column name in and it happens to match a name in an
       | object in the outer query you have a potentially hard to find
       | error. Also, if the table in the inner query is updated to
       | include a column of the same name as the one you are filtering on
       | in the outer, the meaning of your sub-query suddenly changes
       | quite drastically without it having changed itself.
       | 
       | A few other things off the top of my head:
       | 
       | 1. Remember that as well as UNION [ALL], EXCEPT and INTERSECT
       | exist. I've seen (and even written myself) some horrendous SQL
       | that badly implements these behaviours. TFA covers EXCEPT, but I
       | find people who know about that don't always know about
       | INTERSECT. It is rarely useful IME, but when it is useful it is
       | really useful.
       | 
       | 2. UPDATEs that change nothing still do everything else: create
       | entries in your transaction log (could be an issue if using log-
       | shipping for backups or read-only replicas etc.), fire triggers,
       | create history rows if using system-versioned tables, and so
       | forth. UPDATE a_table SET a_column = 'a value' WHERE a_column <>
       | 'a value' can be a lot faster than without the WHERE.
       | 
       | 3. Though of course be very careful with NULLable columns and/or
       | setting a value NULL with point 2. "WHERE a_column IS DISTINCT
       | FROM 'a value'" is much more maintainable if your DB supports
       | that syntax (added in MS SQL Server 2022 and Azure SQL DB a
       | little earlier, supported by Postgres years before, I don't know
       | about other DBs without checking) than the more verbose
       | alternatives.
       | 
       | 4. Trying to force the sort order of NULLs with something like
       | "ORDER BY ISNULL(a_column, 0)", or doing similar with GROUP BY,
       | can be very inefficient in some cases. If you expect few rows to
       | be returned and there are relatively few NULLs in the sort target
       | column it can be more performant to SELECT the non-NULL case and
       | the NULL case then UNION ALL the two and then sort. Though if you
       | do expect many rows this can backfire badly and you and up with
       | excess spooling to disk, so test, test, and test again, when
       | hacking around like this.
        
       | AtNightWeCode wrote:
       | A common mistake I see is that people think foreign keys will
       | automatically create indexes. Missing indexes is a general
       | problem in SQL. Missing indexes on columns that are in foreign
       | keys are even worse.
        
       | mergisi wrote:
       | Great post! If you're looking to speed up your SQL queries, you
       | might want to check out AI2sql https://ai2sql.io/. It can
       | generate SQL queries quickly from plain English prompts, which
       | can be super helpful when you're in a rush or dealing with
       | complex queries. Definitely worth giving a try for anyone looking
       | to streamline their workflow!
        
       | rawgabbit wrote:
       | My tips for working with complex Stored Procedures.
       | 
       | 1. At the beginning of the proc, immediately copy any permanent
       | tables into temporary tables and specify/limit/filter only for
       | the rows you need.
       | 
       | 2. In the middle of the proc, manipulate the temporary tables as
       | needed.
       | 
       | 3. At the end of the proc, update the permanent tables enclosed
       | within a transaction. Immediately rollback transaction/exit the
       | proc, if an error is detected. (By following all three steps,
       | this will improve concurrency and lets you restart the proc
       | without manually cleaning up any data messes).
       | 
       | 4. Use extreme caution when working with remote tables. Remote
       | tables do not reside in your RDBMS and most likely will not
       | utilize any statistics/indexes your RDBMS has. In many cases, it
       | is more performant to dump/copy the entire remote table into a
       | temporary table and then work with that. The most you can expect
       | from a remote table is to execute a Where clause. If you attempt
       | Joins or something complicated, it will likely timeout.
       | 
       | 5. The Query Plan is easily confused. In some cases, the Query
       | Plan will resort to perform row by row processing which will
       | bring performance to a halt. In many cases, it is better to break
       | up a complex stored procedure into smaller steps using temporary
       | tables.
       | 
       | 6. Always check the Query Plan to see what the RDBMS is actually
       | doing.
        
         | remus wrote:
         | 1-3 are nice if you can guarantee your data is reasonably
         | sized, but if it gets too big for your hardware taking copies
         | of large datasets and then doing updates on large datasets can
         | add a lot of overhead.
        
         | wvenable wrote:
         | I've significantly improved the performance of queries by
         | undoing someone who did #5 when it wasn't strictly needed.
         | Sometimes breaking a query into many smaller queries is
         | significantly less efficient than giving the query optimizer
         | the entire query and letting it find the best route to the
         | data.
         | 
         | If you've done #5 without doing #6 then you'll likely not see
         | that you're doing something not optimal. My advice is avoid
         | premature optimization and do things the most straight forward
         | way first and then only optimize if needed. Most importantly,
         | don't code in SQL procedurally -- you're describing the data
         | you want not giving the engine instructions on how to get it.
        
         | the_gorilla wrote:
         | I hate having to use a bunch of temp tables, but I regularly
         | run into queries that would never finish if you let the query
         | planner do its thing. Like compilers their ability is highly
         | overrated. Meanwhile, microsoft places constant warnings
         | against trying to even tune their query planner because it
         | supposedly knows best.
        
       | shmerl wrote:
       | I don't get the point of the dummy value. How does it help doing
       | anything? I can add conditions with ease without it.
        
       ___________________________________________________________________
       (page generated 2024-09-25 23:00 UTC)