[HN Gopher] Paginating Requests in APIs (2020)
___________________________________________________________________
Paginating Requests in APIs (2020)
Author : exemel_hn
Score : 181 points
Date : 2022-05-28 15:31 UTC (7 hours ago)
(HTM) web link (ignaciochiazzo.medium.com)
(TXT) w3m dump (ignaciochiazzo.medium.com)
| a-dub wrote:
| cursors result in expensive server-side state. if ordering is
| less important, an alternative is to make cardinality queries
| cheaper and then allow the client to request sections, making
| the scheme stateless from the server's point of view.
|
| so one query returns an approximate cardinality of the result set
| or a partition count based on approximate size of the result set
| and a fixed request response size.
|
| the second through nth queries are GETs, but each GET includes
| the partition count and the partition requested. the server
| uses a hashing scheme either to reduce the input data used to
| construct the response or to filter the response itself.
|
| use of numerical partition ids and hashing allows for the
| response to maintain consistency over time as well as pushing of
| the sharding/selection scheme as deep as necessary into the data
| layer to avoid scalability issues.
|
| requests that are too big are cancelled and given abuse points.
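|
| a rough sketch of this scheme, assuming a SQLite-style
| connection and a hypothetical items table (names made up):
|
|     import sqlite3
|
|     PARTITION_SIZE = 100  # fixed response size agreed with clients
|
|     def partition_count(db):
|         # query 1: cardinality of the result set -> partition count
|         (n,) = db.execute("SELECT COUNT(*) FROM items").fetchone()
|         return max(1, -(-n // PARTITION_SIZE))  # ceiling division
|
|     def get_partition(db, partitions, requested):
|         # queries 2..n: stateless GETs carrying (partitions,
|         # requested); the hash (here simply id % partitions)
|         # pushes selection down into the data layer
|         return db.execute(
|             "SELECT * FROM items WHERE id % ? = ? ORDER BY id",
|             (partitions, requested),
|         ).fetchall()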
| therusskiy wrote:
| How do you determine on the backend whether a page is the last
| one? The trick I use to avoid extra SQL queries: if a user asks
| for 10 records, I load up to 11 and then return 10. If there
| were 11, I return a cursor for the next page; otherwise the
| next page is null. Is there a better way?
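|
| The trick as a sketch (SQLite-style API, hypothetical items
| table):
|
|     def fetch_page(db, limit=10, after_id=0):
|         # ask for one more row than requested to learn whether
|         # a next page exists without a separate COUNT query
|         rows = db.execute(
|             "SELECT id, title FROM items"
|             " WHERE id > ? ORDER BY id LIMIT ?",
|             (after_id, limit + 1),
|         ).fetchall()
|         has_more = len(rows) > limit
|         rows = rows[:limit]
|         next_cursor = rows[-1][0] if has_more else None
|         return rows, next_cursor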
| yashap wrote:
| Just let people query for an empty page every now and then, no
| big deal.
| whenlambo wrote:
| Moreover, you can query, like, 3 extra, and if the extra is
| fewer than 3, return those extra items (the short tail) too.
| This way, you won't have a last page with only 1-2 items.
| omegalulw wrote:
| How do you deal with cases where results must be ordered by
| some field and the data changes between page 1 request and
| page 2 request?
| NonNefarious wrote:
| You could select into a temporary table and return pages of
| results from that.
|
| You'd need some mechanism to delete the temporary table
| after a time, of course, but I imagine this is not
| uncommon.
|
| Another method would be having a last-modified timestamp on
| every row in the DB; then you could select records at or
| before the initial query time. Seems like overkill just for
| this one purpose, but I imagine it might be generally
| useful information to have in the DB.
| spockz wrote:
| If the ordering changed because the data changed, then just
| show the new situation on that page. If the results' order
| is important, there are (at least) several things you can do:
|
| 1. Save the order under a token and iterate relative to that
| saved order, only showing data for the ids that were
| originally included.
|
| 2. Instead of continuing from a page count, continue from the
| last value seen. Any mutation before that point will be
| invisible, without having to store the original result list
| (only ids).
| noneeeed wrote:
| That's what we do, I'm not sure there is another way other than
| making a request for the next page.
| drbojingle wrote:
| You could ask for a count of records and divide by the page
| size.
| neoglow wrote:
| Counts become incredibly slow on some databases like
| Postgres. It needs to do a sequential scan, which results in
| slow(er) performance even on tens of thousands of rows. When
| operating at scale, this is very, very slow.
| kadoban wrote:
| This seems unlikely to depend on which database software.
| It could easily depend on what indexes exist for the query.
| fabian2k wrote:
| From what I understand it's not slow if it can use an
| index, but the criteria for that are rather hard to
| understand as a normal user. If you vacuum often enough and
| the visibility map is current, it doesn't need a full table
| scan.
| spurgu wrote:
| This is what I do, since counts should cause minimal overhead
| (I've never had to deal with huge databases). But neither
| does the N+1 records trick which OP describes, that's clever.
| :)
| SonOfLilit wrote:
| Counts can cause significant overhead on even millions of
| rows.
| spurgu wrote:
| Thanks, that's good to know (I'll have to run some tests at
| some point). I do have a few tables (namely logs) that will
| eventually have millions of rows. But I've also been planning
| on rotating them whenever that becomes a problem, so I'm not
| overly concerned. Or I'll experiment with OP's solution.
| MereInterest wrote:
| If I were to ask the number of even numbers greater than
| zero and less than 10 billion, you could very quickly
| give an exact answer. If I were to ask the exact number
| of prime numbers on that same range, it would take much
| longer to find the correct answer.
|
| In the same way, asking for counts may or may not be a
| quick answer, depending on both the table structure and
| the query being performed. Query the number of customers
| who started this year, on a table indexed by start date,
| and it's incredibly cheap. If that same table is only
| indexed by last name, then counts become more expensive.
| I'd say that the advice to return the counts has an
| implicit addendum to make sure that your database is
| appropriately indexed such that your typical queries are
| cheap to perform.
| inferiorhuman wrote:
| So I'm coming at this from a Postgres POV; IIRC MySQL can
| take some shortcuts for COUNT(*). Constraining the query
| will help, but counting is expensive as a sequential table
| scan is always required.
|
| Meanwhile I can only think of one use case for pagination
| with exact row counts - a user facing application. If
| you're preallocating storage you can get by with an
| estimate. If you're consuming an API and want something
| from near the end you can flip the sort order.
| nolevels wrote:
| Not necessarily in postgres. If an index is available it
| may use that, though it will still have to refer to the
| visibility map to see if any row is available to the
| current transaction (cheap check), and if it can't
| determine from that, then and only then does it need to
| also refer to the heap / actual table (slower check).
| inferiorhuman wrote:
| Is that a recent change? Back in 2016 the Citus guys
| claimed that:
|
|     There is no single universal row count that the
|     database could cache, so it must scan through all
|     rows counting how many are visible. Performance
|     for an exact count grows linearly with table size.
|
| https://www.citusdata.com/blog/2016/10/12/count-performance/
| nolevels wrote:
| Apologies, I should've clarified. I just meant postgres
| doesn't necessarily have to scan the actual table to get
| the count in some cases if an index is available, not
| that it caches the actual count.
| inferiorhuman wrote:
| So that's not the greatest quote, but the article shows
| postgres having to fall back on a sequential scan for a
| naive COUNT(*). The author goes on to mention that as of
| 9.2 Postgres can do an index scan (in the context of
| SELECT DISTINCT):
|
|     But now we come to a quirk, SELECT COUNT(DISTINCT n)
|     FROM items will not use the index even though SELECT
|     DISTINCT n does. As many blog posts mention ("one
|     weird trick to make postgres 50x faster!") you can
|     guide the planner by rewriting count distinct as the
|     count of a subquery
|
| To me it seems like you can trick the query planner into
| doing an index scan sometimes, but that it's a bit
| brittle. Has this improved much since?
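|
| For reference, the rewrite the post describes (items/n are
| the post's example table and column), held here as query
| strings:
|
|     # naive form: won't use the index on n, per the post
|     count_distinct = "SELECT COUNT(DISTINCT n) FROM items"
|
|     # count of a subquery: can use the index
|     count_subquery = (
|         "SELECT COUNT(*) FROM (SELECT DISTINCT n FROM items) t"
|     )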
| inferiorhuman wrote:
| Sure, but at that point you've really got to start thinking
| about architecture.
|
| Do you really need to paginate millions of rows _and_
| count them? Can you distribute the database and
| parallelize counting? Can you save counts somewhere else?
| Can you use estimates?
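|
| On estimates: a sketch for Postgres, assuming a
| psycopg2-style connection:
|
|     def estimated_count(conn, table):
|         # reltuples is maintained by VACUUM/ANALYZE, so this
|         # is an estimate rather than an exact count, but it
|         # reads one catalog row instead of scanning the table
|         with conn.cursor() as cur:
|             cur.execute(
|                 "SELECT reltuples::bigint FROM pg_class"
|                 " WHERE relname = %s",
|                 (table,),
|             )
|             return cur.fetchone()[0]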
| [deleted]
| [deleted]
| SPBS wrote:
| So cursor pagination is just keyset pagination, except instead of
| using a column value directly you use an opaque string that gets
| translated back to a column value. So you can modify how the
| cursor gets translated back to a column value anytime without
| breaking API compatibility.
|
| That said, I'm not sure everyone agrees on that definition;
| from what I know, people consider keyset pagination and cursor
| pagination to mean basically the same thing.
| richbell wrote:
| > So cursor pagination is just keyset pagination, except
| instead of using a column value directly you use an opaque
| string that gets translated back to a column value.
|
| HashIds is a popular solution if those columns are numerical or
| can be represented numerically (e.g. timestamp).
|
| https://hashids.org/
| noneeeed wrote:
| I think that's what the article is saying. I found the
| descriptions didn't really explain the differences very well.
|
| We use a cursor that combines the sort key and normally the ID
| as a secondary sort and pointer value, otherwise you can't
| paginate through data where the sort column has duplicates. We
| don't really do much to obfuscate these, but we don't document
| the structure and people who try to tweak the values won't get
| far.
| giaour wrote:
| One advantage of opaque cursors is that they can include
| information that's not part of your public API. For instance,
| if you allow soft deletes and do not return deleted items in
| the list, you can end up with a massive stretch of deleted
| items separating two live results. You can leave a note in the
| cursor indicating what record was last scanned, and that's not
| possible in keyset pagination.
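|
| A sketch of what such an opaque cursor might carry (field
| name hypothetical):
|
|     import base64
|     import json
|
|     def make_cursor(last_scanned_id):
|         # record the last row *scanned* (including soft-deleted
|         # rows), not just the last row returned, so the next
|         # request resumes past a long stretch of deleted items
|         payload = {"last_scanned_id": last_scanned_id}
|         blob = json.dumps(payload).encode()
|         return base64.urlsafe_b64encode(blob).decode()
|
|     def parse_cursor(cursor):
|         return json.loads(base64.urlsafe_b64decode(cursor))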
| blenderdt wrote:
| A REST API can't use page-based pagination because it's not
| stateless. It is also not cacheable.
| ok123456 wrote:
| If the page, limit, and offset are parameters, it is.
| blenderdt wrote:
| If you query the first page with a limit of 10, then insert
| 10 items, do you get the same items with the same parameters?
| isbvhodnvemrwvn wrote:
| Using that logic you can only have restful APIs if your
| application is useless and data never changes (which is
| fine for me, I hate debates about restfulness)
| ok123456 wrote:
| If you really want to cache the results, use ETags based on
| a hash set on write, or just include that hash as part of
| your request as well.
| skrebbel wrote:
| It's exactly as RESTful and cacheable and stateless as common
| websites, eg the HN frontpage or a newspaper homepage. I think
| your bar for when something is RESTful or not is too high to be
| useful.
| blenderdt wrote:
| Edit: my bad, I was looking at the wrong 'more' button.
| joemi wrote:
| How is https://news.ycombinator.com/news?p=2 not
| pagination?
| croes wrote:
| Why? The server doesn't need to know if I already requested any
| previous page when I request page 9. And if the data doesn't
| change it can easily be cached.
| hinkley wrote:
| "Rest in Practice" (if I'm remembering the right book) would
| disagree strenuously with you.
|
| Once you're doing REST instead of just "somewhat sane JSON
| responses" then the response has self-descriptive elements,
| including links to related resources. And importantly, URLs for
| older resources become static very, very quickly. Not unlike
| the internals of subversion or git.
|
| The trick there is to realize that O(1) = O(2). If you're
| trying to show 25 items per page, you _do not need a query that
| returns the most recent 25 items_. That creates a situation
| where every single interaction is distinct from every other
| interaction.
|
| Asking for 25 results and everything newer than that is barely
| more complicated than pagination already is.
| branko_d wrote:
| I don't like the term "cursor-based" pagination. Database cursors
| are different from this, have been around for decades, and can
| actually be used to implement pagination (albeit requiring the
| transaction to be kept alive between pages, which can be...
| problematic).
|
| Perhaps, "token-based" pagination would be a better term?
| abraxas wrote:
| It is highly aggravating and confusing when hipsters co-opt
| terms that have had a defined meaning for decades. The ones
| that grate on me are "retina display", "real time",
| "x-nanometer node".
| [deleted]
| blurker wrote:
| I agree. I was unsure if the author meant db cursors which
| would be keeping transactions alive. That has obvious scaling
| issues. But reading the comments here has me convinced they are
| talking about application generated cursors. I think this is a
| pretty reasonable confusion and it would benefit from added
| clarity, like naming them differently.
| kortex wrote:
| > Perhaps, "token-based" pagination would be a better term?
|
| Let's do it. When I was first learning about it, I wondered how
| the concept related to database cursors, if at all.
|
| At its simplest it's just a pointer to the next item, but
| really it can be any opaque blob. Maybe you want it to be a
| JWT/encrypted blob with a request counter so you can track
| request usage by specific accounts (along with the pointer). jk,
| that's probably a terrible idea (too entangled). Idk, just came
| up with it. Point is, you _could_ do something like that; it's
| just a blob opaque to the client.
|
| So I like "token-based pagination".
| [deleted]
| drdec wrote:
| Thank you, now I don't need to write this comment
| dwaite wrote:
| I tried an experiment a while back to handle this using custom
| units for the "Range" HTTP Header.
|
| The biggest drawback is that the expectation is still that data
| would be valid concatenated, so (for example) multiple results
| would be expressed as multiple JSON values concatenated together,
| rather than entries in a single array.
| aaaaaaaaaaab wrote:
| theteapot wrote:
| If I take the keyset approach then obfuscate my sorting
| key+value in some token, does it become a "cursor"? Is the only
| point in doing this to stop abuse of large offsets?
|
| Also, the article states that a con of keyset is you can't use
| an offset. Technically you can, the same as paging (WHERE ID >
| X ORDER BY ID LIMIT Y OFFSET Z)?
| sa46 wrote:
| The API improvement proposals (AIP, yes it's a terrible acronym)
| are a rich source of API design wisdom. They assume protobufs,
| but the majority of the proposals would work for any API.
|
| The relevant one is https://google.aip.dev/158, which recommends
| cursor-based pagination and covers different pagination use
| cases, including:
|
| - support for different sort orders - mentioned in article as
| weakness of keyset pagination
|
| - support for skipping results - mentioned as a weakness of
| cursor-based pagination
|
| - why to use opaque page tokens
| grogers wrote:
| This is a good reference, but IMO page tokens need to expire,
| even if you aren't storing any data for them. Expiring tokens
| make it possible to change pagination formats over time (you
| only need to wait the expiry time while supporting both
| formats), eg to improve efficiency.
|
| Also I'll add that if you support a pagination API that accepts
| args (e.g. to filter on different dimensions) you should
| include a hash of the args in the opaque next page token, and
| fail the call if the args change. This prevents some nonsense
| scenarios where you started paging by scanning index A but the
| user switched filters such that you'd instead scan index B, but
| you don't have the key for that column.
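|
| A sketch of that validation, assuming JSON page tokens
| (helper names hypothetical):
|
|     import base64
|     import hashlib
|     import json
|
|     def args_digest(filter_args):
|         canonical = json.dumps(filter_args, sort_keys=True)
|         return hashlib.sha256(canonical.encode()).hexdigest()
|
|     def make_token(position, filter_args):
|         payload = {"pos": position, "args": args_digest(filter_args)}
|         blob = json.dumps(payload).encode()
|         return base64.urlsafe_b64encode(blob).decode()
|
|     def parse_token(token, filter_args):
|         payload = json.loads(base64.urlsafe_b64decode(token))
|         if payload["args"] != args_digest(filter_args):
|             raise ValueError("page token doesn't match request args")
|         return payload["pos"]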
| fabian2k wrote:
| Are these actually server-side cursors or keyset pagination? For
| keyset pagination you need essentially the last value of the
| columns you used to sort by, but this seems to have some random
| or encoded cursor value. I didn't think server-side cursors
| would be used at that scale, so this has to be keyset
| pagination, which means I'm missing something.
|
| And while I really like keyset pagination, it is annoying to
| implement in some cases. If your sorting column is not unique you
| need to sort by multiple columns. And that is really annoying and
| less efficient if you don't have row value comparisons. And not
| all databases and especially not all ORMs support those. I'm
| still waiting on EF Core support here, it looks a bit like this
| might happen but it's not certain yet.
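|
| For reference, a two-column keyset query using a row value
| comparison might look like this (hypothetical articles table;
| placeholder style varies by driver):
|
|     # one index-friendly row value predicate instead of the
|     # expanded (a < x OR (a = x AND b < y)) form
|     keyset_sql = (
|         "SELECT * FROM articles"
|         " WHERE (created_at, id) < (%s, %s)"
|         " ORDER BY created_at DESC, id DESC"
|         " LIMIT 10"
|     )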
| xdfgh1112 wrote:
| Surely a keyset, even if it's multi-key, is more performant
| than an offset query. Offset queries usually require the
| previous rows to be computed first, so large offsets are super
| slow.
| Jweb_Guru wrote:
| It is, but it's not a cursor in the traditional sense.
| ko27 wrote:
| EF Core now supports it with PostgreSQL
|
| https://github.com/npgsql/efcore.pg/pull/2350
| oconnor663 wrote:
| By server-side cursors do you mean like open connections to
| MySQL reading a stream of query results, or something like
| that? I don't think that's feasible for most websites. The
| biggest issue is that you'd have to guarantee that when someone
| clicks "next page", their browser ends up talking to the exact
| same machine that it got the previous page from. But websites
| running on big fleets of machines don't usually work like that;
| instead your queries are usually somewhat randomly distributed
| across a whole tier. You might work around that by trying to
| keep a persistent connection open from the browser to the
| server, but you'd lose that connection when you refreshed the
| page or when the server restarted, or if you lost WiFi for a
| minute. Making hiccups like that visible to the user is pretty
| frustrating. It's a lot more robust to use some sort of cursor
| design that doesn't rely on any persistent connections.
|
| Some other problems that crop up:
|
| - A lot of websites expect to serve most results from cache,
| and to only hit the DB a minority of the time, but to get your
| hands on a real DB cursor you'd have to hit the DB every time.
|
| - Not all DBs support moving cursors backwards. Maybe most of
| them don't? I'm not sure.
|
| - Open DB connections are usually a somewhat limited resource.
| gruez wrote:
| >but this seems to have some random or encoded cursor value
|
| Maybe the actual key value is stored server side for
| obfuscation/optimization reasons?
| terandle wrote:
| Wish this article would have gone into details about how cursor
| based pagination is actually implemented. Like they did for the
| other options by showing SQL. Is this something the database
| engine has to provide support for?
| andrewmackrodt wrote:
| You can do something like the following in application code.
|
| Imagine a table with the following schema/data:
|
|     id=1, created_at='2022-05-28T18:00Z'
|     id=2, created_at='2022-05-28T18:00Z'
|     id=3, created_at='2022-05-28T18:00Z'
|
| To retrieve articles in order of descending creation time, our
| sort order would be: created_at DESC, id DESC.
|
| The last item in the ORDER BY clause should be a unique key to
| ensure we have consistent ordering across multiple queries. We
| must also ensure all supported sort columns have an INDEX for
| performance reasons.
|
| Assume our application returns one article at a time and we
| have already retrieved the first article in the result set.
| Our cursor must include that last record's ORDER BY values,
| e.g. serialize('2022-05-28T18:00Z,3'). For example purposes I
| will use base64, so our cursor is MjAyMi0wNS0yOFQxODowMFosMw==.
|
| When the user requests the next set of results, they will
| supply the cursor and it will be used to construct a WHERE
| expression:
|
|     created_at < '2022-05-28T18:00Z'
|     OR (created_at = '2022-05-28T18:00Z' AND id < 3)
|
| So our query for the next set of results would be:
|
|     SELECT * FROM articles
|     WHERE created_at < '2022-05-28T18:00Z'
|        OR (created_at = '2022-05-28T18:00Z' AND id < 3)
|     ORDER BY created_at DESC, id DESC LIMIT 1
|
| For queries with more than 2 sort columns the WHERE expression
| will slightly increase in complexity but it need only be
| implemented once. If the sort order direction is flipped, make
| sure to adjust the comparator appropriately, e.g. for 'DESC'
| use '<' and for 'ASC' use '>'.
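|
| A minimal sketch of the above in Python, assuming a
| SQLite-style connection and the articles table described:
|
|     import base64
|
|     def encode_cursor(created_at, article_id):
|         raw = f"{created_at},{article_id}".encode()
|         return base64.b64encode(raw).decode()
|
|     def decode_cursor(cursor):
|         raw = base64.b64decode(cursor).decode()
|         created_at, article_id = raw.rsplit(",", 1)
|         return created_at, int(article_id)
|
|     def next_page(db, cursor, limit=1):
|         created_at, article_id = decode_cursor(cursor)
|         rows = db.execute(
|             "SELECT id, created_at FROM articles"
|             " WHERE created_at < ? OR (created_at = ? AND id < ?)"
|             " ORDER BY created_at DESC, id DESC LIMIT ?",
|             (created_at, created_at, article_id, limit),
|         ).fetchall()
|         # the cursor for the following page comes from the last
|         # row returned
|         if rows:
|             return rows, encode_cursor(rows[-1][1], rows[-1][0])
|         return rows, None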
| delusional wrote:
| Isn't that basically just a trivial extension of what the
| article calls "keyset based pagination"?
|
| I'm not complaining, since this is also what I usually do; I'm
| just wondering if there's more to it.
| cortesoft wrote:
| Oftentimes under the hood, cursor based paging IS keyset
| paging. The difference is that the keyset is obfuscated,
| which can allow the implementation to change without
| changing the API call.
| TravelPiglet wrote:
| How do you handle removal?
| andrewmackrodt wrote:
| I haven't had a need to implement this but something like
| the following should work to provide the same* records
| regardless of insertion/deletion.
|
| * data of the record may have been updated, but if there
| are 1,000 records total, paginating in either direction
| will always return those same 1,000 records regardless of
| insertion/deletion.
|
| - Filter by created_at <= time the endpoint first returned
| a result.
|
| - Utilize soft deletes, where deleted_at IS NULL OR
| deleted_at < time the endpoint first returned a result.
|
| Any keys allowed to be used in the ORDER BY query should
| also be immutable.
| catlifeonmars wrote:
| Makes total sense. It also seems trivially automatable on the
| surface. Echoing GP: Is this an existing DBMS feature? If
| not, why not?
| qsort wrote:
| No need for anything special, you would just query with
| something like "SELECT whatever FROM Table WHERE cursor>value
| ORDER BY cursor LIMIT n". Obviously there needs to be an index
| over the columns of the cursor, but any reasonable engine
| should allow you to add that effortlessly.
| mbStavola wrote:
| The complication comes from having a cursor over results that
| are sorted on multiple columns. It's not incredibly difficult
| to implement, but it can certainly be annoying to do,
| especially if you want to support before+after cursor as
| well.
| gabetax wrote:
| Check https://use-the-index-luke.com/no-offset as a better
| reference.
|
| But in most SQL databases, cursors are something you implement
| and parse at the application layer, and translate to a WHERE
| clause on the (hopefully indexed) column you're ordering on.
| That turns the O(n) OFFSET scan into an O(log n) index seek.
| nafey wrote:
| How is a cursor different from key set in that case?
| cdash wrote:
| It's not in that case, but cursors can be implemented however
| you want. It's an abstraction that prevents you from having to
| change the schema of the API when you change the
| implementation.
| lazide wrote:
| Many databases have actual cursor implementations which are
| transactional. That means you'll get a consistent view at the
| point in time you created the cursor.
|
| That said, they tend to live only as long as the db
| connection (with some exceptions), so yeah you need some
| application work to make it sane.
| sgtfrankieboy wrote:
| Some database engines will support cursors like this, but from
| what I can gather, cursor-based navigation is a wrapper around
| keyset or page navigation with base64-encoded parameters.
|
| This makes the implementation details of pagination unimportant
| to the API layer the user uses.
| pragmatic wrote:
| "Continuation token" is a better term than cursor, as it avoids
| confusion with DB cursors.
|
| https://phauer.com/2018/web-api-pagination-timestamp-id-cont...
| rwieruch wrote:
| Kinda related to this discussion about pagination APIs:
|
| One year ago I had to design an API [0] for paginated nodes which
| also allows nested nodes. Overall there would be more than
| 100,000 nodes in the database.
|
| On the UI side there would be a tree table to display the nodes.
| First you would fetch page 0 with 100 nodes and from there you
| could either perform a paginated fetch for the next page (page 1)
| or a nested fetch for page 0 for one of the initially fetched
| nodes. So essentially the API allows paginated and nested fetches
| where each nested fetch would be a paginated fetch too.
|
| We used cursor based pagination, because there would be added and
| deleted nodes eventually in a collaborative environment. It was
| definitely a fun challenge!
|
| [0] https://www.robinwieruch.de/react-tree-list/
| zikohh wrote:
| "Cursor-based pagination is the most efficient method of paging
| and should always be used when possible." Facebook API Docs
|
| Why is it more efficient than key set pagination? According to
| this article
| https://archive.ph/2021.04.30-125536/https://medium.com/swlh...
| the only difference is that the value of the key is encoded to
| allow the implementation to change
| wespiser_2018 wrote:
| Both approaches use the underlying index in the database, but
| cursor approaches need to look at far fewer results in order to
| return the page. If you are deep into the result set, your
| offset could be high, like 1000 or so. The database would have
| to start at the beginning and scan through 1000 results,
| especially if filters are in play, in order to get to the N
| items in the page.
| nesarkvechnep wrote:
| Can't we do both offset and keyset-based pagination? If a user
| wants a specific page, let them specify it, return them links to
| the next and previous pages, using filter + limit.
| mmastrac wrote:
| You should never design an API that uses offsets for pagination
| unless you're dealing with small amounts of data (<10000 rows).
| Cursors give you far more flexibility and if you want to be lazy,
| you can just hide an offset in an opaque cursor blob and upgrade
| later on.
|
| I don't think I've used offsets in APIs for at least 10 years.
| Lightly-obfuscated cursor tokens are one of the first things I
| build in web projects and that's usually less than an hour's
| work.
|
| If you _really_ need the ability to drop the needle in your
| dataset with pagination, design your system to use pseudo-
| pagination where you approximate page-to-record mappings and
| generate cursors to continue forward or backward from that point.
| xdfgh1112 wrote:
| IIRC the DigitalOcean API uses offsets. We had to write special
| code to handle the possibility of the list changing while it
| was being queried.
| JamesSwift wrote:
| Do any of these avoid that scenario? Unless you are
| paginating on `updated_at ASC` there is an edge case of
| missing new data on previously requested pages.
| singron wrote:
| Missing data that began existing after you started querying
| is usually OK. If you had requested the data 1 second
| earlier, it wouldn't have been returned anyway. With offset
| pagination, you can miss data that always existed. This can
| make you believe the item that previously existed had been
| deleted.
|
| Be very careful with timestamp or auto-increment id
| pagination too. These don't necessarily become visible in
| the same order since the id or timestamp is generated
| before the transaction commits unless your database has
| some specific way of ensuring otherwise.
| xdfgh1112 wrote:
| Exactly.
| giaour wrote:
| When I worked for a news site, we moved all listing endpoints
| from offset to timestamp based pagination because offset
| based pagination assumes the data being queried will remain
| stable between requests. This was, of course, very rarely
| true during business hours.
|
| Duplicates or missed entries in the result set are the most
| likely outcome of offset-based pagination.
| gopher_space wrote:
| Something always feels off about repopulating lists that
| people are working on. Like that concept needs an entirely
| different display method.
| lazide wrote:
| It's what happens with stateless protocols and offset based
| pagination 100% of the time, but most people don't notice
| it.
| MAGZine wrote:
| I consider offset-based paged pagination to be a mark of the
| novice.
| steve_adams_86 wrote:
| edit: I just saw another comment of yours, so I see this was
| meant more contextually than I realized.
|
| Can you explain why offsets would never be a suitable
| solution? Is there a clear explanation as to why?
|
| I understand how cursors are superior in some cases. But what
| if you have a site which paginates static data? Offsets would
| allow you to cache the results easily, overcoming any
| performance concerns (which would be irrelevant if the
| dataset was small anyway), providing a better user experience
| (and developer experience due to simpler implementation
| details) overall.
|
| I can see that it would be a novice move if you've got
| staggering amounts of records that take a while to query. But
| that's actually pretty rare in my experience.
| MAGZine wrote:
| It doesn't even have to be a cursor; you can technically
| page off of any field you can index (I've written a lot of
| sync APIs so there are gotchas, but the point stands).
|
| Limit/offset is usually (though not always, as you point
| out) egregious for anything more than hobbies or small
| projects. But, I am biased as when I build APIs, I
| definitely expect some amount of traffic/volume/tuple
| count, and offset/limit will not do.
| mcphage wrote:
| Or the mark of someone who asked the users what they wanted.
| MAGZine wrote:
| I don't disagree, but everyone would prefer to have an API
| that works over one that doesn't, or one that causes
| service outages due to database latency. (Did anyone ask
| the users if they wanted the API to work?)
|
| If you're dealing with very small datasets, it's fine. I'm
| an average person using average APIs, which means that when
| I see offset-based pagination, it's usually on a service
| deployed and used by a lot of people.
|
| Unsurprisingly, the offset based APIs often include some
| other arbitrary limit like "offset limited to 10k" or
| something silly but understandable if you've built an API
| used by thousands of people before, or understand how
| databases work.
|
| They're also often superseded by better APIs that actually
| allow you to page the entire result set. Then you have a
| deprecated API that you either support forever or annoy
| users by turning it off.
|
| So yes, if you are building something non-internal/pet
| project, limit/offset is probably the mark of the novice.
| xthrowawayxx wrote:
| how do you jump to arbitrary pages with cursor pagination?
| singron wrote:
| You don't, but if your cursor is just a position in some
| well-defined order, you can sometimes craft a cursor out of a
| known position. Continuing the book metaphor, instead of
| skipping to page 100, which has no semantic meaning, you
| could skip to the beginning of the "M" section of the
| dictionary, or see the first 10 words after "merchant".
| lazide wrote:
| Depending on what you mean by 'page', you probably want a
| query? (Show me all results with values between 1 and 10)
|
| If you want to jump ahead in the results, that is
| fundamentally unstable though.
| tshaddox wrote:
| You generally don't, unless you want to linearly scan through
| n pages. If your API is using offset or page numbers and the
| underlying collection is also having items added or removed,
| the behavior is undefined or straight up wrong. I think it's
| okay to use offsets or page numbers in cases where the
| collection is static or where it's acceptable to occasionally
| skip over an item or see a duplicate item while paginating.
| rhacker wrote:
| Or... search, not page. Which is typically way more useful in
| reality.
| orf wrote:
| Cursor based pagination makes sense. At a certain scale every
| part of your API becomes something you need to support, and that
| includes pagination. This makes it super difficult to change the
| storage you use, as suddenly you need to support integer based
| offsets rather than opaque tokens.
| duskwuff wrote:
| Cursors also mean you get more predictable behavior when
| interacting with data that's changing under you. Imagine
| scrolling back through an active chat, for example -- you
| wouldn't expect each page to contain some of the same chat
| messages that are already on screen.
| pornel wrote:
| Another downside of offset-based pagination is less consistent
| handling of data changes during pagination.
|
| If any row is inserted anywhere before the page you requested,
| you'll see some other item twice (once at the end of the previous
| page, then again at the top of the next page). Similarly if any
| item was removed, you'll skip over some _other_ item when
| paginating due to every page shifting by one.
| catlifeonmars wrote:
| I'd take it further: offset based pagination failure modes look
| different based on the implementation. If you have API clients,
| you're exposing that implementation detail to them.
| cortesoft wrote:
| That was mentioned as a con for offset based pagination.
| verst wrote:
| Related: Do you think SDKs for these APIs should themselves
| require you to interact with the cursors?
|
| If I ask an SDK to get me 100 items but internally it returns
| only 50 before needing to make another call with a marker or
| cursor, it would be nice if the SDK had an option to handle
| this for me (while possibly still exposing cursors for those
| who really want them).
| jahewson wrote:
| What if it fails after 50?
| verst wrote:
| Maybe return the items it found so far and provide the cursor
| it failed on? Have a way to optionally provide the cursor? I
| can think of some ways I'd implement this in an SDK. For
| simplicity I'd probably have some sort of JSON status field
| indicating whether the request (traversing all cursors)
| completed. If it's not complete, I'd still look for the next
| cursor it returned and do the usual thing manually.
| progval wrote:
| SDKs should typically return a lazy iterator in languages which
| support it, so callers do not even need to care about
| pagination and just iterate until they don't want any more
| data, while the SDK fetches new pages as needed.
|
| In pseudo-Python:
|
|     import requests
|
|     def get_stuff():
|         url = "https://example.org/api.php"
|         while url:
|             response = requests.get(url)
|             yield from response.json()
|             # parse_link: helper that reads the Link response
|             # header and extracts the next-page URL
|             url = parse_link(response.headers["link"]).next
|
| and you call it with:
|
|     for item in get_stuff():
|         print(item)
| [deleted]
| scarface74 wrote:
| The AWS Python SDK uses "paginators" to let you iterate over
| results with a single for loop while it handles the pagination
| in the background. The underlying API - which is also exposed -
| uses an opaque "NextToken".
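|
| For example, roughly (bucket name hypothetical):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|     paginator = s3.get_paginator("list_objects_v2")
|
|     # the paginator follows the continuation tokens behind the
|     # scenes; we just consume pages
|     for page in paginator.paginate(Bucket="my-bucket"):
|         for obj in page.get("Contents", []):
|             print(obj["Key"])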
| nthypes wrote:
| Cursor based pagination is a good idea for APIs because it allows
| for more flexibility in how data is displayed to the user. It
| also helps to prevent large amounts of data from being returned
| all at once, which can cause performance issues.
| robertlagrant wrote:
| All pagination can help prevent large amounts of data being
| returned all at once.
| skeeter2020 wrote:
| > It also helps to prevent large amounts of data from being
| returned all at once, which can cause performance issues.
|
| Isn't that the point of all pagination, regardless of how it's
| implemented?
| cortesoft wrote:
| > It also helps to prevent large amounts of data from being
| returned all at once, which can cause performance issues
|
| You can also limit page size and accomplish the same thing.
| gregw2 wrote:
| For large datasets, consider avoiding pagination complexity by
| having your API return an (async) job ID, and dumping the whole
| result set to a file which can be fetched (over HTTPS) via the
| job ID.
|
| I have done this with both Postgres/S3 and Redshift/S3 backends
| and presigned URLs, and it looks like I could do it with
| Snowflake too.
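|
| The client side of that flow might look roughly like this
| (endpoint paths and field names hypothetical):
|
|     import time
|
|     import requests
|
|     BASE = "https://api.example.com"
|
|     # kick off the export and get a job ID back
|     job = requests.post(f"{BASE}/exports", json={"q": "orders"}).json()
|
|     # poll until the backend has dumped the result set to a file
|     while True:
|         status = requests.get(f"{BASE}/exports/{job['id']}").json()
|         if status["state"] == "done":
|             break
|         time.sleep(5)
|
|     # fetch the whole result set in one go via the presigned URL
|     data = requests.get(status["download_url"]).content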
| wespiser_2018 wrote:
| Nice write up!
|
| I've really struggled with Haskell's lack of support for
| pagination, and worked on at least two "inner-source" libraries
| that offer what the author calls "Key Set" (or limit/offset)
| and "Cursor Based" pagination as Servant combinators.
| What I've found is that "Key Set" is much more ergonomic for
| library authors to implement if they have to write the
| database/SQL calls themselves, although "Cursor Based" is
| technically more efficient.
|
| It's my opinion that the standard, out-of-the-box behavior
| should be to paginate every single endpoint that returns a
| list of results, but in order to do that easily you'll need
| some framework support. More mature frameworks have this right
| now, and IMO this is one of the weak points of using a less
| supported language like Haskell. You'll either have to roll your
| own solution, or rely on spotty library support that does the
| pagination in application code rather than in the database.
| Cthulhu_ wrote:
| The limitation of cursor-based pagination - where you only know
| the previous, current, and next page's cursor identifier - sounds
| ideal for infinite scrolling applications, where the user does
| not get the option to pick a page.
|
| Another solution or optimization I've seen is in forum software,
| where if a user executes a search, the IDs of the search results
| are copied into a "search results" table with the equivalent of a
| session ID; that way, the data remains stable between paging, and
| the data set over which pagination has to be done (e.g. via
| offset/limit) is very small; joins can be used to link to the
| items in question in the search results (e.g. posts, threads), or
| all the items that need to be shown in the paginated search
| results can be copied over to the ephemeral search results table.
|
| And of course, since it's ephemeral and memory is cheap these
| days, the whole search result can be cached in server-side
| memory.
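|
| A sketch of that ephemeral search-results table (schema
| hypothetical, queries held as strings):
|
|     # materialize the matching ids once under a session id,
|     # then page over that small, stable snapshot instead of
|     # re-running the search
|     snapshot_sql = (
|         "INSERT INTO search_results (session_id, post_id)"
|         " SELECT ?, id FROM posts WHERE title LIKE ?"
|     )
|     page_sql = (
|         "SELECT p.* FROM search_results r"
|         " JOIN posts p ON p.id = r.post_id"
|         " WHERE r.session_id = ?"
|         " ORDER BY r.post_id LIMIT ? OFFSET ?"
|     )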
| buro9 wrote:
| Moving forward and back in a cursor (pages) is easy... but how
| about random pages?
|
| i.e. this forum, HN. If there were ten pages of comments, going
| from page 1 to page 2, or vice versa is trivial with a "after" or
| "before" query. But what about jumping directly to page 3 without
| previously visiting pages 2 or 4?
|
| So far I've implemented a hybrid... pages still exist, but next
| and previous page (the most common actions) are cursor-based.
|
| Is there a better way?
| jack_squat wrote:
| IMO if a solution requires users to address data by page, it's
| a sign that the search functionality is not good enough.
|
| In general a user doesn't want to find "page 3", they are
| looking for a record that has certain characteristics. They
| shouldn't need to think about pages, just search terms.
| kortex wrote:
| If you use ULIDs for IDs (sortable by time, millisecond
| resolution) and a tombstone field (nullable deleted_at is a
| good one) then you have a very stable collection that
| guarantees no new objects will be inserted/deleted before the
| write head - it is append only in effect.
|
| You can then do some cool things, especially if your objects
| are evenly distributed and especially if you don't need exact
| page sizes or can over-query and hide them as needed.
|
| If you then know the approximate frequency of objects, you can
| then map some linear scaling to an approximate range of ULIDs.
| Basically German tank problem in reverse.
|
| https://en.m.wikipedia.org/wiki/German_tank_problem
| higeorge13 wrote:
| I have seen a variation of the cursor based approach in Intercom
| API [1] but with a fixed cursor (while the example shows a
| different cursor on every request), which also does not allow you
| to create another one within 1 minute. It feels almost like a
| database server cursor; does anyone want to guess how this is
| implemented?
|
| [1] https://developers.intercom.com/intercom-api-
| reference/refer...
| jmull wrote:
| > Bad performance for large OFFSET in SQL. When doing OFFSET N
| in SQL, the database needs to scan and count N rows.
|
| For user-facing apps, you've failed pretty hard already if users
| can't find what they need on the first page or so.
|
| If your offsets are big enough for this to matter, I'd rather
| spend time on why users are so many pages in to the data than
| optimizing the performance of later pages.
|
| (Processing clients, on the other hand, should query for what
| they need. Processing in batches may make sense, so there
| cursor- or what they call keyset-based "pagination" makes good
| sense. Though in the case of processing clients, I wouldn't
| call it "pagination"... it's more like batches. I've mainly
| used "keyset-based" pagination for processing events, which can
| alleviate some of the "cons".)
| giaour wrote:
| Users do fuzzy searches across data all the time. They may
| remember approximately when they last saw a record but not its
| exact name, or their recollection may be close but imperfect.
| z3t4 wrote:
| Pagination in APIs is a PITA, as you have to make many requests
| to get all the data. 99% of the time you want all the data, not
| just the first page. For the 1% use case you could use HTTP
| Range or paging. Pagination is often implemented as a premature
| optimization.
| indymike wrote:
| 8 minutes of wire time is bad and pagination is usually part of
| server side frameworks...
| russellendicott wrote:
| In my experience the only reason you would want all the data is
| if the API lacks good filter queries. IMO, making it a PITA to
| download all data is kind of a feature.
| cortesoft wrote:
| > 99% of the time you want all data, not just the first page.
|
| This certainly doesn't seem to be the case. Take viewing this
| very website, for example... are you suggesting that most
| people want to see all posts of all time, rather than just the
| first page or two? Paging is used extensively for feeds, and
| feeds are crazy common these days, and people rarely want to
| see all posts ever on a feed.
| crescentfresh wrote:
| > Pagination is often implemented as an premature optimization.
|
| or for forwards-compatibility.
| stevebmark wrote:
| I'd also point out that the GraphQL/Relay spec for pagination
| enforces cursor based pagination:
| https://relay.dev/graphql/connections.htm. It's also a really
| nice pagination spec overall, edges/node/pageInfo are well
| thought out and flexible.
|
| In my professional experience, lots of devs will implement
| fetching APIs and not consider pagination, and then when the
| responses start growing (testing data, use, whatever), user
| experience sucks. Pagination should be a default API design for
| anything returning an unbounded list of items.
| Ambolia wrote:
| How do you deal with this in GraphQL when you have a query
| hitting multiple nodes, which may return multiple cursors for
| the different nodes? It seems like a nightmare if you have to
| check and retry manually for each node; maybe there are
| libraries that take care of it transparently?
| stevebmark wrote:
| How is this different than hitting multiple REST endpoints
| for different resources with their own individual pagination?
| If the client needs to glue together paginated resources from
| two places, it's a hairy problem no matter what the protocol.
| A backend for frontend that does this under the hood is one
| solution to both, or a specialized endpoint that does it
| under the hood.
| Ambolia wrote:
| In the case of hitting multiple REST endpoints for
| different resources, on each query it's clear what
| pagination I'm using and combining the data is just
| appending it to the final list. In each case the partial
| results are flat and linear.
|
| But in the case of GraphQL, like I may get an incomplete
| list, or I may get a complete list that will have some
| fields with missing data, or some combination of both,
| going down deep depending on the structure of the query. I
| don't see a clear way to request the missing data without
| duplicating part of what I already have, and I don't see a
| clear way to combine the results.
|
| But I may be missing something about GraphQL or something
| about the tools that makes this simple.
|
| For example something like this:
|
|     {
|       users(first: 10000, after: "cursor123") {
|         edges {
|           cursor
|           node {
|             id
|             name
|             friends(first: 10000, after: "cursor235") {
|               edges {
|                 cursor
|                 node {
|                   id
|                   name
|                 }
|               }
|             }
|           }
|         }
|       }
|     }
| hamandcheese wrote:
| In the relay JS library you can avoid some duplication
| with fragments. Fragments are pieces of a graphql query
| that you can include in multiple queries. If you are
| using flow, the type system will also ensure that you
| can't render a component unless you have used the correct
| fragment in your query.
|
| That said, paginating multiple resources on the same page
| is still going to be complicated.
| stevebmark wrote:
| Oh. I think you'd handle it the same as REST: make a new
| individual field query for that friend by ID and ask for the
| pagination there. I think that would be manual work for GraphQL
| or REST; I'm not aware of anything that does it automatically.
| It's also not a case I've personally run into, so I'm not sure.
| grahamm wrote:
| Agreed. So many times I've raised paging with an internal API
| provider only to be told they can retrofit it later if needed.
| Then you start to use production data and bam, the one call
| doesn't work.
| User23 wrote:
| I disagree. I think pagination is a clumsy error prone kludge.
| That it performs better than the awful naive approach by
| additionally burdening the caller doesn't remotely make it
| good. The correct approach is a buffered stream with proper
| flow control. Incidentally that's what _Transmission Control
| Protocol_ was invented for.
|
| Probably the nicest way to expose this that fits contemporary
| sensibilities would be as a lazy sequence that does sensible
| prefetching and asks the server for more data as it's iterated
| through.
| stevebmark wrote:
| When dealing with the most common case (I think) for
| pagination, clients (web/native) requesting data, are you
| suggesting keeping a long-running websocket only to fetch data
| in response to new page requests? What benefit would that
| afford, given that the requests are pretty far apart in time?
|
| Or are you focusing more on the backend machine to machine
| case? In that case, some kind of automatic prefetching, lazy
| loading/yield setup sounds pretty nice if that's abstracted
| away when you want to iterate over a large dataset from a
| separate resource server. It's not a pattern I've used
| before, mainly because it's not often that one server wants
| to know _everything_ that another server has.
| giaour wrote:
| > a lazy sequence that does sensible prefetching and asks the
| server for more data as it's iterated through
|
| Isn't this an accurate description of cursor-based
| pagination? You can have the server hold the cursor and
| associate a request for the next page based on the client ID
| (like in the PostgreSQL wire protocol), or you can make the
| flow stateless (at the network level) by handing the client a
| ticket for the next page.
| jayd16 wrote:
| Pagination implies many requests whereas a streaming API would
| keep to a single request, no? No worry about multiple servers
| or juggling state.
| giaour wrote:
| It's still multiple requests on a streaming API if you're
| asking the client to notify you when it's ready for the
| next page. Most DB engines use TCP connection affinity to
| identify which cursor is in play, but the client still
| needs to send a packet asking for the next page. Both
| client and server need to keep the TCP connection open,
| which is state for both parties.
|
| I see you mentioned gRPC streams in a sibling comment,
| which are a great alternative to REST-based pagination,
| but they are highly stateful! Streams are built into the
| protocol, so a generic gRPC client will handle most of
| the state juggling that would have been your
| responsibility with a REST client.
| jayd16 wrote:
| This is a bit inaccurate I think. You can make a RESTful
| api over gRPC. The protocol is irrelevant. An open
| connection is not a violation of statelessness. If
| anything, a streaming API where you can close the cursor
| upon completion and treat new requests as fully new is a
| lot less stateful.
|
| We're also talking about APIs in general, not just http
| REST.
|
| >It's still multiple requests on a streaming API
|
| Perhaps, but the requests can be pipelined. You don't
| need to wait for the response to complete before asking
| for more.
| giaour wrote:
| The original article is talking about pagination over
| stateless (at the protocol level) HTTP requests.
| (Referring to APIs that are offered over stateless HTTP
| as "REST APIs" is technically incorrect but reflects
| common usage and is a convenient shorthand.)
|
| gRPC is able to offer powerful streaming abstractions
| because it utilizes HTTP/2 streams, which are cursor
| based rather than strictly connection oriented. The state
| is still there; it's just a protocol-level abstraction
| rather than an application-level abstraction.
|
| > Perhaps, but the requests can be pipelined. You don't
| need to wait for the response to complete before asking
| for more.
|
| That sort of defeats the purpose of grpc flow control,
| doesn't it?
| jayd16 wrote:
| >That sort of defeats the purpose of grpc flow control,
| doesn't it?
|
| Why do you say that? I don't know the full implementation
| details themselves but generally there's no reason you
| can't safely ask for even more after having asked and
| consumed some. If you have a buffer of M bytes and ask
| for M, then consume N, you could immediately ask for N
| more without waiting to receive all of the original M.
|
| Although, I wasn't speaking about gRPC in that case. I'm
| not sure how exactly gRPC achieves back pressure. I was
| only speaking abstractly about a pipelined call vs many
| pages. You seemed to claim that multiple requests were a
| requirement.
|
| But fine, ignoring gRPC, you could possibly tune your
| stack such that normal HTTP calls achieve back pressure
| from network stacks and packet loss. HTTP is built on top
| of TCP streams, after all. That doesn't make it
| inherently stateful, does it?
|
| Going all the way back, I still think it's fair to say
| that a streaming response with backpressure can be
| stateless and without multiple requests. If you want to
| argue that multiple packets or TCP signals are needed
| then perhaps so, but I think that's a far cry from the
| many separate requests a paginated call requires and I
| don't think it's accurate enough to conflate them.
| giaour wrote:
| > I don't know the full implementation details themselves
| but generally there's no reason you can't safely ask for
| even more after having asked and consumed some
|
| I think we're saying the same thing but using different
| formulations. If you send an HTTP request for a list with
| 20 items, then get back a response with 10 items and a
| link for the next page, that is essentially the same as
| consuming 20 items over a stream with flow control. The
| point of returning a cursor in the response and having
| the client send a new request for the next page is to
| support a stateful stream over a stateless protocol. In
| neither case are you waiting for the response to be
| complete before processing items, since your message
| indicates that "complete" here means the full result set
| has been sent to the client.
|
| > But fine, ignoring gRPC, you could possibly tune your
| stack such that normal http calls achieve back pressure
| from network stacks and packet loss. Http is built on top
| of TCP streams, after all. That doesn't make it
| inherently stateful does it?
|
| That's pretty much how streaming responses are
| implemented in TCP-based protocols (like the SQL query
| APIs exposed by Postgres or MySQL). TCP connections can
| be terminated for unrelated reasons, which is why you
| don't see this pattern very often in recent protocols.
| When a TCP connection is dropped and restarted due to,
| say, network congestion, you have to restart the
| paginated operation from the head of the list. H2 (and,
| vicariously, gRPC) streams are built to be more resilient
| to network noise.
|
| But to answer your question, yes, that pattern is
| inherently stateful. Pagination has to be, since the
| server has to know what the client has seen in order to
| prepare the next batch of results. You can manage this
| state at the protocol level (with streams), or you can
| push it to the application level (with cursor tokens
| embedded in the messages exchanged). The streaming
| approach requires a stateful server and client, whereas
| the application-level approach only requires a stateful
| client.
| jayd16 wrote:
| I think my opinion on close enough also differs from
| yours (and that's ok!) but you're also forgetting that a
| streamed/connection based approach only involves a single
| client and server and state during the connection.
|
| A paged response needs to sync state across many
| machines. As you have said, you need a clientID, a cursor
| reference, or a sticky session. That's not nothing.
|
| I think they're inherently different approaches.
| freedomben wrote:
| I find pagination really annoying on the client, but the
| majority of the time it does seem like the client doesn't
| actually need the whole thing so it's a good (and often very
| necessary) performance optimization. For endpoints where that
| isn't the case, it's (usually) not too hard to allow a
| sentinel value for page size that means infinity, so the
| client really can get it all in one swoop.
| jayd16 wrote:
| >Probably the nicest way to expose this that fits
| contemporary sensibilities would be as a lazy sequence that
| does sensible prefetching and asks the server for more data
| as it's iterated through.
|
| gRPC's streaming API is a good example. It has backpressure
| built in and you just need to configure the buffer sizes.
| Clients just consume items off the stream and async writes
| will suspend as needed.
| robertlagrant wrote:
| What are you saying that would be different in practice? What
| advantages would it have?
| jayd16 wrote:
| Not sure I agree with the parent but thinking it through,
| you have a single connection and a cleaner life cycle.
| Streaming writes and reads could mean back-pressure is
| possible. A client can consume items as needed as opposed
| to mapping usage to a page size.
|
| There's no worry about wasting a full request on the last
| page with a single item. In fact there's no worry about
| round trips at all, it's just a matter of buffer back
| pressure that communicates the need for more or less.
|
| Hmm, when you think about it, it's a pretty good way to go.
| User23 wrote:
| That's consistent with what I have in mind. I too think
| it's pretty much best of both worlds. You get better
| efficiency than pagination with the convenience of an API
| that's no different from fetching all the data.
| nerdponx wrote:
| I worked with an API like this once, which streamed
| NDJSON one line at a time. I loved the amount of control
| that it gave me on the client side!
| amelius wrote:
| > There is no way to skip pages. If the user wants page X, it
| needs to request pages from 1 to X.
|
| Huh? The client can ask the server to generate a new cursor, any
| desired amount of items ahead in the list.
| mypalmike wrote:
| With opaque cursors, how does a client specify "a desired
| amount"?
| amelius wrote:
| You just say "this cursor + 5 pages", for example.
| hombre_fatal wrote:
| You send `GET /items[?cursor=<cursor>]` to my server.
|
| I respond:
|
|     {
|       "items": [
|         {...}, {...}, {...}, ...
|       ],
|       "next_cursor": "77963b7a"
|     }
|
| How do you request "this cursor + 5 pages"?
| amelius wrote:
| GET /items?cursor=77963b7a&add_to_cursor=5pages
| dingleberry420 wrote:
| If the server supports this, which they won't, because they
| don't want to support pagination.
| travisgriggs wrote:
| In 32 years of software development, it fascinates me how much
| more time I spend just moving/translating data between domains.
| I used to do more modeling and more human interaction. Now the
| dominant task in any of the products I work on is just shifting
| data from one place/format/representation/node to another.
| itisit wrote:
| Ah, yes...the GRIMM stack: General Repetition of Integrations,
| Migrations, and Misery
| GiorgioG wrote:
| "Modern day best practices"...
| RandyRanderson wrote:
| Let's rewrite the microservices wiki a bit:
|
| "A microdata architecture - a variant of the data-oriented
| architecture structural style - arranges data as a collection
| of loosely-coupled data. In a microdata architecture, data are
| fine-grained and the relations are lightweight."
|
| Likely most experienced data architects are now laughing at
| what a dumb idea the above is. That's the same reaction I had
| when I first heard of microservices - "that's obviously a bad
| idea; ppl aren't that dumb", I thought.
| travisgriggs wrote:
| But big monoliths/spaghetti balls of code aren't great
| either, you'd agree with that?
|
| It seems like we don't have the ability to do "all things in
| moderation" as an industry very well. We're in constant
| search of The Silver Bullet, that if we just apply it,
| will... buy the world a coke and live in perfect harmonies or
| something. And we go through the cycle again and again and
| again.
| [deleted]
| [deleted]
| [deleted]
| Lightbody wrote:
| The Google Calendar API leaves a lot to be desired
| (documentation, etc), but over time what I've found is that it's
| really well-designed once you understand it.
|
| In their case, they went with a "pageToken" variable which is
| basically a cursor.
|
| The API has been around for years.
___________________________________________________________________
(page generated 2022-05-28 23:00 UTC)