[HN Gopher] Index, Count, Offset, Size
___________________________________________________________________
Index, Count, Offset, Size
Author : ingve
Score : 151 points
Date : 2026-02-18 08:20 UTC (3 days ago)
(HTM) web link (tigerbeetle.com)
(TXT) w3m dump (tigerbeetle.com)
| card_zero wrote:
| I can't read the starts of any lines, the entire page is offset
| about 100 pixels to the left. :) Best viewed in Lynx?
| Flow wrote:
| Looks perfect here. iOS Safari
| dataflow wrote:
| Is there any other example of "length" meaning "byte length", or
| is it just Rust being confusing? I've never seen this
| elsewhere.
|
| Offset is ordinarily just a difference of two indices. In a
| _container_ I don't recall seeing it implicitly refer to byte
| offset.
| SabrinaJewson wrote:
| In general in Rust, "length" refers to "count". If you view
| strings as being sequences of Unicode scalar values, then it
| might seem odd that `str::len` counts bytes, but if you view
| strings as being a subset of byte slices it makes perfect sense
| that it gives the number of UTF-8 code units (and it is
| analogous to, say, how JavaScript uses `.length` to return the
| number of UTF-16 code units). So I think it depends on
| perspective.
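A small illustration of SabrinaJewson's point that the "length" of a string depends on the unit being counted. This is a Python sketch, not Rust or JavaScript: Python's `len` on a `str` counts code points, so the UTF-8 and UTF-16 unit counts are obtained by encoding. The UTF-8 byte count corresponds to what Rust's `str::len` reports, and the UTF-16 unit count to what JavaScript's `.length` reports.

```python
# One string, three different "lengths" depending on the unit counted.
s = "a\U0001F600"  # "a" followed by U+1F600 (grinning face emoji)

code_points = len(s)                           # Python's len: Unicode code points
utf8_units = len(s.encode("utf-8"))            # bytes, as Rust's str::len counts
utf16_units = len(s.encode("utf-16-le")) // 2  # 16-bit units, as JavaScript's .length counts

print(code_points, utf8_units, utf16_units)    # 2 5 3
```

The emoji lies outside the Basic Multilingual Plane, so it takes four UTF-8 bytes and a two-unit UTF-16 surrogate pair, which is what makes all three counts differ.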
| dataflow wrote:
| That makes sense, I agree -- seems Rust is on board here too.
| AlotOfReading wrote:
| It's the usual convention for systems programming languages and
| has been for decades, e.g. strlen() and std::string.length().
| Byte length is also just more useful in many cases.
| dataflow wrote:
| No, those are counts by definition, and byte lengths only by
| coincidence. Look at wcslen() and std::wstring::length().
| wyldfire wrote:
| A length could refer to lots of different units - elements,
| pages, sectors, blocks, N-aligned bytes, kbytes, characters,
| etc.
|
| Always good to qualify your identifiers with units IMO (or
| types that reflect units).
| zahlman wrote:
| > or is it just Rust being confusing?
|
| It doesn't mean "byte length", so much as "byte" happens to be
| the element type. Unicode is conventionally represented as
| UTF-8, so the container can't be directly indexed to yield a
| character.
| wahern wrote:
| Relatedly, a survey of array nomenclature was performed for the
| ISO C committee when choosing the name of the new countof
| operator: https://www.open-
| std.org/jtc1/sc22/wg14/www/docs/n3469.htm
|
| It was originally proposed as lengthof, but the results of the
| public poll and the ambiguity convinced the committee to choose
| countof, instead.
| pansa2 wrote:
| The reason many languages prefer `length` to `count`, I think,
| is that the former is clearly a noun and the latter could be a
| verb. `length` feels like a simple property of a container
| whereas `count` could be an algorithm.
|
| `countof` removes the verb possibility - but that means that a
| preference for `countof` over `lengthof` isn't necessarily a
| preference for `count` over `length`.
| ncruces wrote:
| But count is more clearly a dimensionless number of elements,
| and not a size measured in some unit (e.g. bytes).
| layer8 wrote:
| I tend to use _numFoos_ (short for "number of foos"), and
| only use _fooCount_ when the variable is used for actual
| counting (like an _errorCount_ variable that is incremented
| for each error).
|
| _Countof_ is strange, because one doesn't talk about the
| "count of something" in English, other than uses like "on
| the count of three" (or the "count of Monte Cristo" ;)).
| zahlman wrote:
| When I see "countof" I expect an operation that lets me filter
| the container and tell me the count of things that meet some
| condition (probably described with a unary predicate, but
| perhaps just an element to check for equality).
| zephen wrote:
| The invariant of index < count, of course, only works when using
| Dijkstra's half-open indexing standard, which seems to have a few
| very vocal detractors.
| GolDDranks wrote:
| Fortunately only a few. Dijkstra's is obviously the most
| reasonable system.
| zephen wrote:
| Obviously to you and me, but you can see comments right here
| where others disagree.
|
| And the detractors certainly have momentum in certain
| segments on their side.
|
| Historically, of course, it was languages like Fortran and
| COBOL and even Smalltalk, but even today we have MATLAB, R,
| Lua, Mathematica, and Julia.
|
| Big-endian won in network byte order, but lost the CPUs.
| One-based indexing has so far won in mathematical computing
| and lost in mainstream languages, but the Julia folks are
| trying to change that.
| tromp wrote:
| See
| https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831...
| for Dijkstra's thoughts on indexing.
| qouteall wrote:
| With modern IDEs and AI there is no need to save letters in
| identifiers (unless they get too long). It should be
| "sizeInBytes" instead of "size", and "byteOffset" or
| "elementOffset" instead of "offset".
| throwaway2027 wrote:
| Isn't that more tokens though?
| Onavo wrote:
| Not significantly, it's one word.
| meindnoch wrote:
| Tokens are not words.
| 0x457 wrote:
| Sure, you get one or two words' worth of extra tokens, but you
| save a lot more compute and time figuring what exactly this
| offset is.
| post-it wrote:
| Only until they develop some kind of pre-AI minifier and
| sourcemap tool.
| pveierland wrote:
| When correctness is important I much prefer having strong types
| for most primitives, such that the name is focused on
| describing semantics of the use, and the type on how it is
| represented:
|
|     struct FileNode {
|         parent: NodeIndex<FileNode>,
|         content_header_offset: ByteOffset,
|         file_size: ByteCount,
|     }
|
| Where `parent` can then only be used to index a container of
| `FileNode` values via the `std::ops::Index` trait.
|
| Strong typing of primitives also helps prevent bugs like mixing
| up parameter order, etc.
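A rough Python analogue of the strong-typing idea above, assuming runtime checks are acceptable in place of the compile-time checks Rust would give. `ByteOffset`, `ByteCount`, `NodeIndex` and the container `NodeStore` are hypothetical names for this sketch (`NodeStore.__getitem__` stands in for the `std::ops::Index` trait); the generic parameter on `NodeIndex` is only enforced by a static type checker, not at runtime. Unlike the Rust struct, the root node here has no parent, so `parent` is optional.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ByteCount:
    """A number of bytes (a size)."""
    value: int


@dataclass(frozen=True)
class ByteOffset:
    """A position, in bytes, from some base."""
    value: int

    def __add__(self, other):
        # offset + size -> offset; offset + offset is rejected (TypeError).
        if isinstance(other, ByteCount):
            return ByteOffset(self.value + other.value)
        return NotImplemented

    def __sub__(self, other):
        # offset - offset -> size (the distance between two positions).
        if isinstance(other, ByteOffset):
            return ByteCount(self.value - other.value)
        return NotImplemented


@dataclass(frozen=True)
class NodeIndex(Generic[T]):
    """Index into a NodeStore[T]; T is only checked statically."""
    value: int


class NodeStore(Generic[T]):
    """Hypothetical container that only accepts NodeIndex keys."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> "NodeIndex[T]":
        self._items.append(item)
        return NodeIndex(len(self._items) - 1)

    def __getitem__(self, index: "NodeIndex[T]") -> T:
        if not isinstance(index, NodeIndex):
            raise TypeError("index with a NodeIndex, not a bare int")
        return self._items[index.value]


@dataclass
class FileNode:
    parent: Optional["NodeIndex[FileNode]"]  # None for the root in this sketch
    content_header_offset: ByteOffset
    file_size: ByteCount


nodes: "NodeStore[FileNode]" = NodeStore()
root = nodes.push(FileNode(None, ByteOffset(0), ByteCount(0)))
child = nodes.push(FileNode(root, ByteOffset(64), ByteCount(1000)))

end = nodes[child].content_header_offset + nodes[child].file_size
print(end)  # ByteOffset(value=1064)
```

Adding two offsets, e.g. `ByteOffset(1) + ByteOffset(2)`, raises `TypeError`, and `nodes[0]` is rejected because it uses a bare integer instead of a `NodeIndex`.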
| kqr wrote:
| I agree. Including the unit in the name is a form of
| Hungarian notation; useful when the language doesn't support
| defining custom types, but looks a little silly otherwise.
| canucker2016 wrote:
| Depends on what variant of Hungarian you're talking about.
|
| There's Systems Hungarian as used in the Windows header
| files or Apps Hungarian as used in the Apps division at
| Microsoft. For Apps Hungarian, see the following URL for a
| reference - https://idleloop.com/hungarian/
|
| For Apps Hungarian, the variable incorporates the type as
| well as the intent of the variable - in the Apps Hungarian
| link from above, these are called qualifiers.
|
| So the grandparent example, rewritten in C, would be
| something like:
|
|     struct FileNode {
|         struct FileNode *pfnParent;
|         DWORD ibHdrContent;
|         DWORD cb;
|     };
|
| For Apps Hungarian, one would know that the ibHdrContent
| and cb fields share the same base type 'b'. ib represents an
| index/offset in bytes - HdrContent is just descriptive,
| while cb is a count of bytes. The pfnParent field is a
| pointer to a fn-type with name Parent.
|
| One wouldn't mix an ib with a pfn since the base types
| don't match (b != fn). But you could mix ibHdrContent and
| cb since the base types match and presumably in this small
| struct, they refer to index/offset and count for the
| FileNode. You'd have only one cb for the FileNode but
| possibly one or more ibXXXX-related fields if you needed to
| keep track of that many indices/offsets.
| groundzeros2015 wrote:
| Long names become burdensome to read when they are used
| frequently in the same context
| ivanjermakov wrote:
| When the same name is used a thousand times in a codebase,
| shorter names start to make sense. See how abbreviation-dense
| aviation manuals or business documentation are.
| layer8 wrote:
| When you're juggling inputBufferSizeInBytes,
| outputBufferSizeInBytes,
| intermediateRepresentationBufferSizeInBytes, it becomes
| unwieldy and cumbersome.
|
| I once had a coworker like that, whose identifiers often
| stretched into the 30-50 character range. You really don't want
| that.
| akdor1154 wrote:
| The 'same length for complementary names' thing is great.
| JSR_FDED wrote:
| Using the same length of related variable names is definitely a
| good thing.
|
| Just lining things up neatly helps spot bugs.
|
| It's the one thing I don't like about strict formatters: I can no
| longer use spaces to line things up.
| craig552uk wrote:
| I've never yet seen a linter option for assignment alignment,
| but would definitely use it if it were available
| ivanjermakov wrote:
| AlignConsecutiveAssignments in clang-format might be the
| right fit.
|
| https://clang.llvm.org/docs/ClangFormatStyleOptions.html
| skydhash wrote:
| I know Prettier can isolate a code section from changes by
| adding comments. And I think others can too.
| userbinator wrote:
| Or learn an array language and never worry about indexing or
| naming ;-)
|
| Everything else looks disgustingly verbose once you get used to
| them.
| Fraterkes wrote:
| Is there any reason to not just switch to 1-based indexing if we
| could? Seems like 0-based indexing really exacerbates off-by-one
| errors without much benefit
| tialaramex wrote:
| I would bet that in the opposite circumstance you'd say the
| same thing:
|
| "Is there any reason to not just switch to 0-based indexing if
| we could? Seems like 1-based indexing really exacerbates off-
| by-one errors without much benefit"
|
| The problem is that humans make off-by-one errors and not that
| we're using the wrong indexing system.
| Fraterkes wrote:
| No indexing system is perfect, but one can be better than
| another. Being able to do array[array.length()] to get the
| last item is more concise and less error prone than having to
| add -1 every time.
|
| Programming languages are filled with tiny design choices
| that don't completely prevent mistakes (that would be
| impossible) but do make them less likely.
| adrian_b wrote:
| Having to use something like array[length] to get the last
| element demonstrates a defect of that programming language.
|
| There are better programming languages, where you do not
| need to do what you say.
|
| Some languages, like Ada, have special array attributes for
| accessing the first and the last elements.
|
| Other languages, like Icon, allow the use of both non-
| negative indices and of negative indices, where non-
| negative indices access the array from its first element
| towards its last element, while negative indices access the
| array from its last element towards its first element.
|
| I consider that your solution, i.e. using array[length]
| instead of array[length-1], is much worse. While it scores
| a point for simplifying this particular expression, it
| loses points by making other expressions more complex.
|
| There are a lot of better programming languages than the
| few that due to historical accidents happen to be popular
| today.
|
| It is sad that the designers of most of the languages that
| attempt today to replace C and C++ have not done due
| diligence by studying the history of programming languages
| before designing a new programming language. Had they done
| that, they could have avoided repeating the same mistakes
| of the languages with which they want to compete.
| GoblinSlayer wrote:
| If your design works better in one scenario, that usually
| means it works worse in other scenarios; you've just
| shuffled garbage around.
| tialaramex wrote:
| array[array.length()] is nonsense if the array is empty.
|
| You should prefer a language, like Rust, in which [T]::last
| is Option<&T> -- that is, we can ask for a reference to the
| last item, but there might not be one and so we're
| encouraged to do something about that.
|
| IMNSHO The pit of success you're looking for is best dug
| with such features and not via fiddling with the index
| scheme.
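A minimal Python sketch of the "the last element might not exist" approach tialaramex describes, for comparison with Python's own negative indexing (`items[-1]`, similar in spirit to the Icon behaviour mentioned above), which raises `IndexError` on an empty list. The helper name `last` is hypothetical; unlike Rust's `Option`, returning `None` is ambiguous if the sequence may itself contain `None`.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def last(items: Sequence[T]) -> Optional[T]:
    """Return the last element, or None if there is none (cf. Rust's `<[T]>::last`)."""
    if len(items) == 0:
        return None
    return items[-1]  # negative index: counts back from the end


print(last([3, 1, 4]))  # 4
print(last([]))         # None
```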
| bruce343434 wrote:
| You say "seems like", can you argue/show/prove this?
| Fraterkes wrote:
| I think that many off-by-one errors are caused by common situations
| where people can mistakenly mix up index and count. You could
| eliminate a (small) set of those situations with 1-based
| indexing: accessing items from the ends of arrays/lists.
| meindnoch wrote:
| And in turn you'd introduce off by one errors when people
| confuse the new 1-based indexes with offsets (which are
| inherently 0-based).
|
| So yeah, no. People smarter than you have thought about
| this before.
| SkiFire13 wrote:
| I'm not sure what that has to do with the article, but anyway:
| https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831...
|
| That said, I'm not sure how 1-based indexing will solve off-
| by-1 errors. They naturally come from the fencepost problem,
| i.e. the fact that sometimes we use indexes to indicate
| elements and sometimes to indicate boundaries between them.
| Mixing between them in our reasoning ultimately results in off-
| by-1 issues.
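To illustrate the fencepost distinction SkiFire13 describes: a sequence of n elements has n + 1 boundaries, and element i sits between boundary i and boundary i + 1. A minimal Python sketch (the function name is hypothetical):

```python
def elements_with_boundaries(seq):
    """Pair each element with the boundary before it and the boundary after it."""
    # n elements have n + 1 boundaries, numbered 0..n;
    # element i lies between boundary i and boundary i + 1.
    return [(i, item, i + 1) for i, item in enumerate(seq)]


text = "abc"
print(len(text) + 1)                   # 4 boundaries for 3 elements
print(elements_with_boundaries(text))  # [(0, 'a', 1), (1, 'b', 2), (2, 'c', 3)]
print(repr(text[3:]))                  # '' : boundary 3 is a valid slice bound
# text[3] would raise IndexError: 3 is a boundary, not an element index.
```

Off-by-one errors appear exactly when a number meant as a boundary (like 3 here) is used as an element index, or vice versa.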
| Fraterkes wrote:
| This is an article that (among other things) talks about off-
| by-one errors being caused by mixing up index and count (and
| having to remember to subtract 1 when converting between the
| two). That's what it has to do with it.
| adrian_b wrote:
| If you always use half-open intervals, you never have to
| subtract 1 from anything.
|
| With half-open intervals, the count of elements is the
| difference between the interval bounds, adjacent intervals
| share 1 bound and merging 2 adjacent intervals preserves
| the extreme bounds.
|
| Any programming problem is simplified when 0-based indexing
| together with half-open intervals are always used, without
| exceptions.
|
| The fact that most programmers have been taught when young
| to use 1-based ordinal numbers and closed intervals is a
| mental handicap, but normally it is easy to get rid of
| this, like also getting rid of the mental handicap of
| having learned to use decimal numbers, when there is no
| reason to ever use them instead of binary numbers.
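adrian_b's three properties of half-open intervals can be checked directly with Python's built-in `range`, which is half-open (with the default step of 1, `range(start, stop)` covers `start <= i < stop`). A short sketch; the helper names `split_at` and `merge` are hypothetical:

```python
def split_at(r: range, mid: int):
    """Split a half-open range at `mid`; both halves share that bound."""
    assert r.start <= mid <= r.stop
    return range(r.start, mid), range(mid, r.stop)


def merge(a: range, b: range) -> range:
    """Merge two adjacent half-open ranges; the result keeps the extreme bounds."""
    assert a.stop == b.start, "ranges must be adjacent"
    return range(a.start, b.stop)


whole = range(2, 9)               # indices 2, 3, ..., 8
left, right = split_at(whole, 5)  # range(2, 5) and range(5, 9)

print(len(whole) == whole.stop - whole.start)  # True: count = difference of bounds
print(len(left) + len(right) == len(whole))    # True: nothing lost or duplicated
print(merge(left, right) == whole)             # True
print(len(range(5, 5)))                        # 0: the empty range needs no special case
```

Note that no `+ 1` or `- 1` appears anywhere in the split, merge, or count logic.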
| SkiFire13 wrote:
| I must have missed that part, my bad
| adrian_b wrote:
| This is a matter of opinion.
|
| My opinion is that 1-based indexing really exacerbates off-by-
| one errors, besides requiring a more complex implementation in
| compilers, which is more bug-prone (with 1-based addressing,
| the compilers must create and use, in a manner transparent for
| the programmer, pointers that do not point to the intended
| object but towards an invalid location before the object, which
| must never be accessed through the pointer; this is why using
| 1-based addressing was easier in languages without pointers,
| like the original FORTRAN, but it would have been more
| difficult in languages that allow pointers, like C, where the
| difficulty lies in not exposing the internal representation
| of pointers to the programmer).
|
| Off-by-one errors are caused by mixing conventions for
| expressing indices and ranges.
|
| If you always use a consistent convention, e.g. 0-based
| indexing together with half-open intervals, where the count of
| elements equals the difference between the interval bounds,
| there are no chances for ever making off-by-one errors.
| pansa2 wrote:
| Fundamentally, CPUs use 0-based addresses. That's unavoidable.
|
| We can't choose to switch to 1-based indexing - either we use
| 0-based everywhere, or a mixture of 0-based and 1-based. Given
| the prevalence of off-by-one errors, I think the most important
| thing is to be consistent.
| dgrunwald wrote:
| When accessing individual elements, 0-based and 1-based
| indexing are basically equally usable (up to personal
| preference). But this changes for other operations! For
| example, consider how to specify the index of where to insert
| in a string. With 0-based indexing, appending is
| str.insert(str.length(), ...). With 1-based indexing, appending
| is str.insert(str.length() + 1, ...). Similarly, when it comes
| to substr()-like operations, 0-based indexing with ranges
| specified by inclusive start and exclusive end works very
| nicely, without needing any +1/-1 adjustments. Languages with
| 1-based indexing tend to use inclusive-end for substr()-like
| operations instead, but that means empty substrings now are odd
| special cases. When writing something like a text editor where
| such operations happen frequently, it's the 1-based indexing
| that ends up with many more +1/-1 in the codebase than an
| editor written with 0-based indexing.
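A small Python sketch of dgrunwald's comparison. Python strings are immutable and have no `insert` method, so `insert_at`, `substr` and their 1-based counterparts below are hypothetical helpers written on top of slicing (which is itself 0-based and end-exclusive):

```python
def insert_at(s: str, index: int, text: str) -> str:
    """0-based: insert `text` at boundary `index`, 0 <= index <= len(s)."""
    return s[:index] + text + s[index:]


def insert_at_1(s: str, position: int, text: str) -> str:
    """1-based: insert `text` before character `position`, 1 <= position <= len(s) + 1."""
    return s[:position - 1] + text + s[position - 1:]


def substr(s: str, start: int, end: int) -> str:
    """0-based, end-exclusive: the empty substring is simply start == end."""
    return s[start:end]


def substr_1(s: str, first: int, last: int) -> str:
    """1-based, end-inclusive: the empty substring needs last == first - 1."""
    return s[first - 1:last]


s = "hello"
print(insert_at(s, len(s), "!"))        # hello!  (append: index is len(s))
print(insert_at_1(s, len(s) + 1, "!"))  # hello!  (append: position is len(s) + 1)
print(substr(s, 1, 4), substr_1(s, 2, 4))              # ell ell
print(repr(substr(s, 2, 2)), repr(substr_1(s, 3, 2)))  # '' ''
```

The 1-based helpers need a `- 1` on every conversion, and the empty 1-based substring requires an end that is smaller than the start.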
| GuB-42 wrote:
| Because it is not how computers work. It doesn't matter much
| for high-level languages like Lua, you rarely manipulate raw
| bytes and pointers, but in system programming languages like
| Zig, it matters.
|
| To use the terminology from the article, with 0-based indexing,
| offset = index * node_size. If indices and offsets were both
| 1-based, you would have offset = (index - 1) * node_size + 1.
|
| And it became a convention even for high level languages,
| because no matter what you prefer, inconsistency is even worse.
| An interesting case is Perl, which, in classic Perl fashion,
| lets you choose by setting the $[ variable. Most people, even
| Perl programmers, consider it a terrible feature, and 0-based
| indexing is used by default.
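The offset arithmetic from GuB-42's comment, as a Python sketch with a hypothetical 16-byte node size. It assumes byte offsets stay 0-based (as memory addresses are) and shows where the `- 1` lands when only the index is 1-based:

```python
NODE_SIZE = 16  # hypothetical size of one node in bytes


def offset_of(index: int, node_size: int = NODE_SIZE) -> int:
    """0-based index -> 0-based byte offset."""
    return index * node_size


def offset_of_1(index: int, node_size: int = NODE_SIZE) -> int:
    """1-based index -> 0-based byte offset: the correction moves into every call."""
    return (index - 1) * node_size


def index_of(offset: int, node_size: int = NODE_SIZE) -> int:
    """0-based byte offset -> 0-based index; the offset must lie on a node boundary."""
    if offset % node_size != 0:
        raise ValueError("offset is not aligned to a node boundary")
    return offset // node_size


print(offset_of(3))    # 48: the fourth node
print(offset_of_1(4))  # 48: the same node, numbered from 1
print(index_of(48))    # 3
```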
| naasking wrote:
| > Is there any reason to not just switch to 1-based indexing if
| we could? Seems like 0-based indexing really exacerbates off-
| by-one errors without much benefit
|
| You'd just get a different set of off-by-one errors with
| 1-based indexing.
| layer8 wrote:
| 1-based indexing doesn't work well as soon as you have a start
| offset within a sequence, from which you want to index. Then
| the first element is _startIndex_ + 0, not _startIndex_ + 1.
| 0-based indexing generalizes better in that way.
| cb321 wrote:
| As @SkiFire correctly observes[^1], off-by-1 problems are more
| fundamental than 0-based or 1-based indices, but the latter still
| vary enough that some kind of discrimination is needed.
|
| For many years (decades?) now, I've been using "index" for
| 0-based and "number" for 1-based as in "column index" for a
| C/Python style [ix] vs. "column number" for a shell/awk/etc.
| style $1 $2. Not sure this is the best terminology, but it _is_
| nice to have something consistent. E.g., "offset" for 0-based
| indices means "off" and even the letter "o" in some case becomes
| "the zero of some range". So, "offset" might be better than
| "index" for 0-based.
|
| [^1]: https://news.ycombinator.com/item?id=47100056
| throwaway27448 wrote:
| Ordinal is nice because it explicitly starts at 1.
| adrian_b wrote:
| Nit pick: only in few human languages the ordinal numbers
| start at 1.
|
| In most modern languages, the ordinal numbers start at 2. In
| most old languages, and also in English, the ordinal numbers
| start at 3.
|
| The reason for this is the fact that ordinal numbers have
| been created only recently, a few thousand years ago.
|
| Before that time, there were special words only for certain
| positions of a sequence, i.e. for the first and for the last
| element and sometimes also for a few elements adjacent to
| those.
|
| In English, "first", "second" and "last" are not ordinal
| numbers, but they are used for the same purpose as ordinal
| numbers; more accurately, the ordinal numbers are used for
| the same purpose as these words, since the ordinal numbers
| were added later.
|
| The ancient Indo-European languages had a special word for
| the other element of a pair, i.e. the one that is not the
| first element of a pair. This word was used for what is now
| named "second". In late Latin, the original word that meant
| "the other of a pair" has been replaced with a word meaning
| "the following", which has been eventually also taken by
| English through French in the form of "second".
| MarkusQ wrote:
| Meta nit pick: You are conflating linguist's jargon with
| mathematician's jargon.
|
| In much the same way as physicists co-opted common words
| (e.g. "work" and "energy") to mean very specific things in
| technical contexts, both linguists and mathematicians gave
| "ordinal" a specific meaning in their respective domains.
| These meanings are similar but different, and your nit pick
| is mistakenly asserting that one of these has priority over
| the other.
|
| "Ordinal" in linguistics is a word for a class of words.
| The words being classified may be old, but the use of
| "ordinal" to denote them is a comparatively modern coinage,
| roughly contemporary with the mathematicians' usage. Both
| come from non-technical language describing putting things
| in an "orderly" row (cf. cognates such as "public order",
| "court order", etc.) which did not carry the load you are
| trying to place on them.
| layer8 wrote:
| There is "zeroth" though as an ordinal numeral, which was
| already used long before computers came around, as for
| example in "the zeroth power of a number" (according to
| Merriam-Webster). So it's still not quite unambiguous. :)
| layer8 wrote:
| Not true in general, ordinal numbers start at 0:
| https://en.wikipedia.org/wiki/Ordinal_number
| matklad wrote:
| Ha! I also use `line_number = line_index + 1` convention!
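A sketch of the `line_number = line_index + 1` convention: keep 0-based indices internally and convert to 1-based numbers only when formatting for humans, e.g. in a `path:line:column` message. `format_location`, the sample text and the file name are hypothetical:

```python
def format_location(path: str, line_index: int, column_index: int) -> str:
    """Hypothetical helper: 0-based indices in, 1-based "path:line:column" out."""
    line_number = line_index + 1  # number = index + 1, only at the human-facing edge
    column_number = column_index + 1
    return f"{path}:{line_number}:{column_number}"


source = "alpha\nbeta\ngamma\n"
lines = source.splitlines()
line_index = next(i for i, line in enumerate(lines) if "beta" in line)  # 1
column_index = lines[line_index].index("t")                             # 2
print(format_location("example.txt", line_index, column_index))         # example.txt:2:3
```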
| cb321 wrote:
| :-)
|
| If it helps anyone explain the SkiFire point any better, I
| like to analogize it to an I-bar cursor vs. a block cursor
| for text entry. An I-bar is unambiguously "between
| characters" while a block cursor is not. So, there are
| questions that arise for block cursors that basically never
| arise for I-bar cursors. When just looking at an integer like
| 2 or 3, there is no cursor at all. So, we must instead rely
| on names/conventions/assumptions with their attendant issues.
|
| To be clear, _I_ liked the SkiFire explanation, but having
| multiple ways to describe/think about a problem is usually
| helpful.
| navane wrote:
| I hoped to learn some more Excel lookup tactics, alas
| donkeybeer wrote:
| I was thinking SQL.
| jp1016 wrote:
| Banning "length" from the codebase and splitting the concept into
| count vs size is one of those things that sounds pedantic until
| you've spent an hour debugging an off-by-one in serialization
| code where someone mixed up "number of elements" and "number of
| bytes." After that you become a true believer.
|
| The big-endian naming convention (source_index, target_index
| instead of index_source, index_target) is also interesting. It
| means related variables sort together lexicographically, which
| helps with grep and IDE autocomplete. Small thing but it adds up
| when you're reading unfamiliar code.
|
| One thing I'd add: this convention is especially valuable during
| code review. When every variable that represents a byte quantity
| ends in _size and every item count ends in _count, a reviewer can
| spot dimensional mismatches almost mechanically without having to
| load the full algorithm into their head.
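To make the serialization example concrete, here is a Python sketch using the standard `struct` module and the `_count` / `_size` suffixes jp1016 describes. The wire format (a little-endian u32 element count followed by u32 elements) is a hypothetical example, not TigerBeetle's format:

```python
import struct

U32 = struct.Struct("<I")  # little-endian unsigned 32-bit integer, 4 bytes


def encode_u32_array(values: list) -> bytes:
    """Hypothetical format: u32 element count, then the elements."""
    value_count = len(values)              # number of elements
    payload_size = value_count * U32.size  # number of bytes
    payload = struct.pack(f"<{value_count}I", *values)
    assert len(payload) == payload_size
    return U32.pack(value_count) + payload


def decode_u32_array(data: bytes) -> list:
    (value_count,) = U32.unpack_from(data, 0)
    header_size = U32.size
    payload_size = value_count * U32.size
    # Slicing with value_count instead of payload_size here would be exactly
    # the count/size mix-up: it would read only a quarter of the payload.
    payload = data[header_size:header_size + payload_size]
    if len(payload) != payload_size:
        raise ValueError("truncated input")
    return list(struct.unpack(f"<{value_count}I", payload))


encoded = encode_u32_array([1, 2, 3])
print(len(encoded))               # 16: 4-byte header + 3 * 4-byte elements
print(decode_u32_array(encoded))  # [1, 2, 3]
```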
| maleldil wrote:
| Big-endian naming is great. I've adopted it since I first read
| about it in matklad's blog.
| akst wrote:
| Have you got a link to this blog post?
| dd82 wrote:
| Not sure about a post, but I have
| https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TI...
| bookmarked.
| layer8 wrote:
| > big-endian naming
|
| I would call it "English naming" [0], it's just more readable
| to start with, in an anglophone environment.
|
| [0] as opposed to "naming, English", I suppose ;)
| Shish2k wrote:
| > When every variable that represents a byte quantity ends in
| _size and every item count ends in _count, a reviewer can spot
| dimensional mismatches almost mechanically
|
| At that point I'd rather make them separate data types, and
| have the compiler spot mismatches actually-mechanically o.o
| jkaptur wrote:
| Canonical essay on this sort of technique:
| https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...
| zahlman wrote:
| I've always understood "length" to mean what the author calls
| "count", and would never expect it to refer to byte size; as
| far as I can tell, it _never_ did. Size is a design-time
| consideration; caring about it in the code is an exceptional
| case, for applications like (as you mention) serialization. So
| that's what deserves the dedicated term. "Length" refers
| specifically to a total number of elements in many languages
| preceding Rust.
|
| For that matter, many languages, especially "object-oriented"
| ones, treat heterogeneous containers as the default. They might
| not even offer native containers that can store everything
| inline in a single contiguous allocation, except perhaps for
| strings. In which case, "number of bytes" is itself ambiguous;
| are you including the indirected objects or not?
|
| "Count" is _also_ overloaded -- it commonly means, and I
| normally only understand it to mean, the number of elements in
| a collection _meeting some condition_. Hence the `.count`
| method of Python sequences, as well as the jargon "population
| count" referring to the number of set bits in an integer.
| Today, Python's integers have both a `.bit_count` and a
| `.bit_length`, and it's obvious what both of them do; calling
| either `.bit_size` would be confusing in my mental framework,
| and a contradiction in terms in the OP's.
|
| I would disagree that _even C's `strlen`_ refers to byte size.
| C comes from a pre-Unicode world; the type is called `char`
| because that was naively considered sufficient at the time to
| represent a text character. (Unicode is still in that sense
| naive, but it at least allows for systems that are acutely
| aware of the distinction between "characters" and graphemes.)
| But notice: C's "strings" aren't proper objects; they're null-
| terminated sequences, i.e. their length is signaled in-band. So
| that metadata is also just part of the data, in a single
| allocation with no indirection; the "size" of a string could
| only reasonably be interpreted to include that null terminator.
| Yet the result of `strlen` excludes it! Further, if `strlen` is
| used on a string that was placed within some allocated buffer,
| it knows nothing about that buffer.
|
| (Similarly, Rust `str::len` is properly named by this scheme.
| It gives the number of valid 1-byte-sized elements in a
| collection, _not_ the byte size of the buffer they're stored
| within. It's still ambiguous in a sense, but that's because of
| the convention of using UTF-8 to create an abstraction of
| "character" elements of non-uniform size. This kind of
| ambiguity is properly resolved either with iterators, like the
| `Chars` iterator in Rust, or with views.)
|
| Also consider: C has a `sizeof` operator, influencing Python's
| `.__sizeof__()` methods. That's because the concept of "size"
| equally makes sense for _non_-sequences; _neither "count" nor
| "length"_ does. So _of course_ "length" cannot mean what the
| author calls "size".
| stephc_int13 wrote:
| Couldn't approve more, as I use a near-identical naming convention
| in my C codebase. I don't use the standard C library, to avoid
| inconsistencies and the awful naming habits of that era.
| kgwxd wrote:
| So many arguments could have been avoided if the convention had
| been to use o instead of i in C-like for loops.
| matheusmoreira wrote:
| Really like this. I'll follow this practice from now on.
___________________________________________________________________
(page generated 2026-02-21 23:02 UTC)