[HN Gopher] Words about Arrays and Tables
___________________________________________________________________
Words about Arrays and Tables
Author : todsacerdoti
Score : 52 points
Date : 2025-07-30 15:15 UTC (7 hours ago)
(HTM) web link (buttondown.com)
(TXT) w3m dump (buttondown.com)
| mvc wrote:
| XSLT/XPath is an example of a platform that provides multiple
| axis through which to access your data structure.
|
| https://developer.mozilla.org/en-US/docs/Web/XML/XPath/Refer...
| kccqzy wrote:
| I'm not sure what's the insight in this article here. All the
| things mentioned seem pretty straightforward. But I'll comment on
| a few:
|
| > Really any finite set can be a "dimension".
|
| This is absolutely true and yet typical programming languages
| don't do a good job of unleashing this. When is the last time you
| wish you could define an array where the indices are from a
| custom finite set, perhaps a range 100 to 200, perhaps an enum? I
| only know of two languages, Pascal and Haskell, that are
| sufficiently flexible. All other languages insist that indices
| for arrays must be integers either starting from 0 or 1. Haskell
| has a mechanism to let you control what is an array index:
| https://hackage.haskell.org/package/base-4.18.1.0/docs/Data-...
| The lack of this ability makes programmers use more general hash
| maps by default and therefore leave performance on the table.
|
| > We can also transform Row -> PowerSet(Col) into Row -> Col ->
| Bool, aka a boolean matrix.
|
| I mean sure. Does the author know type signatures and their
| operations like this are isomorphic to algebraic operations on
| size of the possibly infinite sets? The function A -> B has
| exactly |B|^|A| implementations so that's why Col -> Bool is in
| this sense "same" as PowerSet(Col). And the function operator
| associates to the right so adding Row -> doesn't change anything.
| Wait till you learn why sum types are called sum types and why
| product types are called product types. Here's another thing that
| might be mind-blowing: arrays of an unknown length can be thought
| of as the infinite union of arrays of lengths of all natural
| numbers: Array<T, 0> | Array<T, 1> | ... so using the above
| notation the cardinality is 1 + |T| + |T|^2 + ...; but they can
| also be represented as a standard linked list List<T>=Nil|(T,
| List<T>) and using the above notation we have
| |List<T>|=1+|T|*|List<T>| which simplifies to
| |List<T>|=1/(1-|T|). And guess what, the Taylor expansion of that
| is just the earlier 1 + |T| + |T|^2 + ... precisely.
| Jtsummers wrote:
| > I only know of two languages, Pascal and Haskell, that are
| sufficiently flexible.
|
| Julia, Fortran, some BASICs allow custom integer ranges.
|
| Ada allows you to use any discrete range (integer, character,
| enums).
|
| https://en.wikipedia.org/wiki/Comparison_of_programming_lang...
| - A more comprehensive list
|
| The two columns of interest are "Specifiable index type" and
| "Specifiable base index".
| cestith wrote:
| Older versions of Perl allowed you to set a lexically-scoped
| array base integer. It defaulted to zero and the docs
| mentioned 1-based arrays, but IIRC it could be any integer.
| This was deprecated several years ago in 5.12 as a harmful
| practice. Specifying using features of a Perl version of 5.16
| or newer actually makes assigning to it (except for 0) a
| compile-time error.
|
| Interestingly, though, Perl's native arrays are decidedly not
| an array of primitives. They are arrays of scalar values, and
| a scalar value can not only contain a character, a string, an
| integer, a float, or a reference to another item (scalar,
| hash, array, blessed object, filehandle, etc) but it can
| return different values when called in a numeric context vs a
| string context. There are certainly ways to get at primitive
| values, but it's not in Perl's native array semantics.
| pklausler wrote:
| Non-default lower bounds in Fortran are a famous pitfall,
| however; they don't persist in many cases where one might
| think that they do, and are not 100% portable across
| compilers.
| zahlman wrote:
| > When is the last time you wish you could define an array
| where the indices are from a custom finite set, perhaps a range
| 100 to 200, perhaps an enum?
|
| > ...
|
| > The lack of this ability makes programmers use more general
| hash maps by default and therefore leave performance on the
| table.
|
| You can simulate this in any language by hash-mapping the
| custom set elements to indices. (And in the general case, you
| won't be able to determine the indices faster than a hash table
| lookup.)
|
| In fact, I imagine there are languages and implementations
| where hash table lookup is the fastest way to map 'a':'z' to
| 0:25.
|
| In fact, let me try a (surely totally bogus) micro-benchmark
| right now: $ python -m timeit --setup 'lookup =
| dict(zip("abcdefghijklmnopqrstuvwxyz", range(26)))' '[lookup[x]
| for x in "abcdefghijklmnopqrstuvwxyz"]' 500000 loops,
| best of 5: 884 nsec per loop $ python -m timeit '[ord(x)
| - 97 for x in "abcdefghijklmnopqrstuvwxyz"]' 200000
| loops, best of 5: 1.21 usec per loop
|
| Huh.
|
| ...And surely if you use the hash map directly, instead of
| using it to grab an index into another container, overall
| performance only gets better (unless maybe you need to save
| per-instance memory, and you have multiple containers using the
| same custom index scheme).
|
| Hash tables are a pretty neat idea.
| galaxyLogic wrote:
| The characteristic feature of Arays is that all but the last
| element have their next element and all but the first element
| have their previous element. So what would be the value of having
| or using indexes other than Integers to index Arrays?
|
| You can also define integer-valued named constants, and use those
| if you prefer.
| larrik wrote:
| That sounds more like a linked list than an array.
|
| In C, an array is just contiguous bytes, and you reach an index
| through math (starting location + (index * size)).
|
| Pretty sure lots of languages still do this to some degree, as
| your lookup times are effectively zero, though altering the
| number of elements is way more complex/expensive than in a
| linked list.
| deepsun wrote:
| > and you reach an index through math (starting location +
| (index * size)).
|
| By the way, that's the only reason why C-derived languages
| use unintuitive zero-based indexing. There's really no other
| reason to call the first element a[0] instead of a[1].
| dpassens wrote:
| Ah, but is it unintuitive though? Once one understands that
| in the vast majority of cases, C deals in pointers, not
| arrays, it makes perfect sense that indexing uses an offset
| rather than the nth element.
| tom_ wrote:
| The index of the first element is always 0 plus the index
| of the first element. This means it sort-of doesn't matter
| what the index of the first element of the array is - but,
| viewed another way, it's an excellent argument for 0.
| Starting at 0 means you don't need to add a constant in the
| (very common) case of dealing with a subarray that starts
| at the first element of the containing array.
| Jtsummers wrote:
| > So what would be the value of having or using indexes other
| than Integers to index Arrays?
|
| It removes a level of indirection, wasted space, and mental
| overhead. Suppose you have some table of 'a'..'z' -> whatever.
| What are your options?
|
| - Switch/case. This may be optimized and fast, but that's not
| guaranteed across all languages and compilers.
|
| - Hash map. This introduces memory overhead even if it is
| technically O(1) for access times.
|
| - If in C, you could do: T* table =
| (T*)malloc(sizeof(T) * 26) - 'a'
|
| So that the characters can be used naturally without later
| modification (not a feature in every language, though).
|
| - Manually calculate each offset every time:
| table[c - 'a'] = ...
|
| With language appropriate type conversions, if needed.
|
| Or you can allow any custom range to work, and do the math in
| that last option automatically because it's trivial for a
| compiler to figure it out and it removes unnecessary mental
| overhead from the program reader and writer.
| table['a']
|
| Just works.
|
| > You can also define integer-valued named constants, and use
| those if you prefer.
|
| Because everyone wants to see `lower_case_a = 0` and the 25
| other cases in their code along with all other possible
| constants. At least use an enum, sane languages even make them
| type-safe (or at least safer).
| mcphage wrote:
| > The characteristic feature of Arays is that all but the last
| element have their next element and all but the first element
| have their previous element.
|
| Where are you getting that from? That's not the characteristic
| feature of Arrays, that's the characteristic feature of Doubly
| Linked Lists.
| zahlman wrote:
| I wouldn't call it a characteristic feature of either. Any
| ordered, finite sequence of elements has those properties.
| Doubly linked lists simply offer O(1) access to "next
| elements" and "previous elements" given an element.
| mpweiher wrote:
| This might be interesting:
|
| _Structuring Arrays with Algebraic Shapes_
|
| https://dl.acm.org/doi/abs/10.1145/3736112.3736141
|
| https://news.ycombinator.com/item?id=44399757
| nyrikki wrote:
| One thing that may help is to drop the typing for generic words
| or pidgin holes, that will help with the FORTRAN example.
|
| The IBM 704 had 3 index registers, which could be combined
| together with a bitwise OR, then was _subtracted_ from a base
| address, that 's where FORTRAN for its 7D arrays from.
|
| When you mix that concept of decrementing index registers with
| hypercubes, you will naturally get to the extract format of BI
| tools like classic Tableau.
| zahlman wrote:
| 2000 words, specifically. Thanks again, title filter. Even just
| "Arrays and Tables" would have been better.
|
| I sincerely believe this aspect of the filter is a misfeature.
| Submissions that have bad reasons to put a number in the title
| are generally submissions that are just bad (and should be
| flagged) anyway.
| Animats wrote:
| I get the feeling this guy likes Lua. Lua combines arrays and
| dicts into "tables", which are usually indexed from 1 but can be
| indexed by any type as a dict. He never mentions Lua, though.
|
| Multidimensional arrays are a blind spot for modern language
| designers. Not arrays of arrays, as this author points out.
| FORTRAN had multidimensional arrays, but C didn't. Go and Rust
| don't have them. Part of the trouble is slices. Once you have
| slices, people want slicing in multidimensional arrays along any
| axis, which complicates the basic array representation.
| Discussions in this area dissolved into bikeshedding and nothing
| got done for Go and Rust.
| zahlman wrote:
| > Lua combines arrays and dicts into "tables", which are
| usually indexed from 1 but can be indexed by any type as a
| dict.
|
| I think it's more accurate to just call them hash tables
| ("dicts") that happen to support some very limited array-like
| functionality. "Appending" still looks like key insertion if
| you don't use a named method
| (https://stackoverflow.com/questions/27434142); `ipairs` misses
| those non-integer keys but also misses non-contiguous keys, _or
| even a zero key_
| (https://stackoverflow.com/questions/55108794); the `#`
| operator is not well defined once you add those non-contiguous
| keys (https://stackoverflow.com/questions/2705793) (so you
| can't actually cleanly override the choice to "start from 1");
| slicing etc. aren't provided as operators
| (https://stackoverflow.com/questions/24821045); etc.
| layer8 wrote:
| C and C++ do have multidimensional arrays [0], in the sense
| that they are arrays of arrays where the inner arrays all have
| the same fixed size, encoded into the multidimensional array
| type. Likewise in Go and Rust. This is different from, say,
| Java, where if you have an array of arrays, the inner arrays
| can all have different sizes, and there is no array type that
| could express that they shouldn't.
|
| Being able to express slices across arbitrary dimensions
| natively is another matter.
|
| [0]
| https://en.cppreference.com/w/cpp/language/array.html#Multid...
| marcosdumay wrote:
| > Go and Rust
|
| The trade-offs are way too severe for people to agree to any
| implementation in a low level language like Rust or one that
| wants to be low level like Go.
|
| We could get them in something like Python.
___________________________________________________________________
(page generated 2025-07-30 23:01 UTC)