[HN Gopher] Words about Arrays and Tables
       ___________________________________________________________________
        
       Words about Arrays and Tables
        
       Author : todsacerdoti
       Score  : 52 points
       Date   : 2025-07-30 15:15 UTC (7 hours ago)
        
 (HTM) web link (buttondown.com)
 (TXT) w3m dump (buttondown.com)
        
       | mvc wrote:
       | XSLT/XPath is an example of a platform that provides multiple
       | axis through which to access your data structure.
       | 
       | https://developer.mozilla.org/en-US/docs/Web/XML/XPath/Refer...
        
       | kccqzy wrote:
       | I'm not sure what's the insight in this article here. All the
       | things mentioned seem pretty straightforward. But I'll comment on
       | a few:
       | 
       | > Really any finite set can be a "dimension".
       | 
       | This is absolutely true and yet typical programming languages
       | don't do a good job of unleashing this. When is the last time you
       | wish you could define an array where the indices are from a
       | custom finite set, perhaps a range 100 to 200, perhaps an enum? I
       | only know of two languages, Pascal and Haskell, that are
       | sufficiently flexible. All other languages insist that indices
       | for arrays must be integers either starting from 0 or 1. Haskell
       | has a mechanism to let you control what is an array index:
       | https://hackage.haskell.org/package/base-4.18.1.0/docs/Data-...
       | The lack of this ability makes programmers use more general hash
       | maps by default and therefore leave performance on the table.
       | 
       | > We can also transform Row -> PowerSet(Col) into Row -> Col ->
       | Bool, aka a boolean matrix.
       | 
       | I mean sure. Does the author know type signatures and their
       | operations like this are isomorphic to algebraic operations on
       | size of the possibly infinite sets? The function A -> B has
       | exactly |B|^|A| implementations so that's why Col -> Bool is in
       | this sense "same" as PowerSet(Col). And the function operator
       | associates to the right so adding Row -> doesn't change anything.
       | Wait till you learn why sum types are called sum types and why
       | product types are called product types. Here's another thing that
       | might be mind-blowing: arrays of an unknown length can be thought
       | of as the infinite union of arrays of lengths of all natural
       | numbers: Array<T, 0> | Array<T, 1> | ... so using the above
       | notation the cardinality is 1 + |T| + |T|^2 + ...; but they can
       | also be represented as a standard linked list List<T>=Nil|(T,
       | List<T>) and using the above notation we have
       | |List<T>|=1+|T|*|List<T>| which simplifies to
       | |List<T>|=1/(1-|T|). And guess what, the Taylor expansion of that
       | is just the earlier 1 + |T| + |T|^2 + ... precisely.
        
         | Jtsummers wrote:
         | > I only know of two languages, Pascal and Haskell, that are
         | sufficiently flexible.
         | 
         | Julia, Fortran, some BASICs allow custom integer ranges.
         | 
         | Ada allows you to use any discrete range (integer, character,
         | enums).
         | 
         | https://en.wikipedia.org/wiki/Comparison_of_programming_lang...
         | - A more comprehensive list
         | 
         | The two columns of interest are "Specifiable index type" and
         | "Specifiable base index".
        
           | cestith wrote:
           | Older versions of Perl allowed you to set a lexically-scoped
           | array base integer. It defaulted to zero and the docs
           | mentioned 1-based arrays, but IIRC it could be any integer.
           | This was deprecated several years ago in 5.12 as a harmful
           | practice. Specifying using features of a Perl version of 5.16
           | or newer actually makes assigning to it (except for 0) a
           | compile-time error.
           | 
           | Interestingly, though, Perl's native arrays are decidedly not
           | an array of primitives. They are arrays of scalar values, and
           | a scalar value can not only contain a character, a string, an
           | integer, a float, or a reference to another item (scalar,
           | hash, array, blessed object, filehandle, etc) but it can
           | return different values when called in a numeric context vs a
           | string context. There are certainly ways to get at primitive
           | values, but it's not in Perl's native array semantics.
        
           | pklausler wrote:
           | Non-default lower bounds in Fortran are a famous pitfall,
           | however; they don't persist in many cases where one might
           | think that they do, and are not 100% portable across
           | compilers.
        
         | zahlman wrote:
         | > When is the last time you wish you could define an array
         | where the indices are from a custom finite set, perhaps a range
         | 100 to 200, perhaps an enum?
         | 
         | > ...
         | 
         | > The lack of this ability makes programmers use more general
         | hash maps by default and therefore leave performance on the
         | table.
         | 
         | You can simulate this in any language by hash-mapping the
         | custom set elements to indices. (And in the general case, you
         | won't be able to determine the indices faster than a hash table
         | lookup.)
         | 
         | In fact, I imagine there are languages and implementations
         | where hash table lookup is the fastest way to map 'a':'z' to
         | 0:25.
         | 
         | In fact, let me try a (surely totally bogus) micro-benchmark
         | right now:                 $ python -m timeit --setup 'lookup =
         | dict(zip("abcdefghijklmnopqrstuvwxyz", range(26)))' '[lookup[x]
         | for x in "abcdefghijklmnopqrstuvwxyz"]'       500000 loops,
         | best of 5: 884 nsec per loop       $ python -m timeit '[ord(x)
         | - 97 for x in "abcdefghijklmnopqrstuvwxyz"]'       200000
         | loops, best of 5: 1.21 usec per loop
         | 
         | Huh.
         | 
         | ...And surely if you use the hash map directly, instead of
         | using it to grab an index into another container, overall
         | performance only gets better (unless maybe you need to save
         | per-instance memory, and you have multiple containers using the
         | same custom index scheme).
         | 
         | Hash tables are a pretty neat idea.
        
       | galaxyLogic wrote:
       | The characteristic feature of Arays is that all but the last
       | element have their next element and all but the first element
       | have their previous element. So what would be the value of having
       | or using indexes other than Integers to index Arrays?
       | 
       | You can also define integer-valued named constants, and use those
       | if you prefer.
        
         | larrik wrote:
         | That sounds more like a linked list than an array.
         | 
         | In C, an array is just contiguous bytes, and you reach an index
         | through math (starting location + (index * size)).
         | 
         | Pretty sure lots of languages still do this to some degree, as
         | your lookup times are effectively zero, though altering the
         | number of elements is way more complex/expensive than in a
         | linked list.
        
           | deepsun wrote:
           | > and you reach an index through math (starting location +
           | (index * size)).
           | 
           | By the way, that's the only reason why C-derived languages
           | use unintuitive zero-based indexing. There's really no other
           | reason to call the first element a[0] instead of a[1].
        
             | dpassens wrote:
             | Ah, but is it unintuitive though? Once one understands that
             | in the vast majority of cases, C deals in pointers, not
             | arrays, it makes perfect sense that indexing uses an offset
             | rather than the nth element.
        
             | tom_ wrote:
             | The index of the first element is always 0 plus the index
             | of the first element. This means it sort-of doesn't matter
             | what the index of the first element of the array is - but,
             | viewed another way, it's an excellent argument for 0.
             | Starting at 0 means you don't need to add a constant in the
             | (very common) case of dealing with a subarray that starts
             | at the first element of the containing array.
        
         | Jtsummers wrote:
         | > So what would be the value of having or using indexes other
         | than Integers to index Arrays?
         | 
         | It removes a level of indirection, wasted space, and mental
         | overhead. Suppose you have some table of 'a'..'z' -> whatever.
         | What are your options?
         | 
         | - Switch/case. This may be optimized and fast, but that's not
         | guaranteed across all languages and compilers.
         | 
         | - Hash map. This introduces memory overhead even if it is
         | technically O(1) for access times.
         | 
         | - If in C, you could do:                 T* table =
         | (T*)malloc(sizeof(T) * 26) - 'a'
         | 
         | So that the characters can be used naturally without later
         | modification (not a feature in every language, though).
         | 
         | - Manually calculate each offset every time:
         | table[c - 'a'] = ...
         | 
         | With language appropriate type conversions, if needed.
         | 
         | Or you can allow any custom range to work, and do the math in
         | that last option automatically because it's trivial for a
         | compiler to figure it out and it removes unnecessary mental
         | overhead from the program reader and writer.
         | table['a']
         | 
         | Just works.
         | 
         | > You can also define integer-valued named constants, and use
         | those if you prefer.
         | 
         | Because everyone wants to see `lower_case_a = 0` and the 25
         | other cases in their code along with all other possible
         | constants. At least use an enum, sane languages even make them
         | type-safe (or at least safer).
        
         | mcphage wrote:
         | > The characteristic feature of Arays is that all but the last
         | element have their next element and all but the first element
         | have their previous element.
         | 
         | Where are you getting that from? That's not the characteristic
         | feature of Arrays, that's the characteristic feature of Doubly
         | Linked Lists.
        
           | zahlman wrote:
           | I wouldn't call it a characteristic feature of either. Any
           | ordered, finite sequence of elements has those properties.
           | Doubly linked lists simply offer O(1) access to "next
           | elements" and "previous elements" given an element.
        
       | mpweiher wrote:
       | This might be interesting:
       | 
       |  _Structuring Arrays with Algebraic Shapes_
       | 
       | https://dl.acm.org/doi/abs/10.1145/3736112.3736141
       | 
       | https://news.ycombinator.com/item?id=44399757
        
       | nyrikki wrote:
       | One thing that may help is to drop the typing for generic words
       | or pidgin holes, that will help with the FORTRAN example.
       | 
       | The IBM 704 had 3 index registers, which could be combined
       | together with a bitwise OR, then was _subtracted_ from a base
       | address, that 's where FORTRAN for its 7D arrays from.
       | 
       | When you mix that concept of decrementing index registers with
       | hypercubes, you will naturally get to the extract format of BI
       | tools like classic Tableau.
        
       | zahlman wrote:
       | 2000 words, specifically. Thanks again, title filter. Even just
       | "Arrays and Tables" would have been better.
       | 
       | I sincerely believe this aspect of the filter is a misfeature.
       | Submissions that have bad reasons to put a number in the title
       | are generally submissions that are just bad (and should be
       | flagged) anyway.
        
       | Animats wrote:
       | I get the feeling this guy likes Lua. Lua combines arrays and
       | dicts into "tables", which are usually indexed from 1 but can be
       | indexed by any type as a dict. He never mentions Lua, though.
       | 
       | Multidimensional arrays are a blind spot for modern language
       | designers. Not arrays of arrays, as this author points out.
       | FORTRAN had multidimensional arrays, but C didn't. Go and Rust
       | don't have them. Part of the trouble is slices. Once you have
       | slices, people want slicing in multidimensional arrays along any
       | axis, which complicates the basic array representation.
       | Discussions in this area dissolved into bikeshedding and nothing
       | got done for Go and Rust.
        
         | zahlman wrote:
         | > Lua combines arrays and dicts into "tables", which are
         | usually indexed from 1 but can be indexed by any type as a
         | dict.
         | 
         | I think it's more accurate to just call them hash tables
         | ("dicts") that happen to support some very limited array-like
         | functionality. "Appending" still looks like key insertion if
         | you don't use a named method
         | (https://stackoverflow.com/questions/27434142); `ipairs` misses
         | those non-integer keys but also misses non-contiguous keys, _or
         | even a zero key_
         | (https://stackoverflow.com/questions/55108794); the `#`
         | operator is not well defined once you add those non-contiguous
         | keys (https://stackoverflow.com/questions/2705793) (so you
         | can't actually cleanly override the choice to "start from 1");
         | slicing etc. aren't provided as operators
         | (https://stackoverflow.com/questions/24821045); etc.
        
         | layer8 wrote:
         | C and C++ do have multidimensional arrays [0], in the sense
         | that they are arrays of arrays where the inner arrays all have
         | the same fixed size, encoded into the multidimensional array
         | type. Likewise in Go and Rust. This is different from, say,
         | Java, where if you have an array of arrays, the inner arrays
         | can all have different sizes, and there is no array type that
         | could express that they shouldn't.
         | 
         | Being able to express slices across arbitrary dimensions
         | natively is another matter.
         | 
         | [0]
         | https://en.cppreference.com/w/cpp/language/array.html#Multid...
        
         | marcosdumay wrote:
         | > Go and Rust
         | 
         | The trade-offs are way too severe for people to agree to any
         | implementation in a low level language like Rust or one that
         | wants to be low level like Go.
         | 
         | We could get them in something like Python.
        
       ___________________________________________________________________
       (page generated 2025-07-30 23:01 UTC)