hngopher.com

       [HN Gopher] What about K?
       ___________________________________________________________________
        
       What about K?
        
       Author : tosh
       Score  : 167 points
       Date   : 2025-02-10 12:51 UTC (10 hours ago)
        
 (HTM) web link (xpqz.github.io)
 (TXT) w3m dump (xpqz.github.io)
        
       | sebg wrote:
       | A companion guide that I always recommend if interested in K is:
       | Q for mortals, found here - https://code.kx.com/q4m3/
       | 
       | Note, from wikipedia: Q serves as the query language for kdb+, a
       | disk based and in-memory, column-based database. Kdb+ is based on
       | the language k, a terse variant of the language APL. Q is a thin
       | wrapper around k, providing a more readable, English-like
       | interface.
        
         | mwexler wrote:
         | Pulled from above:
         | 
         | > Coding Style: The q gods have no need for explanatory error
         | > messages or comments since their q code is perfect and
         | > self-documenting. Even experienced mortals spend hours poring
         | > over cryptic q error messages such as the ones above.
         | > Moreover, many mortals eschew comments in misanthropic coding
         | > macho. Don't.
         | 
         | A more enjoyable read than the parent post.
        
         | nialv7 wrote:
         | There's also Nial: https://github.com/danlm/qnial7 which is
         | (pardon the oversimplification) APL but with words instead of
         | symbols.
        
         | steveBK123 wrote:
         | A set of links with good examples of common problems solved in
         | Q
         | 
         | https://code.kx.com/phrases/wikipage/
         | 
         | https://code.kx.com/q/kb/programming-idioms/
         | 
         | https://code.kx.com/phrases/
        
       | kvdveer wrote:
       | The linked document only contains a warning about how versioning
       | is weird, and a description of the syntax. No examples beyond
       | trivial one-liners.
       | 
       | What problem is K trying to solve? What does a K program look
       | like?
        
         | sz4kerto wrote:
         | Absolutely not being sarcastic: one problem it solves is that
         | it is very hard to read as a beginner, so it can be
         | intimidating (although it becomes much easier to read a bit
         | later). This, coupled with the general arrogance of k/q
         | practitioners (again, not really saying this in a negative way)
         | and that k, kdb, etc. deliberately don't give you guardrails,
         | makes people who write k/q seem a bit 'mythical' and makes them
         | feel very clever.
         | 
         | So I think k, q and kdb are fun to work with, but one of the
         | major components of its success is that it allowed a community
         | (in finance) to evolve that can earn 50-150% more than their
         | peer groups who do the same work in Java or C++. 10 years ago a
         | kx course cost $1500 per person per day.
        
           | pjmlp wrote:
           | Note that those are typical prices for enterprise-level
           | certifications, including for some products that Java or C++
           | devs might need to interact with when working in those kinds
           | of environments.
        
         | bear8642 wrote:
         | K is a fast vector language, used (primarily) for time series
         | data analysis.
         | 
         | >What does a K program look like?
         | 
         | You might want to check out
         | https://news.ycombinator.com/item?id=40335921
         | 
         | beagle3 and geocar both have various comments you might want to
         | search for.
        
           | mananaysiempre wrote:
           | > a fast vector language
           | 
           | With an Oracle-style DeWitt clause[1] prohibiting public
           | benchmarks.
           | 
           | [1]
           | https://mlochbaum.github.io/BQN/implementation/kclaims.html
        
             | rustc wrote:
             | Shakti (the latest K implementation by the author of K)
             | claims [1] to load a 50gb csv in 1.6 seconds which
             | according to them takes 265 seconds with Polars. Has anyone
             | independently verified these claims? Is Polars really
             | leaving 2 orders of magnitude performance on the table?
             | 
             | [1]: https://shakti.com/ -> Compare -> h2o.k
        
               | bear8642 wrote:
               | > [1]: https://shakti.com/ -> Compare -> h2o.k
               | 
               | You can link to the subsections:
               | https://shakti.com/compare/h2o.k
        
               | orlp wrote:
               | Disclaimer: I work for Polars inc.
               | 
               | As a sanity check I just cloned
               | https://github.com/h2oai/db-benchmark, ran the data
               | generation script and ran on a 64 core AMD EPYC (AWS
               | c7a.16xlarge):
               | 
               |     import polars as pl
               |     lf = pl.scan_csv("G1_1e9_1e2_0_0.csv")
               |     print(lf.select(pl.col.v1.sum()).collect())
               | 
               | The above script ran in 7.58 seconds.
               | 
               | If I change the collect() to collect(new_streaming=True)
               | to use the new streaming engine I've been working on, it
               | runs in 6.90 seconds.
               | 
               | I can't realistically time the full "read CSV to memory"
               | with this 50 GB file on this machine as we start swapping
               | (this machine has 128GiB memory) and/or evicting data
               | from disk cache (this machine has a slow EC2 SSD attached
               | to it), so we do have a blow-up of memory usage (which
               | could be as simple as loading small integers into an
               | 8-byte Uint64 column). I think it's likely that on K's
               | machine the "read full CSV to memory" approach also
               | started swapping, giving the large runtime. However, in
               | Polars you'd typically write your query using LazyFrames,
               | which means we don't actually have to load the full CSV
               | into memory.
               | 
               | EDIT: running on a m7a.16xlarge with twice the memory
               | (256GiB) once the CSV file is in disk cache Polars can
               | parse the full CSV file into an in-memory dataframe in
               | 7.68 seconds.
               | 
               | K's claim that it parses the full 50GB CSV in 1.6 seconds
               | if true is very impressive regardless.
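The lazy-versus-eager point above can be illustrated without Polars. This is a minimal stdlib sketch, not Polars' implementation: it assumes a small hypothetical CSV with a `v1` column and contrasts materialising every row with aggregating row by row.

```python
import csv
import io
from typing import Iterable


def eager_sum(lines: Iterable[str], column: str) -> int:
    """Materialise every row first, then aggregate (memory grows with the file)."""
    rows = list(csv.DictReader(lines))
    return sum(int(row[column]) for row in rows)


def streaming_sum(lines: Iterable[str], column: str) -> int:
    """Aggregate row by row; only the current row is held at any time."""
    total = 0
    for row in csv.DictReader(lines):
        total += int(row[column])
    return total


# Hypothetical sample data; a real run would pass an open file handle instead.
sample = "id1,v1\nid016,5\nid039,2\nid047,3\n"
eager_total = eager_sum(io.StringIO(sample), "v1")
streaming_total = streaming_sum(io.StringIO(sample), "v1")
print(eager_total, streaming_total)
```

Both functions return the same total; only the peak memory differs, which is the property a lazy query over a single column can exploit.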
        
               | mananaysiempre wrote:
               | Honestly 7 seconds even just to parse the CSV is already
               | pretty impressive, 7GB/s would be simdjson speeds if you
               | did it on a single core. Do you have a single-threaded
               | parser with really well-tuned SIMD, or a speculative
               | parallel one, or ..?
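The throughput figures being compared can be worked out directly from the numbers quoted in the thread. A small sketch, assuming the file is exactly 50 GB as stated:

```python
FILE_GB = 50.0  # file size quoted in the thread (assumed exact here)


def throughput_gb_per_s(seconds: float, size_gb: float = FILE_GB) -> float:
    """Average parse throughput in GB/s for a file of size_gb."""
    return size_gb / seconds


polars_eager = throughput_gb_per_s(7.68)  # in-memory parse time reported above
shakti_claim = throughput_gb_per_s(1.6)   # time claimed on shakti.com
claimed_ratio = 265 / 1.6                 # both timings as quoted from Shakti's page

print(round(polars_eager, 2), round(shakti_claim, 2), round(claimed_ratio, 1))
```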
        
               | orlp wrote:
               | We have a single-threaded chunker that scans serially
               | over the file. This chunker exclusively finds unquoted
               | newlines (using SIMD) to find clean parallelization
               | boundaries; it doesn't do any further parsing. Those
               | parallelization boundaries are then used to feed worker
               | threads chunks of data to properly parse into our
               | in-memory representation (which mostly follows Arrow).
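The two-stage design described above can be sketched in plain Python. This is an illustration under stated assumptions, not Polars' code: the scan is an ordinary byte loop rather than SIMD, the function names and `target_size` parameter are hypothetical, and the stdlib `csv` module stands in for the real parser.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def find_chunk_boundaries(data: bytes, target_size: int) -> list[int]:
    """Serial pass: find newlines outside quoted fields and pick split offsets."""
    boundaries = [0]
    in_quotes = False
    for i, byte in enumerate(data):
        if byte == ord('"'):
            # An escaped quote ("") toggles twice, so the state stays correct.
            in_quotes = not in_quotes
        elif byte == ord("\n") and not in_quotes:
            if i + 1 - boundaries[-1] >= target_size:
                boundaries.append(i + 1)
    if boundaries[-1] != len(data):
        boundaries.append(len(data))
    return boundaries


def parse_chunk(chunk: bytes) -> list[list[str]]:
    """Parallel stage: full CSV parsing, independently per chunk."""
    return list(csv.reader(io.StringIO(chunk.decode("utf-8"), newline="")))


def parallel_parse(data: bytes, target_size: int = 32, workers: int = 4) -> list[list[str]]:
    bounds = find_chunk_boundaries(data, target_size)
    chunks = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so rows come back in file order.
        parsed = pool.map(parse_chunk, chunks)
        return [row for rows in parsed for row in rows]


sample = b'id,note\n1,"line one\nline two"\n2,plain\n3,"a ""quoted"" word"\n'
rows = parallel_parse(sample, target_size=8)
print(rows)
```

The quoted field containing a newline is never split, because the serial pass only records boundaries while outside quotes.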
        
               | LegionMammal978 wrote:
               | Would you know how much of the total runtime is devoted
               | to the initial chunking process? Amdahl's law would
               | prefer an entirely speculative approach in the limit, but
               | I could imagine that the 2x overhead might not be worth
               | it for reasonable file sizes and core counts.
               | 
               | (But even then, 1.6 s would be quite a feat. It makes me
               | wonder if the K implementation is partially lazy, as you
               | say typical Polars usage is.)
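For reference, Amdahl's law bounds the speedup by the fraction of work that stays serial. A minimal sketch; the serial fractions below are hypothetical values chosen for illustration, not measurements of Polars' chunker:

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when serial_fraction of the work cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Hypothetical serial fractions on a 64-core machine.
for s in (0.01, 0.03, 0.10):
    bound_64 = amdahl_speedup(s, 64)
    limit = 1.0 / s  # bound as the worker count grows without limit
    print(f"serial={s:.2f}  64 workers: {bound_64:.1f}x  limit: {limit:.1f}x")
```

Even a small serial stage caps the achievable speedup, which is why the cost of the chunking pass matters as core counts grow.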
        
               | orlp wrote:
               | It seems from a profile that on the eager engine the
               | serial scanner is able to feed ~32 threads worth of
               | decoding: https://share.firefox.dev/4hS1eJa.
               | 
               | It might be worth speculating, or at least optimizing the
               | serial chunker more. You could theoretically start a
               | second serial chunker from the end working backwards but
               | that would not be wise with our ordered streams, as the
               | decoded data would have to be buffered for a long time.
               | 
               | Similarly on the new streaming engine, each thread is
               | active ~half of the time, except the thread running the
               | chunking task: https://share.firefox.dev/3WQV9og.
               | 
               | Note that in a lot of realistic workloads on the
               | streaming engine compute can happen in between decodes,
               | completely hiding the bottleneck. Also all of the above
               | is with the file being completely in file cache, if fed
               | from a slow SSD it's not a bottleneck whatsoever.
        
         | swiftcoder wrote:
         | This is kind of the problem with every introductory text to an
         | APL-family language.
         | 
         | I get the idea that one either already knows one needs an array
         | programming language, or doesn't grok why anyone would need one
        
         | reedf1 wrote:
         | K solves the bank account problem for two groups of people:
         | KX Systems and quants.
        
         | FjordWarden wrote:
         | I've only played around with k and APL in my spare time so I
         | can't speak to real world problems. It is a ridiculously
         | powerful query language, where in SQL you have only started
         | writing `SELECT ...`, in k you are already done. But you need
         | to have very good tacit knowledge of algorithms and the weird
         | syntax to be productive, like oh I need to calculate an
         | integral-image of this time-series, but that's just a pre-scan
         | over addition, boom and you are done. The theory of array
         | programming with a focus on combinators is also an interesting
         | perspective on functional programming. IMHO not something you
         | should write a full program in, but that hasn't stopped some from
         | trying.
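The "integral image is just a scan over addition" remark can be made concrete. A minimal sketch in Python with made-up sample values; in k the same running sum is usually spelled `+\`:

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical time series

# Running (prefix) sums: the 1-D analogue of an integral image.
prefix = list(accumulate(values))


def range_sum(prefix_sums: list[int], lo: int, hi: int) -> int:
    """Sum of the original values at indices lo..hi inclusive, in constant time."""
    return prefix_sums[hi] - (prefix_sums[lo - 1] if lo > 0 else 0)


print(prefix)
print(range_sum(prefix, 2, 5))
```

Once the scan is computed, any window sum is a single subtraction, which is what makes the integral image useful.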
        
           | bee_rider wrote:
           | This was a helpful comment. After the article, the question
           | that popped into my head was... so ok should I try and
           | compare this to like BLAS or something like Jax?
           | 
           | But, this sort of language is more about writing and reading
           | from the disk efficiently, right? I guess SIMD type
           | optimizations would be less of a thing.
        
             | FjordWarden wrote:
             | I think that array languages have historically used memory
             | mapped files for IO, and treat them like a big data frame,
             | but other versions also support streaming IO. It's up to the
             | implementers of the runtime to use SIMD instructions if
             | they deem this optimal but not something you would use
             | yourself.
        
           | Pet_Ant wrote:
           | I feel like measuring things in characters is not meaningful,
           | but only in tokens. Replacing "SELECT" with "SEL" would not
           | improve SQL in the slightest.
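The characters-versus-tokens distinction can be checked with a toy tokenizer. A rough sketch, assuming a simplistic regex lexer and a made-up query; `SEL` is not real SQL:

```python
import re


def tokens(src: str) -> list[str]:
    """Very rough lexer: identifiers, integers, or single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


long_form = "SELECT price FROM trades WHERE sym = 1"
short_form = "SEL price FROM trades WHERE sym = 1"  # hypothetical abbreviation

print(len(long_form), len(tokens(long_form)))
print(len(short_form), len(tokens(short_form)))
```

The abbreviated query is three characters shorter but has exactly the same number of tokens.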
        
         | Thorrez wrote:
         | A one-liner in k tends to be equivalent to a much larger
         | program in another language.
         | 
         | Here's a program in k. I'm not sure exactly what it does. I
         | think it might be a json encoder/decoder:
         | 
         | https://github.com/KxSystems/kdb/blob/master/e/json.k
        
           | cubefox wrote:
           | It appears you accidentally linked to a log where someone fell
           | on his keyboard.
        
             | andai wrote:
             | I think Whitney's greatest achievement isn't even any of
             | his languages--though they are very impressive--but that he
             | convinced banks to pay him millions of dollars to write
             | IOCCC style code!
        
           | bregma wrote:
           | Dialup modems on a bad connection used to generate more
           | readable code.
        
           | saghm wrote:
           | It says a lot that the name of the file is more
           | informative about what the code does than the entirety of the
           | file itself. "Readability is a property of the reader"
           | indeed, but also the writer...
        
         | poulpy123 wrote:
         | The problem solved by K is the long-term employment of people
         | writing K. You can't be fired if you're the only one
         | understanding more or less the codebase
        
           | dboreham wrote:
           | This is true about more software development than you
           | realize.
        
       | z5h wrote:
       | > Readability is a property of the reader, not the language.
       | 
       | Similarly, the inability of a person to write machine code
       | directly is a property of the person, not the hardware. Yet some
       | of these people admit their limitations and use K.
        
         | pyrale wrote:
         | Silicon computers are a crutch for the people too flawed to run
         | their calculations in their head.
        
           | Ygg2 wrote:
           | "It is by will alone I set my mind in motion. It is by the
           | juice of Sapho that the thoughts acquire speed, the lips
           | acquire stains, the stains become a warning. It is by will
           | alone I set my mind in motion..."
        
         | psychoslave wrote:
         | Do you mean that not everyone uses butterflies yet? How quaint!
        
           | coder543 wrote:
           | Butterflies? Easier to catch a caterpillar and set it on the
           | right path to one day flap its wings in just the desired way.
        
             | tempodox wrote:
             | Let me sell you an AI that genetically engineers
             | caterpillars to do just that.
        
               | jagged-chisel wrote:
               | But can it also provide instructions to set the universal
               | constants at the start so the universe evolves the
               | desired result?
        
               | tempodox wrote:
               | I just asked it and it said yes.
        
               | gpderetta wrote:
               | Let there be light!
        
             | Ygg2 wrote:
             | Caterpillar? Too high tech. I just restart the universe
             | until the problem is pre-solved.
        
         | camdv wrote:
         | Chinese has a readability issue to the English speaker. That
         | doesn't mean it's not readable.
        
           | bobthepanda wrote:
           | The original comment is a joke
        
             | gitonthescene wrote:
             | This comment would seem to address the point of that joke
        
       | hcfman wrote:
       | I built a language called K for my master's thesis in 1984. Who
       | was first?
        
         | mlochbaum wrote:
         | You win! Whitney was just out of graduate school at the time,
         | and had worked some with APL at I.P. Sharp but was implementing
         | "object-oriented languages, a lot of different LISPs,
         | Prolog"[0]. Next was the more APL-like A around 1985 and K only
         | in 1992.
         | 
         | [0] https://queue.acm.org/detail.cfm?id=1531242
        
       | ZeroCool2u wrote:
       | I cannot warn folks against using q/kdb+ enough. Use Polars or
       | DuckDB, get the job done, and enjoy your life.
        
         | jerjerjer wrote:
         | Eh, no need. Author states in the first two paragraphs that
         | there are 9 versions of k, each developed from scratch and
         | incompatible with each other. Anyone who develops software for
         | money should and would leave immediately. I do appreciate the
         | honesty, though.
        
         | 7thaccount wrote:
         | What has your experience been like? What are the drawbacks
         | besides the cost and proprietary nature?
        
           | ZeroCool2u wrote:
           | I don't want to be too disparaging, so I will just say that
           | the language is exotic. Otherwise, the licensing model is
           | Oracle-esque based on host and core counts etc. The software
           | is fast, that you cannot deny, though it does critically
           | depend on the speed of the storage attached to the host.
           | Also, it's written in C++ and it shows. Had to do multiple
           | (paid) upgrades due to memory leaks.
           | 
           | I'm sure there was a time it was best in class and even now
           | maybe it's the best for a few niche use cases, but unless
           | you're absolutely certain you need it, I would flee from it
           | and save your sanity.
        
             | 7thaccount wrote:
             | I thought it was written in just plain C based off old
             | Arthur Whitney stories.
             | 
             | Yeah...Oracle licensing sounds scary and having to pay to
             | fix their own memory leaks sounds frustrating.
             | 
             | Thanks for the experience.
        
         | boothby wrote:
         | > and enjoy your life.
         | 
         | As somebody who hacks on, around and in esoteric languages for
         | fun, I must object.
        
           | ZeroCool2u wrote:
           | And as someone that has written an interpreter from scratch
           | in F#, and since there's a free trial version, I'd say go for
           | it and have fun and live your best life! Just perhaps
           | reconsider allowing your livelihood to be dependent on it :)
        
       | HexDecOctBin wrote:
       | What is the difference between APL and all the various APL-like
       | languages like BQN, J and K? Which one should a beginner start
       | with? Which has the best tooling for debugging, type checking,
       | etc.?
        
         | radiator wrote:
         | I think the best today cannot be APL, because it carries so
         | much historical baggage and because commercial implementations
         | dominate it. So start with BQN, it is free, has the tooling and
         | it also has succeeded in building a community.
        
         | skruger wrote:
         | Depends. APL is the OG. Try a few and see what you like. If you
         | learn one Iverson language, it's pretty easy picking up the
         | others.
         | 
         | Here's a gentle guide to APL by the same author (me):
         | 
         | https://xpqz.github.io/learnapl/
         | 
         | Dyalog APL is likely the best supported in terms of tooling,
         | debugging etc. If you're looking for static typing, you're in
         | the wrong place.
        
         | tomku wrote:
         | There are several ways to look at the differences.
         | 
         | The one that will jump out at most programmers who are familiar
         | with mainstream languages is that J, k, q and Nial use ASCII
         | characters while APL, BQN and Uiua prefer glyphs. q and Nial
         | additionally favor words rather than shortened abbreviations,
         | and Uiua has plain words that auto-format to its glyphs to aid
         | in typing. The other glyph-based languages rely on custom
         | (software) keyboard layouts or input methods to let you type
         | the symbols they need. You do not need a special keyboard to
         | program in any of these languages. ASCII-or-not is not a
         | decision that any of the array languages have made lightly or
         | for purely aesthetic reasons, it has deep consequences for how
         | the languages feel that won't really make sense until you get
         | some hands-on experience. As a beginner you'll probably
         | gravitate towards one of the sides without understanding those
         | deeper implications, and that's totally okay, but please keep
         | an open mind.
         | 
         | If access to a high-quality open-source implementation is
         | important for you, your options narrow a bit. J, BQN, Uiua and
         | Nial all have a primary implementation that's open source. k
         | has implementations that are open-source but the official
         | versions of k that most people use "in anger" are commercial
         | products with a limited free trial, and afaik there's no mature
         | open-source versions of kdb+/q, which are kind of k's killer
         | app. There are many implementations of APL but Dyalog is the
         | clear leader and it's a closed-source commercial product with a
         | personal/non-commercial free version. I wish this was less of a
         | factor because it's so hard to get people interested in
         | languages when the best versions aren't available to them, but
         | it has gotten better in recent years.
         | 
         | Regarding tooling, you should go in with minimal expectations.
         | Some of the tooling is quite good (particularly J and Dyalog
         | APL, in my opinion) but it's heavily biased towards the
         | specific type of iterative, interactive development that nearly
         | all array programmers favor. Debuggers are sometimes present
         | but usually not a primary tool. None of the major array
         | languages have static typing. There are some array-adjacent
         | languages like Futhark and Dex that do, but they're very
         | different than the "Iversonian" array languages you asked
         | about, and are also active research projects.
         | 
         | (Edit: Also worth mentioning that package managers and build
         | systems are not common in the array world.)
         | 
         | There are many other differences that matter immensely to the
         | array community but you won't have context for as a beginner,
         | so I'm not going to go too deep into them, but if you're
         | curious,
         | https://github.com/codereport/array-language-comparisons
         | has some comparison tables and example code written
         | in a variety of languages. code_report/Conor's Youtube channel
         | at https://www.youtube.com/@code_report/ is also an excellent
         | place to get exposure to various array languages and concepts.
         | 
         | All that said, in my opinion the easiest languages to recommend
         | to get started are BQN and J, depending on whether you want
         | glyphs or not. If you're comfortable using a closed-source tool
         | with restrictive licensing, Dyalog APL is also an excellent
         | choice. Any of the three will show you both the joys and pains
         | of array programming if you put time into learning it, and give
         | you enough context to make an informed decision about going
         | deeper or finding another array language more to your taste.
        
           | cess11 wrote:
           | J has an Android interpreter, which for me as a
           | non-professional dabbler is the killer app since it means I can
           | study and play on my handheld devices when I'm on a break
           | from work or family.
           | 
           | The documentation is pretty decent compared to the other
           | members of the Iverson gang, and the libraries one can install
           | with the desktop version make it somewhat batteries-included;
           | at least it's easy to suck in a file and start rendering
           | plots.
           | 
           | Maybe BQN can compete on these things nowadays, I'm not sure.
        
             | dzaima wrote:
             | You can run BQN in termux on Android pretty well. A list of
             | libraries is available at
             | https://github.com/pellertson/awesome-bqn.
             | https://mlochbaum.github.io/BQN/ has pretty good
             | documentation.
        
         | rscho wrote:
         | The main difference separating them is the array model. APL has
         | the so-called 'nested array' model, meaning that everything is
         | an array. J has a 'flat array model' meaning scalars are
         | distinct from arrays. Both models introduce typing
         | inconsistencies preventing efficiency. BQN tried to remedy this
         | and use an efficient compiler. What sets K apart is that it
         | does not have multidimensional arrays, but just lists of 1D
         | arrays. This makes K ideal for financial work, while the others
         | are more non-financial math-oriented.
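The contrast between a multidimensional array and a list of 1-D vectors can be sketched in Python. This is a loose illustration only: it does not show how any APL, J, BQN or k implementation stores data, and the `trades` table and helper names are hypothetical.

```python
# A rank-2 array modelled as one flat buffer plus a shape.
flat = [1, 2, 3, 4, 5, 6]
shape = (2, 3)  # 2 rows, 3 columns


def flat_get(buf, shape, row, col):
    """Index into the flat buffer using row-major arithmetic."""
    return buf[row * shape[1] + col]


# The same data as a list of 1-D vectors; rows are independent lists.
nested = [[1, 2, 3], [4, 5, 6]]


def transpose(rows):
    """Turn a list of rows into a list of columns."""
    return [list(col) for col in zip(*rows)]


# A column-oriented table is then just named 1-D vectors of equal length.
trades = {"sym": ["AAPL", "MSFT", "AAPL"], "price": [10.0, 20.0, 11.0]}


def select_where(table, column, value):
    """Keep the rows where table[column] == value, column by column."""
    keep = [i for i, v in enumerate(table[column]) if v == value]
    return {name: [vec[i] for i in keep] for name, vec in table.items()}


print(flat_get(flat, shape, 1, 2), nested[1][2])
print(transpose(nested))
print(select_where(trades, "sym", "AAPL"))
```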
        
       | jamal-kumar wrote:
       | I always thought it sounded super cool but it just doesn't exist
       | in the problem spaces I work in. Like kdb+ was specifically
       | designed to be run on bare metal without a full OS in the way of
       | things going fast, and in quant environments where you're trying
       | to shave off nanoseconds on the computations because your
       | company's gone and invested in a dedicated fibre line to trading
       | servers.
        
         | eudhxhdhsb32 wrote:
         | That's actually not true at all. No one who cares about
         | nanoseconds is using kdb+ for a production trading system.
         | 
         | It's primarily used for trading research and surveillance, not
         | live trading. And I've never heard of anyone running it without
         | an OS.
        
           | bear8642 wrote:
           | > And I've never heard of anyone running it without an OS.
           | 
           | kOS is in development though current status is unknown.
           | 
           | (https://gist.github.com/chrispsn/da00835bb122c42f429a084df83
           | ...)
        
             | 7thaccount wrote:
             | I think that got abandoned ages ago.
        
           | WorkerBee28474 wrote:
           | > No one who cares about nanoseconds is using kdb+ for a
           | production trading system.
           | 
           | For those curious, what they're actually using is FPGAs and
           | custom silicon.
        
       | pie_flavor wrote:
       | > Readability is a property of the reader, not the language.
       | 
       | Uiua[0]'s stack model is much more annoying to work with, but I
       | really appreciate its embrace of unicode glyphs. Every other
       | derivative of APL throws those out at the first opportunity, but
       | when you have a lot of glyphs, you stop being so tempted to make
       | different arities cause the same glyph to mean wildly different
       | things, when the arity is not actually written down explicitly
       | and depends on whether the next thing to the left is a parameter
       | or another function. Once you can See The Matrix, _this_ is the
       | chief thing that still does make K and friends objectively
       | unreadable in a way they don't have to be.
       | 
       | [0]: https://uiua.org
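The arity overloading described above can be mimicked in Python. A toy sketch: in k, `#` is commonly documented as "count" with one argument and "take" with two, and the hypothetical `hash_glyph` function below models only that much (it ignores k's behaviour for counts larger than the list).

```python
def hash_glyph(*args):
    """Toy model of one glyph whose meaning depends on how many arguments it gets."""
    if len(args) == 1:
        (x,) = args
        return len(x)          # one argument: count
    if len(args) == 2:
        n, x = args
        return list(x[:n])     # two arguments: take the first n items
    raise TypeError("expected 1 or 2 arguments")


print(hash_glyph([7, 8, 9]))
print(hash_glyph(2, [7, 8, 9]))
```

In Python the call site makes the arity visible; in an array language the reader has to infer it from what stands to the left of the glyph.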
        
         | xg15 wrote:
         | I appreciate the idea (and Uiua's examples indeed look
         | beautiful, almost like visual programming) but I'd at least
         | like some obvious way how to _pronounce_ the code.
        
           | RodgerTheGreat wrote:
           | All of the symbols in Uiua have short english words as
           | alternate names, and the online editor allows you to type
           | them by alias.
           | 
           | K has "traditional names" for all the primitive operators
           | which appear in reference cards and which are typically used
           | when discussing code aloud with other K programmers. Q and
           | Lil, which are both K descendants, outright replace some
           | symbols with those named keywords. Named keywords can make
           | the primitives superficially easier to remember, at the cost
           | of making idiomatic patterns in the language less visually
           | apparent.
        
             | xg15 wrote:
             | Ah, that makes a lot more sense. Thanks!
        
       | blablablerg wrote:
       | "K is a general-purpose programming language that excels as a
       | tool for data wrangling, analytics and transformation."
       | 
       | How does it compare to R/tidyverse?
        
         | 7thaccount wrote:
         | It's mainly for quants where you couple the array language with
         | a time series database of all your stock quotes. Once you
         | understand the language you can do a ton of analysis with
         | extremely little code. Think of it as a mathematical SQL
         | dialect I guess.
         | 
         | In my opinion, it's very cool, but Python's ecosystem (and R's)
         | is just so much better with scientific libraries and charting
         | and all that. Kdb+ (the database) and K the language are likely
         | much faster than R for general analysis type stuff. R is also
         | free and Kdb+ is not.
        
       | poulpy123 wrote:
       | I'm somewhat convinced that there is a middle ground between
       | corporate java and languages like K
        
       | tempodox wrote:
       | IDK, I'd rather have a language that compiles to native code,
       | isn't quite as write-only as that, and doesn't cost an arm and a
       | leg, even when using a DB.
        
       | sl0thentr0py wrote:
       | i've been doing the last 3 years advent of codes in q/kdb+, it's
       | a lot of fun
       | https://github.com/sl0thentr0py/aoc/blob/main/aoc2023/3/foo....
        
       | James_K wrote:
       | > Strings are just vectors of characters
       | 
       | I hope not.
        
         | lytedev wrote:
         | Can you elaborate? Why not?
        
           | James_K wrote:
           | A character could be 1 byte long, in which case the language
           | cannot properly handle unicode; it could be 4 bytes long in
           | which case there is a lot of wasted space storing text and it
           | cannot properly handle extended grapheme clusters; or a
           | character could be arbitrary length at which point strings no
           | longer have a flat representation in memory. None of these
           | are good. The exact properties of a string can really only be
           | encoded efficiently with a flat linear access data-type.
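The three candidate meanings of "character" give different lengths for the same text. A short Python sketch using an "e" followed by a combining accent; Python's standard library has no grapheme-cluster segmentation, so that count is only described in a comment:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single glyph.
text = "e\u0301"

code_points = len(text)                        # Unicode code points
utf8_bytes = len(text.encode("utf-8"))         # 1-byte units
utf32_bytes = len(text.encode("utf-32-le"))    # fixed 4 bytes per code point
composed = unicodedata.normalize("NFC", text)  # precomposed form, one code point

print(code_points, utf8_bytes, utf32_bytes, len(composed))
```

A user would call this one character, yet it is two code points, three UTF-8 bytes, or eight UTF-32 bytes depending on the representation.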
        
             | dzaima wrote:
             | 1-byte characters (i.e. what k's typically have) handle
             | ASCII just fine, for which doing
             | reversing/splitting/uppercase/lowercase/iteration/etc is
             | actually meaningful (stock symbols, stringified dates,
             | identifiers, etc).
             | 
             | And if you have to handle arbitrary language user input,
             | there's basically no operations you can/should actually do
             | anyway. Uppercasing/lowercasing? Doesn't make sense on CJK
             | languages. Reversing? Completely meaningless. Trimming to
             | the first N chars for some visual display/summary/preview?
             | Even grapheme clusters won't help avoiding a character with
             | ten thousand combining components, and you'll have to do
             | language-specific logic to not cut in the middle of a word
             | for languages where the display of a prefix of a word may
             | change depending on later letters! And forget about spaces
             | meaning anything.
             | 
             | Basically the only string ops I can think of that make
             | sense for non-ASCII generally would be splitting/joining on
             | newlines and escaping for JSON/HTML or whatever, which'll
             | work completely fine on a byte list anyway.
             | 
             | There's perhaps some middle-ground of doing things for a
             | specific set of languages, but even for such you won't care
             | about the storage format anyways, as what matters for you
             | is just whether operations you use (presumably using some
             | library; and even if you write a manual uppercase for
             | French specifically or whatever, you'd notice if you
             | implemented it wrongly) do the thing they should.
             | 
             | So a list of byte chars is just fine for anything one would
             | actually do, providing optimal access to ASCII, and not
             | actually making things worse for non-ASCII.
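
A minimal sketch of the claim that newline splitting and escaping
work on a plain byte list. It relies on the fact that every byte of
a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII delimiter
can never match inside one. The function names are hypothetical.

```python
def split_lines(raw):
    """Split UTF-8 bytes on the newline byte without decoding."""
    return raw.split(b"\n")


def escape_html(raw):
    """Escape the three basic HTML metacharacters at the byte level."""
    return (
        raw.replace(b"&", b"&amp;")   # must run first to avoid double escaping
           .replace(b"<", b"&lt;")
           .replace(b">", b"&gt;")
    )


raw = "caf\u00e9 <b>\n\u65e5\u672c\u8a9e & more".encode("utf-8")
lines = [line.decode("utf-8") for line in split_lines(raw)]
escaped = escape_html(raw).decode("utf-8")
```

Both results decode cleanly, because no non-ASCII character was cut.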
        
               | James_K wrote:
               | Not true at all! Extended grapheme clusters are defined
               | by Unicode for a reason and include relevant combining
               | marks following a letter[1]. The point more generally is
               | that a programming language shouldn't preferentially
               | choose one character definition over another. The
               | decision of whether to iterate by bytes, points, or
               | clusters is a significant one which the language
               | shouldn't force upon users. For many common operations,
               | bytes are a sufficient representation, but then one must
               | be precise about encoding. A list of UTF-8 bytes is very
               | easy to deal with but the bytes of a UTF-16 string are
               | highly problematic. Inserting a single byte character at
               | the start of such a string would destroy its entire
               | content. There is no situation where "give me the
               | characters of this string" is a sufficiently precise
               | statement, so it should not be made available by
               | programming languages. Likewise, the idea of indexing a
               | string is not well defined at all. The only consistent
               | interface for accessing strings requires users to specify
               | both encoding and separation, and this can only be done
               | performantly in the general case with a linear scan.
               | 
               | [1] http://unicode.org/reports/tr29/
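
The UTF-8 versus UTF-16 point above can be reproduced with a short
sketch. The helper name `prepend_byte` is an assumption, and
little-endian UTF-16 without a byte order mark is assumed as well.

```python
def prepend_byte(text, encoding, byte=b"A"):
    """Encode text, insert one raw byte at the front, decode again."""
    damaged = byte + text.encode(encoding)
    # errors="replace" keeps the demo from raising on malformed input.
    return damaged.decode(encoding, errors="replace")


utf8_result = prepend_byte("hi", "utf-8")       # byte list stays aligned
utf16_result = prepend_byte("hi", "utf-16-le")  # every code unit shifts by one byte
```

The UTF-8 result is simply "Ahi". In the UTF-16 result each 16-bit
unit now straddles two original characters, so nothing of "hi"
survives.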
        
               | dzaima wrote:
               | I meant the combining mark point as a thing you _would_
               | want to cut off; a 50-char chopped-off "summary" of a
               | thing _should not_ include a character with ten thousand
               | combining marks ever. Of course it'd be preferred to cut
               | before and not in the middle, but certainly not
               | after, which is what you'd get if taking the first 50
               | extended grapheme clusters, the 20000-byte glyph counting
               | as one. Point being, you still just want to use a library
               | that has properly thought out the question. And that
               | applies to most (all?) sane fully-Unicode-aware
               | operations.
               | 
               | Places where ASCII-only is a known expectation and there
               | are meaningful per-char operations are plenty; that's
               | what using a list of bytes provides. Indeed you'd
               | probably want to use another abstraction if you have non-
               | ASCII. And for such you could use something to do the
               | form of iteration or operation you want just fine, even
               | if the input/output is a list of byte-chars representing
               | plain UTF-8.
        
               | James_K wrote:
               | Well in that case, the way you get a 50 char summary is
               | by iterating grapheme clusters, then counting up to 50
               | points and discarding the broken cluster. It's quite
               | trivial if the language exposes an interface for
               | iterating both clusters and points, and without such an
               | interface the problem is much harder to notice. Hence why
               | the language shouldn't prefer clusters to points or
               | points to clusters. It should expose all relevant
               | representations without prejudice.
               | 
               | Even if ASCII is appropriate in some situation, this
               | should be stated within the program. Requiring people to
               | be explicit about the data they produce and consume is
               | important and useful. A user might decide that UTF-16
               | best serves their need (or be working on the Windows
               | platform) in which case code which works with strings as
               | linear sequences will be able to operate on their strings
               | without issue. Code which assumes a UTF-8 byte
               | representation will require the entire string to be
               | allocated, converted, then reallocated and converted
               | back. Huge overhead and potential incompatibility for no
               | reason.
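
A rough sketch of the summary procedure described in this comment:
walk clusters, count code points, and drop the cluster that would
exceed the budget. Python's standard library has no UAX #29
segmenter, so `simple_clusters` below is a deliberately simplified
stand-in (a base character plus any following combining marks), not
real extended grapheme clusters. Both function names are
hypothetical.

```python
import unicodedata


def simple_clusters(text):
    """Yield a base character together with its trailing combining marks."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster      # a new base character closes the previous cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


def summary(text, max_points=50):
    """Keep whole clusters until the code point budget would be exceeded."""
    kept = []
    used = 0
    for cluster in simple_clusters(text):
        if used + len(cluster) > max_points:
            break              # discard the cluster that does not fit
        kept.append(cluster)
        used += len(cluster)
    return "".join(kept)


# "e" carrying 100 combining acute accents, surrounded by plain ASCII.
heavy = "ab" + "e" + "\u0301" * 100 + "cd"
short = summary(heavy, max_points=5)
```

Here the oversized cluster is dropped whole rather than cut through
its marks. Word boundaries and language-specific shaping, raised
elsewhere in the thread, are not handled.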
        
               | dzaima wrote:
               | > It's quite trivial if the language exposes an interface
               | for iterating both clusters and points, and without such
               | an interface the problem is much harder to notice
               | 
               | I assure you, 99% of people won't handle this correctly
               | even if given a cluster-based interface (if they even
               | bother using it). And this still doesn't handle the
               | question of cutting words in the middle of some languages
               | resulting in broken display of the non-cut part (or
               | languages without space-based word boundaries to cut on).
               | So the preferred thing is still to use a library.
               | 
               | I don't think anyone in k would use UTF-16 via a
               | character list of 2 chars per code unit; an integer list
               | would work much nicer for that (and most k interpreters
               | should be capable of storing such with 16-bit ints;
               | there's still some preference for using UTF-8 char lists,
               | namely, such get pretty-printed as strings); and you'd
               | have to convert on _some_ I/O probably anyway. Never
               | mind the world being basically all-in on UTF-8.
               | 
               | Even if you have a string type that's capable of being
               | backed by either UTF-8 or UTF-16, you'll still need
               | conversions between those at some points; you'd want the
               | Windows API calls to have a
               | "str.asNullTerminatedUTF16Bytes()" or whatnot (lest a
               | UTF-8-encoded string makes its way here), which you can
               | trivially have an equivalent of for a byte list. And I
               | highly doubt that overhead of conversion would matter
               | anywhere you need a UTF-16-only Windows API.
               | 
               | I doubt all of those fancy operations you'll be doing
               | will have optimized impls for all formats internally
               | either, so there's internal conversions too. If anything,
               | I'd imagine that having a unified internal representation
               | would end up better, forcing the user to push the
               | conversions to the I/O boundaries and allowing focus on
               | optimizing for a single type, instead of going back-and-
               | forth internally or wasting time on multiple impls.
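
A small sketch of pushing conversion to the I/O boundary while
keeping UTF-8 bytes internally. The function names are hypothetical:
the first mirrors the "str.asNullTerminatedUTF16Bytes()" idea from
the comment, the second the integer list of 16-bit code units.
Little-endian byte order is assumed.

```python
def as_null_terminated_utf16_bytes(utf8_bytes):
    """Convert UTF-8 bytes to UTF-16LE bytes plus a 16-bit terminator."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def utf16_code_units(utf8_bytes):
    """Return the UTF-16 code units of the text as a list of integers."""
    encoded = utf8_bytes.decode("utf-8").encode("utf-16-le")
    return [
        int.from_bytes(encoded[i:i + 2], "little")
        for i in range(0, len(encoded), 2)
    ]


wide = as_null_terminated_utf16_bytes(b"hi")
# U+1F600 lies outside the BMP, so it becomes a surrogate pair.
units = utf16_code_units("h\U0001F600".encode("utf-8"))
```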
        
               | mlochbaum wrote:
               | I think it's worth considering that application
               | development and GUIs really aren't K's thing. For those,
               | yes, you want to be pretty careful about the concept of a
               | "character", but (as I understand it) in K you're more
               | interested in analyzing numerical data, and string
               | handling is just for slapping together a display or
               | parsing user commands or something. So a method that lets
               | the programmer use array operations that they already
               | have to know instead of learning three different non-
               | array ways to work with strings is practical. Remember
               | potential users are already very concerned about the
               | difficulty of learning the language!
        
             | fc417fc802 wrote:
             | Python uses UTF-8. A Python string is iterable. It is
             | generally reasonable to describe any iterable as a vector
             | (at least in terms of the API). The result of such
             | iteration might not be a character in any formal sense, but
             | it's a reasonable description nonetheless.
             | 
             | I'm really not seeing the issue here.
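
For reference, a short sketch of what iterating a Python string
actually yields. Whatever the interpreter stores internally, `str`
iteration produces code points, not UTF-8 bytes and not grapheme
clusters. The variable names are illustrative only.

```python
# "e" followed by a combining acute accent: one visible character.
text = "e\u0301"

code_points = list(text)                  # iteration yields code points
utf8_bytes = list(text.encode("utf-8"))   # bytes need an explicit encode
```

The one visible character is two items by code point and three by
UTF-8 byte, which is the ambiguity debated above.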
        
       | khazhoux wrote:
       | Developers act like they forgot about K
        
       | pjmlp wrote:
       | What seems to be a pity about most array languages is that in
       | theory, they would be ideal DSL languages for SIMD and MIMD code
       | exploration, but as far as I understand from ArrayCast guests,
       | most are still interpreters at heart focusing on plain CPU
       | execution.
        
         | dzaima wrote:
         | The big problem with using array languages for lower-level SIMD
         | stuff is that that generally requires some amount of typedness,
         | but tacking on types on an array language without ending up
         | with having types be the majority of the syntax and code (or
         | taking up a ton of mental capacity if utilizing very heavy type
         | inference) would be rather non-trivial. And the operations you
         | want for lower-level ops are quite different from the higher-
         | level general-purpose ones too. (and, of course, some
         | interpreters do make good use of SIMD and/or multithreading)
         | 
         | That said, some form of array language more suited for stuff
         | like that is a somewhat common question; maybe one day someone
         | will figure it out.
         | 
         | Vanessa McHale is doing some interesting work on a typed
         | compilable array language, Apple[0].
         | 
         | [0]: https://github.com/vmchale/apple/?tab=readme-ov-file#apple-a...
        
         | Pompidou wrote:
         | Maybe Co-dfns for APL will solve this? That's what I
         | understood, but maybe I'm wrong.
        
       | airstrike wrote:
       | Side note, but some people have such an evident talent for
       | writing that it makes reading about _any_ topic a worthwhile
       | experience. This author, Stefan Kruger, seems to be one of them.
       | 
       | I almost wish this link was to a blog rather than to a book about
       | K, for which I only have a perennial curiosity.
       | 
       | Here's to hoping they consider writing said blog. I notice they
       | have one but it only has 3 posts, all of which are about past
       | Advent of Code puzzles.
        
         | bear8642 wrote:
         | The about section has links to other things he's
         | written/presented.
        
       | IshKebab wrote:
       | > The same baseless accusations of "unreadable", "write-only" and
       | "impossible to learn" are leveled at all Iversonian languages, k
       | included.
       | 
       | I'd be really curious to know if they really are baseless. It's
       | very very difficult to imagine that K developers can _really_
       | read a mess like this as easily as one might read Go or whatever.
       | 
       | https://github.com/KxSystems/kdb/blob/master/e/json.k
       | 
       | Has anyone tested this? Take a K program and ask a K developer to
       | explain it? Or maybe introduce a deliberate bug and see how long
       | they take to fix it compared to other languages. You could
       | normalise the results based on how long it takes them to write
       | some other code.
       | 
       | Free research project for any compsci researchers out there...
       | (though good luck finding skilled K programmers).
        
         | geocar wrote:
         | > It's very very difficult to imagine that K developers can
         | really read a mess like this as easily as one might read Go or
         | whatever.
         | 
         | Shui luo shi chu ("when the water recedes, the stones appear").
         | 
         | > Has anyone tested this? Take a K program and ask a K
         | developer to explain it?
         | 
         | I am not sure what you're asking. Do you want me to read it to
         | you?
         | 
         | Here is me reading some other people's code:
         | 
         | https://news.ycombinator.com/item?id=8476633
         | 
         | https://news.ycombinator.com/item?id=22010223
         | 
         | Do you want me to read to you the JSON encoder (written twice)
         | and the decoder in this way?
         | 
         | > Or maybe introduce a deliberate bug and see how long they
         | take to fix it compared to other languages.
         | 
         | https://news.ycombinator.com/item?id=27209093#27223086
         | 
         | > You could normalise the results based on how long it takes
         | them to write some other code.
         | 
         | https://news.ycombinator.com/item?id=22459661#22467866
         | 
         | https://news.ycombinator.com/item?id=31361023#31364262
        
         | michaelg7x wrote:
         | It's entirely possible; I have done it a few times. For example,
         | the `fby` verb[?] annoyed me one too many times, so I pulled it
         | apart to see what was going on. In contrast to json.k it's
         | quite short. I usually split each separable idea into a new
         | line and introduce a bunch of new variables to track state that
         | would otherwise be passed from right to left. Lengthy end-of-
         | line comments are my chosen way of understanding q or k when I
         | come back to anything later.
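
To illustrate the reading style described here (one idea per line,
named intermediates, end-of-line comments), below is a Python
sketch of what q's `fby` computes: an aggregate applied per group,
with each item replaced by its group's result. This is an
illustrative reimplementation under that assumption, not the k
source of `fby`.

```python
def fby(aggr, data, groups):
    """Replace each item of data with aggr applied to its whole group."""
    buckets = {}                                   # group key -> values seen
    for value, key in zip(data, groups):
        buckets.setdefault(key, []).append(value)  # collect values per group
    results = {k: aggr(v) for k, v in buckets.items()}  # one aggregate per group
    return [results[key] for key in groups]        # spread back to input shape


prices = [1, 5, 3, 2]
syms = ["a", "b", "a", "b"]
max_by_sym = fby(max, prices, syms)
```

The result has the same length as the input, so it can be compared
against the data item by item.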
        
       | russellbeattie wrote:
       | > _"there is no single definitive k, but instead a sequence of
       | slightly incompatible versions. If you decide to stick with k,
       | you'll see mentions of k4, k5 etc."_
       | 
       | I don't know about the qualities of k itself, but I think the
       | idea of having a common practice for experimental programming
       | languages to be grouped under a single name like "E" with a
       | number is quite attractive.
       | 
       | There are lots of students, hobbyists, researchers, professional
       | devs and companies who are developing their own working
       | programming language. There are a million of them, all with their
       | own names. 99.9% of them are ignored, or criticized unfairly by
       | others expecting fully fleshed out features.
       | 
       | I can imagine a GitHub repo where you can register a new language
       | "En" (with n being a number) rather than it living in obscurity
       | on a random website. Then others can jump in and experiment with
       | the language and give it feedback, fork it, etc.
       | 
       | This isn't just for toy languages, but for big organizations like
       | Google. Instead of naming a not-fully-baked C++ successor as
       | "Carbon" and getting flak for it not being ready for real world
       | code yet, they could simply call it "E321" and the status of the
       | language would be self-explanatory.
       | 
       | Then if one of the E languages gains enough traction, it could
       | "graduate" to its own named language.
       | 
       | I also like the cred that an "official" E language could get when
       | a dev talks about it to others. Everyone would immediately know
       | it was experimental and where to see the code.
        
       | sedatk wrote:
       | "k" was used in lowercase throughout the article, including the
       | title.
        
       | nottorp wrote:
       | > As you've landed here, you've clearly somehow sought out k, and
       | you likely have an idea what it's about.
       | 
       | Author didn't expect to end up on HN then :)
        
       | shric wrote:
       | My first thought was "weird, they made a language called k even
       | though there is already a language called K". I then realized
       | it's actually talking about K.
       | 
       | Throughout the article it's spelled k consistently except at the
       | start of a sentence. This is weird. The language is K not k.
       | Nobody spells the C language as c.
        
         | cess11 wrote:
         | It's actually rather common to spell it as k, as well as K. I
         | think q is more common than Q.
        
       ___________________________________________________________________
       (page generated 2025-02-10 23:00 UTC)