[HN Gopher] Jd
___________________________________________________________________
Jd
Author : tosh
Score : 1015 points
Date : 2022-04-04 12:04 UTC (10 hours ago)
(HTM) web link (code.jsoftware.com)
(TXT) w3m dump (code.jsoftware.com)
| pastaking wrote:
| Serious question: Why does this have so many upvotes? As a layman
| I have never heard of J or Jd before, someone please provide some
| context?
| [deleted]
| guidoism wrote:
| J is an APL language. APL is the coolest language you've never
| heard of. It's mind blowing in the same way people talk about
| Lisp, but more so since the concepts are so alien to most
| programmers.
| upwardbound wrote:
| bla3 figured out that it's because the link text "Jd" is so
| short, and people are clicking the upvote button by mistake.
| https://news.ycombinator.com/item?id=30906989
| jpf0 wrote:
| I did some work to compare Jd to data.tables and found that it
| was more performant in some instances such as on derived columns,
| and approximately equally performant on aggregations and queries.
| Jd is currently single-threaded, whereas multiple threads are
| important on some types of queries. I tried to further compare
| with Julia DB at the same time (maybe a year ago) and found that
| was incorrectly benchmarked by the authors and far slower than
| both; that might be different now. Jd is more equivalent to
| data.tables on disk; Clickhouse is far better at being a large-
| scale database.
|
| Rules of thumb on memory usage: Python/Pandas (not memory-
| mapped): "In Pandas, the rule of thumb is needing 5x-10x the
| memory for the size of your data." R (not memory-mapped): "A
| rough rule of thumb is that your RAM should be three times the
| size of your data set." Jd: "In general, performance will be good
| if available ram is more than 2 times the space required by the
| cols typically used in a query."
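The three rules of thumb quoted above can be turned into a back-of-envelope estimate. This is only a sketch: the function name and the example sizes (20 GB of data, 4 GB of frequently queried columns) are hypothetical, and the multipliers are taken directly from the quotes rather than from any measurement.

```python
def ram_estimates_gb(data_gb, query_cols_gb):
    """Rough RAM needs implied by the quoted rules of thumb (not measurements)."""
    return {
        # Pandas: "5x-10x the memory for the size of your data"
        "pandas_low": 5 * data_gb,
        "pandas_high": 10 * data_gb,
        # R: "three times the size of your data set"
        "r": 3 * data_gb,
        # Jd: "more than 2 times the space required by the cols
        # typically used in a query" (only the queried columns count)
        "jd": 2 * query_cols_gb,
    }


# Hypothetical workload: 20 GB table, queries usually touch 4 GB of columns.
est = ram_estimates_gb(data_gb=20, query_cols_gb=4)
print(est)
```

Note that the Jd figure scales with the columns a query touches, not with the whole data set, which is why the numbers are not directly comparable.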
|
| Re CSV reading, Jd has a fast CSV reader whereas J itself does
| not. I have written an Arrow integration to enable J to get to
| that fast CSV reader and read Parquet.
| mlochbaum wrote:
| The environment around Jd has changed a bit since it was young!
| Jsoftware[0] announced it in 2012, and this particular page has
| been effectively the same since it was created in 2017 (I suspect
| this was a page move, and the content is somewhat older). In
| these early days the column-oriented database was quickly gaining
| popularity but still obscure, which is why there's this
| "Columnar" section that goes to so much trouble to explain the
| concept. Now the idea is well known among database users and
| there are lots of other options[1].
|
| The history goes back further, because column-oriented is the
| natural way to build a database in an array language (making a
| performant row-oriented DBMS would be basically impossible). This
| is because a column can be seen as a vector where every element
| has the same type. A row groups values of different types, and
| array languages don't have anything like C structs to handle
| this. In J, Jd comes from Chris Burke's JDB proof-of-concept
| (announced[2] 2008, looks like), and the linked page mentions
| kdb+ (K) and Vstar (APL). KDB, first released in 1993, is
| somewhat famous and gets a mention on Wikipedia's history of
| column-oriented databases[3].
|
| [0] Company history: https://aplwiki.com/wiki/Jsoftware
|
| [1] https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes
|
| [2] https://code.jsoftware.com/wiki/JDB/Announcement
|
| [3] https://en.wikipedia.org/wiki/Column-oriented_DBMS#History
| moonchild wrote:
| > Vstar (APL)
|
| Vstar was based on j, not apl.
| mlochbaum wrote:
| Right, I'm getting names mixed up. The Dyalog APL one is vecdb,
| but it's more recent than Jd and I don't think it's
| progressed past being a toy.
| michaelmcmillan wrote:
| This section was interesting! Somehow I've never realized that
| row oriented storage is orthogonal to how disks work...
| Jd is a columnar (column oriented) RDBMS.
|
| Most RDBMS systems are row oriented. Ages ago they fell into the
| trap of thinking of tables as rows (records). You can see how
| this happened. The end user wants the record that has a first
| name, last name, license, make, model, color, and date. So a row
| was the unit of information and rows were stored sequentially on
| disk. Row orientation works for small amounts of data. But think
| about what happens when there are lots of rows and the user wants
| all rows where the license starts with 123 and the color is blue
| or black. In a naive system the application has to read every
| single byte of data from the disk. There are lots of bytes and
| reading from disk is, by orders of magnitude, the slowest part of
| the performance equation. To answer this simple question all the
| data had to be read from disk. This is a performance disaster and
| that is where decades of adding bandages and kludges started.
|
| Jd is columnar so the data is 'fully inverted'. This means all of
| the license numbers are stored together and sequentially on disk.
| The same for all the other columns. Think about the earlier query
| for license and color. Jd gets the license numbers from disk (a
| tiny fraction of the database) and generates a boolean mask of
| rows that match. It then gets the color column from disk (another
| small fraction of the data) and generates a boolean mask of
| matches and ANDS that with the other mask. It can now directly
| read just the rows from just the columns that are required in the
| result. Only a small fraction of the data is read. In J, columns
| used in queries are likely already in memory and the query runs
| at ram speed, not the sad and slow disk speed.
|
| Both scenarios above are simplified, but the point is strong and
| valid. The end user thinks in records, but the work to get those
| records is best organized by columns.
|
| Row oriented is slavishly tied to the design ideas of filing
| cabinets and manila folders. Column oriented embraces computers.
|
| A table column is a mapped file.
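The mask-and-AND query described in the quoted section can be sketched in a few lines. This is an illustration in plain Python, not Jd's implementation; the column names and the five sample rows are made up.

```python
# Each column is stored as its own list; row i is the i-th entry of each list.
license_col = ["123ABC", "987XYZ", "123DEF", "555QQQ", "123GHI"]
color_col = ["blue", "blue", "red", "black", "black"]
make_col = ["Ford", "Honda", "Toyota", "Ford", "Mazda"]

# Scan only the license column and build a boolean mask.
license_mask = [v.startswith("123") for v in license_col]

# Scan only the color column and build a second mask.
color_mask = [v in ("blue", "black") for v in color_col]

# AND the two masks to find the matching row numbers.
mask = [a and b for a, b in zip(license_mask, color_mask)]
rows = [i for i, hit in enumerate(mask) if hit]

# Only now fetch the matching rows from the columns needed in the result.
result = [(license_col[i], color_col[i], make_col[i]) for i in rows]
print(result)
```

The `make_col` list is never scanned; it is only indexed at the matching row numbers, which is the point the quoted text makes about reading a small fraction of the data.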
| thethimble wrote:
| If you're interested in this thought, check out Martin
| Kleppmann's book DDIA where he explains storage concepts like
| this and many more. One of the best architecture books out
| there!
| derefr wrote:
| I would note that this query behavior (sorted data columns
| bitmasked together) is further orthogonal to primary-data
| storage representation. For example, Postgres can give you this
| same behavior if you declare a multi-column GIN index across
| the columns you want to be searchable.
| hodgesrm wrote:
| > Somehow I've never realized that row oriented storage is
| orthogonal to how disks work...
|
| The section you posted is very misleading. Storage is arranged
| in blocks. The secret to database performance is how you lay
| out data in those blocks and how well your access patterns to
| the blocks match the capabilities of the device. This choice is
| the fundamental key to database performance.
|
| If your database stores shopping baskets for an eCommerce site,
| you want each basket in the smallest number of blocks, ideally
| 1. It makes inserting, updating, and reading single baskets
| very fast on most modern storage devices.
|
| If your database stores data for analytic queries, it's better
| (in general) to store each column as an array of values. That
| makes compression far better, and also makes scanning single
| columns very efficient.
|
| To say as the article does that "row oriented is slavishly tied
| to design ideas of filing cabinets and manila folders" is
| nonsense. Plus there are _many_ other choices about how to
| access data that include parallelization, alignment with
| processor caches, trading off memory vs. storage, whether you
| have a cost-based query optimizer, etc. Even within column
| stores there are big differences in performance because of
| these.
|
| (Disclaimer: I work on ClickHouse and love analytic systems.
| They are great but not for everything.)
| akersten wrote:
| Isn't that just a weirdly detailed way to say "every column is
| indexed, whether you like it or not"?
| ghshephard wrote:
| Ironically - He didn't even mention indexes in his
| description (which he admitted was simplified) - a good query
| optimizer will do wonders for not only coming up with the
| appropriate hints for the query plan, but will also
| _dynamically adjust_ those hints based on the underlying data
| patterns.
|
| The example he provided,
|
| "So a row was the unit of information and rows were stored
| sequentially on disk. Row orientation works for small amounts
| of data. But think about what happens when there are lots of
| rows and the user wants all rows where the license starts
| with 123 and the color is blue or black. In a naive system
| the application has to read every single byte of data from
| the disk."
|
| Is something no modern database would ever do. The real
| challenge is not to only read the records starting with 123,
| or having blue/black - that part is trivially handled by
| every Database engine I'm familiar with. The query challenge
| is: do you filter on license # or color first? (If there are
| 1k records starting with 123 and 5mm blue/black vehicles, the
| order is pretty critical for performance) - that's one of the
| features that distinguishes query optimizers.
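The filter-ordering point can be illustrated with a toy model that counts predicate evaluations. This is a sketch only: the data is synthetic, the helper name is hypothetical, and a real optimizer would rely on statistics and indexes rather than brute-force scans.

```python
def count_evaluations(rows, first, second):
    """Filter with `first`, then apply `second` only to the survivors.

    Returns the matching rows and the total number of predicate calls.
    """
    evaluations = 0
    survivors = []
    for row in rows:
        evaluations += 1
        if first(row):
            survivors.append(row)
    matches = []
    for row in survivors:
        evaluations += 1
        if second(row):
            matches.append(row)
    return matches, evaluations


def by_license(row):
    return row["license"].startswith("123")  # selective: few rows match


def by_color(row):
    return row["color"] in ("blue", "black")  # unselective: half the rows match


# Synthetic table: 10,000 rows, licenses "0000".."9999", alternating colors.
rows = [
    {"license": f"{i:04d}", "color": "blue" if i % 2 else "red"}
    for i in range(10_000)
]

m1, cost_license_first = count_evaluations(rows, by_license, by_color)
m2, cost_color_first = count_evaluations(rows, by_color, by_license)
print(len(m1), cost_license_first, cost_color_first)
```

Both orders return the same rows; only the amount of work differs, and the gap grows with the difference in selectivity between the two predicates.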
|
| Columnar databases are awesome when you have columnar data to
| work with - I've seen 20-30x reductions in disk storage in
| the wild (and you can obviously create synthetic examples
| that go way north of that), but a well indexed SQL database
| backed by a solid query optimizer/planner can probably hold
| its own against a columnar database in terms of lookup
| performance, particularly if your data is row-oriented to
| begin with.
| jve wrote:
| I know nothing about J and JSoftware, but this reads like an
| April Fools' joke. Is it?
|
| > In a naive system the application has to read every single byte
| of data from the disk. ... To answer this simple question all the
| data had to be read from disk. This is a performance disaster and
| that is where decades of adding bandages and kludges started.
| .... Think about the earlier query for license and color. Jd gets
| the license numbers from disk (a tiny fraction of the database)
|
| Of course that data has to be read from disk. Well, for simple or
| aggregate queries he may gain performance. Moreover, as another
| commenter has noted, you can organize data in columns in
| MSSQL too for aggregations: https://docs.microsoft.com/en-
| us/sql/relational-databases/in...
|
| > columns used in queries are likely already in memory and the
| query runs at ram speed, not the sad and slow disk speed. ... Jd
| performance is affected primarily by ram. Lots of ram allows lots
| of rows
|
| Any other RDBMS can have sensible indexes that satisfy your
| queries. And, surprise, your data also lives in RAM once you read
| it.
|
| > You can backup a database or a table with standard host shell
| scripts, file copy commands, and tools such as tar/gzip/zip....
| If you understand backing up file folders, then you pretty much
| understand backing up Jd databases.
|
| And... throw data consistency out of the window?
|
| I'm reading and I'm "not getting" the selling point - why is this
| better?
|
| Okay, I read that things are files. SQLite is also a file if
| physical format is a concern.
| OskarS wrote:
| > Of course that data has to be read from disk. Well, for simple
| or aggregate queries he may gain performance.
|
| Let's say you want to access 2 columns out of 100 in a
| particular table. In a row-oriented database, you have to read
| the full rows off the disk, which means that you have to read
| 98 pieces of data off the disk that you have no use for, a
| total waste of I/O. In a columnar database, you don't have to
| do that, you just read off the relevant columns. This is VERY
| similar to the "array of structs"/"struct of arrays" argument
| in gamedev (and related high performance fields), it's the same
| kind of tradeoff: slightly more complicated data layout traded
| in for much more efficient reads.
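The "2 columns out of 100" argument can be made concrete by laying the same table out both ways and counting how many values each layout has to pass over. A rough sketch with made-up sizes; it counts values in memory and does not measure real disk I/O.

```python
NUM_COLS, NUM_ROWS = 100, 1_000

# Row-oriented ("array of structs"): each row keeps all 100 values together.
rows = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Column-oriented ("struct of arrays"): one contiguous list per column.
cols = [[rows[r][c] for r in range(NUM_ROWS)] for c in range(NUM_COLS)]

wanted = (3, 7)  # the two columns the query needs (arbitrary choice)

# Row layout: fetching a row brings in every value of that row.
values_touched_rows = NUM_ROWS * NUM_COLS
# Column layout: only the wanted columns are fetched.
values_touched_cols = NUM_ROWS * len(wanted)

# Both layouts give the same answer for an aggregate over the two columns.
sum_from_rows = sum(row[3] + row[7] for row in rows)
sum_from_cols = sum(cols[3]) + sum(cols[7])
print(values_touched_rows, values_touched_cols, sum_from_rows == sum_from_cols)
```

With these sizes the row layout passes over 50 times as many values, which matches the 98-out-of-100 waste described above.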
|
| In addition: if you have a columnar database, you can employ
| compression in a much more efficient manner. If you have 10
| million rows with the same (or very similar) data in a column,
| you can compress that to a fraction of the size. This messes
| with indexes, but it's often worth it because it VASTLY speeds
| up aggregate calculations.
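One simple way to see why repetitive columns compress well is run-length encoding. The comment does not name a specific scheme, so RLE here is an assumption chosen for illustration, and the column is scaled down to 10,000 values.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (value, length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_sum(encoded):
    """Aggregate directly on the compressed form, without expanding it."""
    return sum(value * length for value, length in encoded)


# A column with long runs of identical values (scaled-down example).
column = [5] * 6_000 + [7] * 4_000

encoded = rle_encode(column)
print(encoded)           # two pairs instead of 10,000 values
print(rle_sum(encoded))  # same total as sum(column)
```

The aggregate touches two pairs rather than 10,000 values, which is the speed-up for aggregate calculations mentioned above; looking up a single row by position becomes harder, which hints at why it "messes with indexes".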
|
| Row-based and column-based databases have different tradeoffs
| and advantages, and it's not quite as clear-cut as the article
| makes it seem. But it's certainly no April fools joke: columnar
| databases (for many tasks, particularly aggregates) can vastly
| outperform row-oriented databases. This is why Google BigQuery
| is columnar, for instance. Another good example is kdb+ (which
| this is clearly based off of), which is widely used in places
| which value quick time-series aggregates (Wall Street, being
| the obvious example).
|
| The article is a bit over the top and one-sided, but it doesn't
| say anything that is particularly controversial. You might
| wanna read up on columnar database systems:
| https://en.wikipedia.org/wiki/Column-oriented_DBMS
| brianwawok wrote:
| > slightly more complicated data layout traded in for much
| more efficient reads.
|
| Depending on read patterns. The classic example is address.
| Sure, you can store an address as column. Name here, city
| there, street 1 there, street 2 there. How useful is 1/5th of
| an address, and how often are you pulling it like that? For
| something like address that you generally read all or none,
| you generally are better served by a row oriented database.
|
| You also have FKs to kind of do this in a row oriented
| database. If some part of the data is not read nearly as much
| as another, it can be a foreign key sitting in another table.
| OskarS wrote:
| Yeah, exactly: there are tradeoffs to both models, neither
| is strictly superior. You would never want to do aggregates
| on addresses anyway, so that advantage is out the door. You
| do, however, want to very easily index a table of
| addresses, so you could quickly look them up for a
| particular user, which a columnar database is (arguably)
| worse at. BigQuery, in particular, does not use indexes at
| all.
|
| (EDIT: I guess you might want to do aggregates on
| addresses, actually. "How many customers do we have in
| NYC?", that kinda thing.)
| vidarh wrote:
| Hybrids are straightforward enough. A "simple" way of
| achieving that is to support using the indexes to
| directly answer queries, as quite a few databases do. Now
| an index on a single column is _also_ a columnar store of
| the contents of that column, yet you still have the full
| row to query if you need lots of data from individual
| rows. A more sophisticated option would be to reduce
| duplication of column data.
|
| (EDIT: How well a usually row-oriented database optimises
| this, is another question, and will differ by database)
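Answering a query directly from an index can be observed in SQLite, used here only because it ships with Python. The table, index name, and rows below are invented for the sketch, and other databases report index-only access differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO addresses (name, street, city) VALUES (?, ?, ?)",
    [
        ("Ann", "1 Main St", "NYC"),
        ("Bob", "2 Oak Ave", "Boston"),
        ("Cy", "3 Elm Rd", "NYC"),
    ],
)
# A single-column index is, in effect, a sorted copy of that one column.
conn.execute("CREATE INDEX idx_city ON addresses (city)")

query = "SELECT count(*) FROM addresses WHERE city = 'NYC'"

# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
nyc_count = conn.execute(query).fetchone()[0]

print(plan_text)
print(nyc_count)
```

When the plan mentions a covering index, the count was answered from the index alone and the full rows were never read, while `SELECT *` on the same table still has the whole row available.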
| michelpp wrote:
| > I know nothing about J and JSoftware, but this reads like an
| April Fools' joke. Is it?
|
| To me it reads as being colored by a very specific tool bias.
| For example:
|
| >> The key difference between Jd and most other database
| systems is that Jd comes with a fully integrated and mature
| programming language.
|
| Most major database systems come with a fully integrated and
| mature programming language.
|
| >> Row orientation works for small amounts of data.
|
| It works for a _different data access pattern_. Row vs column
| is a tradeoff spectrum. Data size is just one dimension of the
| analysis.
|
| >> Row oriented is slavishly tied to the design ideas of filing
| cabinets and manila folders. Column oriented embraces
| computers.
|
| Pretty hyperbolic.
| moonchild wrote:
| > Most major database systems come with a fully integrated
| and mature programming language.
|
| Like ... pl/[pg]sql? Not exactly a joy to write.
| nostoc wrote:
| Still is a fully integrated and mature programming language
| though.
|
| And I do believe you wouldn't have any issue finding people
| who think the same of J.
| moonchild wrote:
| How integrated? Most of jd is _written_ in j. It is also
| expected that the app performing--or at least handling--
| the queries be written in j.
|
| And regarding maturity--j has libraries, debugger, etc.
| kokizzu2 wrote:
| so any benchmark against clickhouse?
| simonpure wrote:
| There's a wonderful podcast about array languages -
|
| https://www.arraycast.com/
|
| Lots of great stories about software engineering besides talking
| about the different dialects of array languages.
| stefan_ wrote:
| An example J file because this link doesn't say much:
|
| https://github.com/jsoftware/data_jd/blob/master/csv/csv.ijs
| diarrhea wrote:
| s=. 0 2}.each _3 _1{<;._2 (;i{ccfiles),'/'
|
| That is... not pretty.
| bryanrasmussen wrote:
| I would say it is not knowledge leaking, most languages leak
| knowledge so that if you are not familiar with the language
| but you do know some other programming languages you can sort
| of figure out what they do.
|
| But some languages do not leak knowledge in this way.
|
| There is the concept of beauty in programming languages that
| the expression of an idea should be succinct. This J code
| might be beautiful, but unsure.
| 0des wrote:
| You're going too meta. Does it make you happy to write it?
| Does it fulfill its purpose? If yes, don't worry how it
| looks, it is fine.
| vidarh wrote:
| To some of us the "does it make you happy" and how it
| looks are intrinsically linked.
|
| One of the things that makes me happy is to write
| beautiful code.
| jollybean wrote:
| It's meant to be efficient, not pretty.
|
| It's catching on within the AI community because the syntax
| matches well to the kinds of matrix operations common in that
| field.
|
| I think Nvidia's next chip is going to have a compiler for
| jlang.
|
| Also, I think they are starting to use it as an 'entry level'
| language for kids, you know, like grade school.
| razetime wrote:
| J isn't really made to be pretty. It's made to be terse and
| simple to read once given enough learning effort, and it's
| made to be a consistent keyboard typable notation.
| forgotpwd16 wrote:
| Tbh that file says even less. It's like in discussion about
| pandas giving a link to read_csv.py in pandas source.
| jenny91 wrote:
| It tells me everything I need to know about this language!
| mlochbaum wrote:
| Here's one of the more central files that ties into how a Jd
| database is laid out:
|
| https://github.com/jsoftware/data_jd/blob/master/base/common...
|
| Not that I claim anyone in particular can read it of course. Jd
| uses a hierarchy of folder, database, table, column that's
| handled with an object system to share code between them. A
| folder is just a place to put databases and hardly needs to add
| anything, while the other levels have a lot of extra
| functionality. As an inverted database, Jd stores each column
| in a file, and accesses it using memory mapping.
|
| https://github.com/jsoftware/data_jd/blob/master/base/folder...
|
| https://github.com/jsoftware/data_jd/blob/master/base/table....
|
| (I designed this system when I did some of the early work to
| turn JDB into Jd as a summer intern)
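The "one file per column, accessed through memory mapping" layout can be sketched with Python's `mmap` module. This is not Jd's on-disk format: the file name, the 64-bit integer encoding, and the values are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

values = [10, 20, 30, 40, 50]

# Write one column to its own file as native 64-bit integers.
path = os.path.join(tempfile.mkdtemp(), "price.col")  # hypothetical file name
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}q", *values))

# Map the file into memory and view it as an array of 64-bit integers.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        with memoryview(mm) as raw, raw.cast("q") as column:
            third = column[2]    # random access without reading the whole file
            total = sum(column)  # sequential scan of the mapped column

print(third, total)
```

The operating system pages the file in on demand, so a column that is queried often tends to stay in RAM, which fits the "columns used in queries are likely already in memory" claim quoted elsewhere in the thread.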
| hnrj95 wrote:
| are there benchmarks against kdb+ and/or shakti?
| bla3 wrote:
| Meta comment: I tried to click the link but since the title is so
| short and my mousing not very precise, I accidentally clicked the
| upvote arrow. I then clicked "unvote" and tried again, but the
| same thing happened. The third time round, I managed to click the
| link.
|
| Takeaway: Very short titles might get you some upvotes from
| clumsy users :)
| raphaelj wrote:
| The same happened with me...
| version_five wrote:
| Yes! I came here to post the same thing - the link didn't work
| for me, and I inadvertently upvoted as well. I assume the
| article has merits of its own, but I do notice a huge ratio of
| votes to comments, I guess some are inadvertent. It would be
| interesting to search through other very short titles and look
| at the ratio of comments to votes vs others above some vote
| threshold... (I'm on my phone or I'd try) - edit, is there a
| regex search for HN anywhere? It looks like algolia doesn't
| support them
| rkalla wrote:
| lol same and would have never thought to comment on it because
| obviously user-error, then I saw this and thought twice.
| legalcorrection wrote:
| I'm intrigued but skeptical of this bit:
|
| _Jd is a columnar (column oriented) RDBMS.
|
| Most RDBMS systems are row oriented. Ages ago they fell into the
| trap of thinking of tables as rows (records). You can see how
| this happened. The end user wants the record that has a first
| name, last name, license, make, model, color, and date. So a row
| was the unit of information and rows were stored sequentially on
| disk. Row orientation works for small amounts of data. But think
| about what happens when there are lots of rows and the user wants
| all rows where the license starts with 123 and the color is blue
| or black. In a naive system the application has to read every
| single byte of data from the disk. There are lots of bytes and
| reading from disk is, by orders of magnitude, the slowest part of
| the performance equation. To answer this simple question all the
| data had to be read from disk. This is a performance disaster and
| that is where decades of adding bandages and kludges started.
|
| Jd is columnar so the data is 'fully inverted'. This means all of
| the license numbers are stored together and sequentially on disk.
| The same for all the other columns. Think about the earlier query
| for license and color. Jd gets the license numbers from disk (a
| tiny fraction of the database) and generates a boolean mask of
| rows that match. It then gets the color column from disk (another
| small fraction of the data) and generates a boolean mask of
| matches and ANDS that with the other mask. It can now directly
| read just the rows from just the columns that are required in the
| result. Only a small fraction of the data is read. In J, columns
| used in queries are likely already in memory and the query runs
| at ram speed, not the sad and slow disk speed.
|
| Both scenarios above are simplified, but the point is strong and
| valid. The end user thinks in records, but the work to get those
| records is best organized by columns.
|
| Row oriented is slavishly tied to the design ideas of filing
| cabinets and manila folders. Column oriented embraces computers.
|
| A table column is a mapped file._
|
| What's the other side of this argument?
| drc500free wrote:
| If you're using J, you are probably doing analytics and stats.
| That means you are looking for patterns in a handful of
| attributes across a large population - i.e. columnar.
|
| As others have said, row-based makes sense for most OLTP / app
| databases. You're probably not writing those products in J.
| lostgame wrote:
| This seems... _highly_ impractical for...90% of operations I'd
| be wanting to do with a DBMS.
|
| It kinda seems like a 'different way for different's sake'
| kinda solution? :/
|
| I understand there must be a minority of operations that can
| benefit from this, but overall I can't imagine this being
| popular for most DB operations.
| vidarh wrote:
| It tends to include a large proportion of the large, expensive
| reporting queries your business people want to do. Whether or
| not those kinds of queries dominate for your system will
| depend greatly on your system.
|
| You also need to reach a certain scale before the choice
| (either way) will affect you enough to matter.
|
| But when you reach that scale it can be the difference
| between reporting queries taking seconds vs. hours in some
| cases.
|
| For some systems you'll end up wanting _both_ , and stream
| updates from the transaction focused db (row oriented) into a
| separate reporting database that uses a column store.
| cmrdporcupine wrote:
| Yeah, it's a simplification and one-sided.
|
| The general consensus as I understand it is: column-oriented
| indices/storage options are good for OLAP, large scale
| analytics, bulk data analysis. Row-oriented indices are suited
| more for OLTP, individual "record processing."
|
| Both are just techniques and there's nothing stopping a single
| db product from offering both.
| Semaphor wrote:
| > Both are just techniques and there's nothing stopping a
| single db product from offering both.
|
| e.g. for MS SQL there are columnstore indexes.
| iamwil wrote:
| Wouldn't column stores be better for the cache?
| cmrdporcupine wrote:
| I think the answer to that is just: it depends.
|
| Again comes back to usage patterns. Yes, if you're doing
| aggregation operations on a small number of columns then I
| expect locality of reference could be better with a column-
| store, rather than thrashing through row-retrievals one
| after another (and then just throwing them away after
| aggregating).
|
| But if you're frequently doing "look up this customer and
| others like them" and then using the bulk of the
| information there? I'd expect better cache behaviour out of
| row oriented storage.
|
| But these days it's so unclear what's happening inside the
| actual "black box" that is our hardware that it's hard to
| make generalizations.
| tormeh wrote:
| It all depends on access pattern. Do you tend to select entire
| rows? Use a row-oriented DB. Do you tend to select entire
| columns? A column-oriented database might be for you. That's
| it, really. None of the designs are superior, afaik.
| hoosieree wrote:
| Just to add, because J is an array-oriented language, it
| makes some kinds of column-oriented access patterns easier.
|
| For example, it's trivial to sort one array by the values of
| another array: x /: y
|
| To me, it's much easier to read than the equivalent in NumPy:
| x[np.argsort(y)]
|
| Or get pairs of (unique value; count) from an array using the
| key operator (/.): (~.;#)/.~ y
|
| Column db's make sense for array-oriented languages, because
| there's much less of a mismatch compared to OOP with
| relational.
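For readers who know neither J nor NumPy, the two operations above look roughly like this in plain Python. The sample data is made up, and the correspondence to the J expressions is approximate rather than exact.

```python
from collections import Counter

x = ["c", "a", "d", "b"]
y = [3, 1, 4, 2]

# Roughly J's  x /: y  (or NumPy's x[np.argsort(y)]):
# compute the ordering of y, then pick x in that order.
order = sorted(range(len(y)), key=y.__getitem__)
x_sorted_by_y = [x[i] for i in order]

# Roughly what the key adverb (/.) example computes:
# each distinct value, in order of first appearance, with its count.
z = ["b", "a", "b", "c", "a", "b"]
counts = list(Counter(z).items())

print(x_sorted_by_y)
print(counts)
```

Both operations work on whole columns at once, which is the fit between array languages and column stores that the comment describes.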
| legalcorrection wrote:
| All of that syntax is awful. Why not just x.sortBy(y) ? Did
| all of the advances in software legibility fail to make
| their way to the modern scientific computing world?
| moonchild wrote:
| https://www.jsoftware.com/papers/tot.htm
| avmich wrote:
| Hyperbolically, because you don't write math with
| variables in camel case.
|
| J traces its roots from a notation for math, used on
| whiteboards. That awful syntax you see - it's the same as
| in some formulas in, say, general relativity, only J is
| Turing complete and not a Turing tarpit. When you work on
| a formula, in case of J you have ability to execute it,
| and if you see it's wrong you can update the formula and
| try again. This could also be done in other languages,
| but in J (I mean, APL family of languages) it's more
| focused.
|
| In defense of J, I had a professional example of a
| problem which wasn't clearly specified, which needed some
| experimentation - that took, if I remember correctly,
| some 45 minutes of attempts in J, and then the prototype
| was re-written in C#, when it was already producing
| desired outcomes. Rewriting took somewhat longer.
| moonchild wrote:
| You might be selecting entire rows, but you are probably not
| selecting _all_ of the rows, and your selection criteria
| probably do not depend on all of the columns.
| WJW wrote:
| Yeah, row-oriented is good for WHERE queries and column-
| oriented is good for SUM (or other aggregation) queries.
| [deleted]
| rileyphone wrote:
| A lot of modern, data-oriented ECS frameworks for game dev
| follow a similar philosophy, wherein components are stored in
| linear collections that optimize memory layout for caches and
| parallelism. Given how rarely you need 'SELECT *' this makes
| sense for a relational DB as well, though modern SQL DBs have a
| lot of sweat put in to their performance.
| giaour wrote:
| In many OLTP systems, almost all work operates on multiple
| attributes of a single record. E.g., when logging a user in, an
| authentication system cares about multiple attributes of a
| single user record, not facts about the aggregate pool of
| users.
|
| Column oriented stores are extremely efficient for aggregate
| queries, but they make writes and single-row reads more
| expensive and are thus not suitable for every workload. There's
| an excellent overview in Martin Kleppmann's Designing Data
| Intensive Applications.
| 0des wrote:
| > Early adopters of Jd are assumed to have a J background and
| documentation and tutorials depend on that background.
|
| All 12 of us are jumping up and down saying "it's our time,
| finally the day has come"
| [deleted]
| jdshupe wrote:
| Seeing J at the top was indeed a jump up and down moment.
| 0des wrote:
| We should have a secret handshake or some type of insignia to
| better signal to our peers. I've tried draping a J colored
| kerchief out of my back pocket but the results so far are not
| great, it appears there is more anti-J sentiment than I'd
| imagined, as I get harassed unduly in certain areas of town.
| May have to switch to maybe a hand gesture based signaling
| that can be done on the fly to signal allegiance.
| recuter wrote:
| https://www.atlasobscura.com/articles/hobo-code
| plibither8 wrote:
| 763 (and counting) votes, no. 1 on the frontpage, ...and only 83
| comments? This is one of the most skewed ratios I've seen on HN.
| upwardbound wrote:
| bla3 figured out that it's because the link text "Jd" is so
| short, and people are clicking the upvote button by mistake.
| https://news.ycombinator.com/item?id=30906989
| marcodiego wrote:
| "Jd source is largely J code and that code is open and available
| to licensed users."
|
| License?
| anonu wrote:
| It's sort of misleading because J is closed source
| 0des wrote:
| Curious how you arrived at this conclusion
| moonchild wrote:
| J is fully opensource: https://github.com/jsoftware/jsource
|
| Most of jd's source is publicly available:
| https://github.com/jsoftware/data_jd
| SparkyMcUnicorn wrote:
| Is it?
|
| https://github.com/jsoftware/jsource/blob/master/license.txt
| moonchild wrote:
| > J SOURCE can be used under a commercial license from
| Jsoftware, in which case the terms and conditions of that
| license apply.
|
| > OR
|
| > J Source can be used under GNU General Public License
| version 3, in which case the terms and conditions of that
| license apply.
|
| Seems pretty clear to me.
| Shared404 wrote:
| As a side note, I really love the choice to dual license,
| and wish it were offered more often.
| jenny91 wrote:
| It's extremely common: license it under GPL/AGPL or some
| other very copyleft license; get contributors to sign a
| CLA, then offer the library with hefty license fees for
| non-FOSS projects.
| misnome wrote:
| Because it's commercially available without GPL?
| jollybean wrote:
| I wonder when Java or Swift will finally get around to adopting
| 'self effacing references'. It's 2022.
|
| [1]
| https://code.jsoftware.com/wiki/Vocabulary/SpecialCombinatio...
| anonu wrote:
| Jd has been around for a while. But is it production ready?
|
| I'm still looking for an open source replacement to kdb, that
| matches kdb's speed and featureset.
| kokizzu2 wrote:
| clickhouse '__')
| swasheck wrote:
| i really dislike clickhouse for anything less than
| rudimentary analysis, but appreciate that it's fast for that.
| nimrody wrote:
| Can you give some tips on what you mean by "less than
| rudimentary analysis"? Considering adopting Clickhouse and
| wondering whether we will encounter problems down the road.
| swasheck wrote:
| the biggie for me was that analytic window functions are
| either non-existent or experimental and must be achieved
| with array function hacks.
|
| it does have nice built-in skew and kurtosis functions,
| though.
| yiyus wrote:
| Jd is not open source.
| vmchale wrote:
| Nothing matches kdb's speed except GPU-accelerated DBs:
| https://tech.marksblogg.com/benchmarks.html
| ZeroCool2u wrote:
| Yeah, KDB is... Not super fun. I've been looking at TimeScaleDB
| recently, because it's just a PostgreSQL plug-in it seems nice
| and simple, but I haven't actually compared them directly yet.
| LoriP wrote:
| If you want some intro info - and you may have found it
| already - the YouTube channel is a great place to start for
| TimescaleDB youtube.com/TimescaleDB (for transparency: I work
| for Timescale...)
| mritchie712 wrote:
| If you were looking at pg because you need:
|
| - open source
|
| - SQL based
|
| - analytics data warehouse
|
| Then check out Clickhouse. I've been really happy with it and
| it checks all those boxes.
|
| ps - if you're interested in working with clickhouse and open
| source data tools, I'm hiring: mike@luabase.com
| nathan_compton wrote:
| I've programmed in J professionally (admittedly not for all that
| long) as a data scientist and, coincidentally, have just
| completed a small analysis using J as part of an internal
| workshop about data analysis I am planning. I typically work in R
| and Python and I have to say that at this stage there is almost
| no reason I would pick up J to do any work. Unless code-golf
| level conciseness is your only goal, these other platforms offer
| superior performance, clarity, ease of use, access to libraries
| and are, as programming languages, substantially better designed.
|
| I say this as a great lover of function-level programming and as
| a J enthusiast. I would say I am quite familiar with J's
| programming paradigm and conceptual widgets and doodads (I know
| the verbs, nouns, adverbs and conjunctions and can use them
| appropriately). I even remembered a pretty good portion of the
| Nuvoc. But doing even the simplest analysis in J was
| _excruciatingly_ slow and inconvenient compared to using R and
| the tidyverse (in particular, I missed dplyr and ggplot). The
| tidyverse CSV readers are, for example, much faster and smarter
| and more convenient and informative than anything you'll get from
| the J universe.
|
| I love vector languages but at this point J can't compete with
| the major platforms for data analysis. It's less convenient, often
| _slower_, much more low level, strange, and its library situation
| is anemic at best. I recommend learning J because it will expand
| your mind, but I can't imagine picking it up for real work.
| moonchild wrote:
| The ecosystem problems are genuine. Though I do not think they
| are so great as you make them out to be. But with respect to
| semantics, numpy et al are but pale imitations. With respect to
| syntax, too (https://www.jsoftware.com/papers/tot.htm).
| nathan_compton wrote:
| I sort of agree with you, especially about numpy. Nothing in
| the data science space in Python feels right to me. But you
| can't beat the network effects. It's still easier to actually
| do data analysis in Python than in J.
| user3939382 wrote:
| > I do not think they are so great as you make them out to be
|
| There's a dynamic with ecosystem problems I believe applies
| to all languages. You only need one missing or bad library
| that's critical to your project to make the whole language
| useless.
|
| An anecdotal example: I remember many years ago trying to
| give Python a go and within 15 minutes ran into a problem
| parsing XML. A search revealed this was a known issue that
| was being worked on with the foremost tool in Python for this
| job. You couldn't have credibly argued that Python had an
| ecosystem problem even at the time, but for me in that
| particular scenario Python had a show-stopping ecosystem
| problem. There were ways around this, but the most convenient
| way around it at the time was switching back to a more
| familiar language.
|
| My greater point is that, we can definitely make
| generalizations about a language's ecosystem health, but keep
| in mind there is a very context-sensitive, practical
| dimension to that type of language assessment.
| moonchild wrote:
| > You only need one missing or bad library that's critical
| to your project to make the whole language useless
|
| ...no? If there is functionality I need, and no library
| implements it, I will implement it myself. That goes for
| any language. Otherwise, the job of a programmer would
| simply be to string together existing libraries, not
| writing anything meaningful.
| user3939382 wrote:
| > I will implement it myself
|
| Are you saying in the scenario described, your solution
| would have been to write an XML parser from scratch?
| moonchild wrote:
| If I need one, and I cannot find one, then yes.
| mlochbaum wrote:
| It's not ideal, but I've done this in BQN and it took
| about 15 lines. I didn't need to handle comments or
| escapes, which would add a little complexity. See
| functions ParseXml and ParseAttr here:
| https://github.com/mlochbaum/Singeli/blob/master/data/iintri...
|
| XML is particularly simple though, dealing with something
| like JPEG would be an entirely different experience.
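|
| For readers who don't read BQN, here is a rough Python sketch of
| the same kind of parser. It is not a port of the linked ParseXml
| and ParseAttr functions; the names parse_xml, TOKEN and ATTR are
| made up for illustration, and it likewise assumes no comments,
| escapes, CDATA or declarations, and only double-quoted attributes.

```python
import re

# One token is either a tag (opening, closing or self-closing) or a run of text.
TOKEN = re.compile(
    r'<(/?)([A-Za-z_][\w.-]*)((?:\s+[\w.-]+\s*=\s*"[^"]*")*)\s*(/?)>|([^<]+)'
)
ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')


def parse_xml(text):
    """Parse a tiny XML subset into a list of (name, attrs, children) tuples.

    Each child is either another such tuple or a stripped text string.
    """
    root = ("#document", {}, [])
    stack = [root]  # the innermost open element is stack[-1]
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        closing, name, attrs, self_closing, chars = match.groups()
        if chars is not None:
            if chars.strip():  # ignore whitespace-only text between tags
                stack[-1][2].append(chars.strip())
        elif closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"mismatched closing tag </{name}>")
            stack.pop()
        else:
            node = (name, dict(ATTR.findall(attrs)), [])
            stack[-1][2].append(node)
            if not self_closing:
                stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1][0]}>")
    return root[2]


doc = '<list kind="demo"><item id="1">first</item><item id="2"/></list>'
tree = parse_xml(doc)
```

| Here tree holds one "list" element with two "item" children. The
| point is only that a comment-free, escape-free subset stays small;
| real-world XML needs a real library.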
| RexM wrote:
| Yeah.
|
| It can't be that hard.(tm)
| recuter wrote:
| The job of a programmer is to glue together existing
| libraries in the most convoluted manner possible and
| collect rent on maintenance. Perhaps even graduate to
| consulting. Grow a pointy haircut.
|
| Who the hell wants to be a programmer, dismal profession.
| VHRanger wrote:
| The fact there are vector languages in subsets of python
| (numpy, pandas, etc.) and R.
|
| And these already have great large columnar dataset support
| (eg. Apache Arrow)
|
| And an open source community intent on developing and
| maintaining the ecosystem.
| nathan_compton wrote:
| One of the nicest things about J is the notion of verb rank.
| For non-J-programmers, you can apply a rank to a verb and
| this affects how the verb operates on its array operands. A
| rank of zero means "operate on each atom of the operands,"
| a rank of 1 means "operate on each list (1-cell) of the
| operands," and an infinite rank means "operate on the entire
| object." Other ranks change the meaning of what counts as "an
| element."
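|
| A minimal Python sketch of the idea, for one-argument verbs on
| nested lists. It assumes J's convention that rank k means "apply
| the verb to each k-dimensional cell"; at_rank and depth are
| hypothetical helper names, and the arrays are assumed regular
| and non-empty.

```python
def depth(y):
    """Number of nesting levels of a regular, non-empty nested list (0 for an atom)."""
    levels = 0
    while isinstance(y, list):
        levels += 1
        y = y[0]
    return levels


def at_rank(verb, rank):
    """Return a function that applies `verb` to every rank-`rank` cell of its argument."""
    def applied(y):
        if depth(y) <= rank:
            return verb(y)                      # y is a single cell: call the verb
        return [applied(item) for item in y]    # otherwise loop over the leading axis
    return applied


table = [[0, 1, 2], [3, 4, 5]]

negated = at_rank(lambda a: -a, 0)(table)        # verb sees each atom
row_sums = at_rank(sum, 1)(table)                # verb sees each row
item_count = at_rank(len, float("inf"))(table)   # verb sees the whole table
```

| The loops live in at_rank, not in the verb, which is what lets
| rank replace most explicit looping constructs.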
|
| However, like most things in J, support for this excellent
| idea (which eliminates the need for most looping constructs
| and can be very performant) is irregular: it is limited to
| monadic and dyadic verbs. Nothing about verb rank forbids
| functions which accept more than two arguments, but the idea
| of a function which accepts more than 2 arguments is poorly
| supported in J (the idiom is to pass a boxed array to a
| monad, but the boxing of the items to be passed makes
| supporting rank behavior for the "arguments" impossible or
| absurdly complicated).
|
| Other beefs with J: J doesn't have first class functions as
| such. While you can represent functions as "nouns" in a few
| ways, you cannot have (for example) an anonymous reference to
| a function as a thing unto itself (you may denote a verb
| tacitly in a context where you need a verb, however, but this
| is not the same thing). If you want to pass around verbs in a
| way familiar to you as a contemporary programmer you have to
| use "adverbs" and "conjunctions" which are just higher order
| functions which (more or less) return verbs. But adverbs and
| conjunctions have their own peculiarities and restrictions
| (not the least of which is that they are not themselves verbs
| or nouns and thus cannot be passed around either). In
| contemporary programming languages the
| verb/adverb/conjunction space would just be represented by
| "functions" and to great effect. As a functional programmer
| and Lisp guy, I find the limitations on "verbs" very
| frustrating in J.
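|
| To make the contrast concrete, here is roughly how the same
| space collapses into plain functions in Python. insert and at
| are hypothetical stand-ins for J's adverb / and conjunction @:
| and this is an illustration, not a faithful model of J.

```python
import operator
from functools import reduce


def insert(verb):
    """Stand-in for J's adverb `/`: put `verb` between the items, folding from the right."""
    def inserted(items):
        return reduce(lambda acc, x: verb(x, acc), reversed(items))
    return inserted


def at(f, g):
    """Stand-in for J's conjunction `@:`: apply g, then f to the result."""
    return lambda *args: f(g(*args))


def squares(xs):
    return [x * x for x in xs]


sum_of_squares = at(insert(operator.add), squares)
alternating = insert(operator.sub)([1, 2, 3])   # 1 - (2 - 3)

# The "adverb" is itself an ordinary value: it can be stored and passed around.
modifiers = {"insert": insert}
product = modifiers["insert"](operator.mul)([2, 3, 4])
```

| Verbs, adverbs and conjunctions are all just callables here, so
| any of them can be an argument or a result.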
|
| J's error messages are also bad, never more than a few words.
|
| There are some great ideas in the language, but it feels very
| old-fashioned and out of touch.
|
| What I would like to see is an "array scheme." A lexically
| scoped Scheme-like language where every object is an array
| and function argument slots can be independently "ranked" to
| support the elimination of loops over array arguments. I'm
| too busy to put this together, but it would be great to have
| if you wanted to fiddle with arrays for some reason but could
| do without any library support for actually doing data
| analysis.
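|
| A rough Python sketch of what independently ranked argument
| slots could look like, using nested lists as arrays. The ranked
| decorator and depth helper are hypothetical, arrays are assumed
| regular and non-empty, and arguments that still need looping
| must agree on the length of their leading axis.

```python
def depth(y):
    """Number of nesting levels of a regular, non-empty nested list (0 for an atom)."""
    levels = 0
    while isinstance(y, list):
        levels += 1
        y = y[0]
    return levels


def ranked(*ranks):
    """Give each argument slot its own rank; loop over whatever exceeds it."""
    def decorate(fn):
        def applied(*args):
            if len(args) != len(ranks):
                raise TypeError("expected one argument per ranked slot")
            # Slots whose argument is still "bigger" than one cell of that slot.
            loop = [depth(a) > r for a, r in zip(args, ranks)]
            if not any(loop):
                return fn(*args)
            lengths = {len(a) for a, looping in zip(args, loop) if looping}
            if len(lengths) != 1:
                raise ValueError("leading axes of looped arguments disagree")
            count = lengths.pop()
            # Step the looped arguments together; hold the others fixed.
            return [
                applied(*[a[i] if looping else a for a, looping in zip(args, loop)])
                for i in range(count)
            ]
        return applied
    return decorate


@ranked(0, 0, 0)
def axpy(a, x, y):
    return a * x + y


@ranked(1, 1, 0)
def weighted_sum(weights, values, bias):
    return sum(w * v for w, v in zip(weights, values)) + bias


scaled = axpy(2, [1, 2, 3], [10, 20, 30])
scores = weighted_sum([0.5, 0.5], [[1, 3], [2, 4]], [10, 20])
```

| Neither function body contains a loop over rows; the three-slot
| case works the same way as the two-slot one.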
| beagle3 wrote:
| I haven't used R recently (10 years or so), but when I did, the
| speed with which K/kdb+ could scan through and summarize
| terabytes of data was orders of magnitude faster than R or any
| other system. Once the data was summarized into (say) a
| gigabyte or so, analyzing it with R or even Python was much
| easier thanks to the ecosystem and reasonable time (probably
| 10-100 times slower, but the time saved by using well tested
| stat code is more than worth it)
| 0des wrote:
| > substantially better designed
|
| Hey Siri, please remove Nathan_Compton from the Christmas card
| list.
| recuter wrote:
| Thank you for this.
|
| What do you make of BQN? https://aplwiki.com/wiki/BQN
|
| I get enamored with apl/k/j every time I see it and was looking
| for excuses to use it despite everything.
|
| I understand that due to the much smaller community the tooling
| and ecosystem are much weaker but there must be a reason why
| some people keep reaching for it, especially the guys in
| finance. I don't get the Cobol vibes from it like it is some
| sort of legacy burden. While the use case is narrow there must
| be an edge.
|
| This is HN after all. You wouldn't tell people not to mess with
| lisp and just reach for python now would you? *puppy eyes
| stare*
| all2 wrote:
| When I want to just "Get stuff done" TM, I reach for Python.
| Except that I've stopped doing that because setting up
| package versioning and venvs is a nightmare that gets more
| frustrating every time I try to do it.
|
| Now I'm looking for a "better" TM way to get my scripting
| needs met. I'm looking at Nim, specifically. I may also try
| to lean on a Scheme or a Lisp. My problem with the latter is
| lack of decent docs for getting stuff done. Maybe I'm missing
| something, but being productive in those languages for me is
| like a high jump when I can't even step up on a curb.
| jrapdx3 wrote:
| Some Scheme/Lisp implementations are capable enough to
| accomplish daily work. Common Lisp is one option, and I've
| used Chicken Scheme effectively for some projects.
|
| You're right though, there's a significant learning curve
| with any language in a different paradigm. Forth-like
| languages are an example, and yeah, J/K and cousins are
| hard to grasp. I've dabbled in these but never quite got
| there.
|
| IMO Lisp-like languages aren't quite as "foreign" since the
| syntax is a variation on 'function parameters body' used in
| "normal" (Algol-like) languages. I guess it comes down to
| what we get used to, and really for many purposes choice of
| language isn't all that critical, assuming of course it
| supports the task at hand.
| rscho wrote:
| Racket has been rated as 'an acceptable python' by a famous
| programmer. Well deserved, I think.
| beagle3 wrote:
| Nimpy makes it possible to move from Python to Nim
| gradually. It's magical, and while it doesn't solve
| python's own venv problems, it would only need the DLL from
| Python - whether it was 2.5 or 3.4 or 3.8, it would just
| work - they probably removed the python2 support by now,
| but it was just magic.
| nathan_compton wrote:
| J feels a lot like Smalltalk and Lisp to me. If you got on
| board early, you could do all sorts of stuff other languages
| struggled to make easy and performant. Hence the set of
| dedicated users. And there are some genuinely interesting
| conceptual things going on in array languages which have real
| appeal. But in the end I think J reflects a previous era and
| hasn't caught up to really useful ideas in more contemporary
| languages, probably because its user base is too
| conservative.
|
| I wouldn't recommend people use XLisp or run Genera in a VM
| to solve real problems. Recommending J feels like that to me.
| recuter wrote:
| I see your point. You dream crushing bastard. :)
|
| For no reason whatsoever here is a link to a guy building a
| Korean style wooden house by hand without using nails:
| https://www.youtube.com/watch?v=hvsvMzgiq6s
|
| What are "real problems" anyway?
|
| Sigh. You're right, I know you're right. Somehow this field
| is losing appeal over time. I'm going for a walk.
| mlochbaum wrote:
| > Somehow this field is losing appeal over time.
|
| Not true, at all! Since 2010 or so, the APL family has
| only improved its reputation and grown in popularity. I
| listed some developments of the past two years at
| https://news.ycombinator.com/item?id=28930064. Now, it's
| not much relative to the huge growth of array frameworks
| like TensorFlow with more mainstream language design, but
| it is definitely not losing appeal.
| recuter wrote:
| Oh no, Marshall, I was being far more despondent and was
| referring to programming as a whole. Thank you very much
| for your efforts on BQN.
|
| Speaking of TensorFlow, I was looking at tinygrad the
| other day:
| https://github.com/geohot/tinygrad/blob/master/tinygrad/tens...
|
| Very tempted to port it to BQN. I could be wrong but I
| bet it would shine for that. You could print the whole
| thing on a t-shirt.
| mlochbaum wrote:
| Oh, thanks for clarifying, since it occurred to me that
| you might mean just the appeal to you, but not that you
| meant the field of programming! I'm no NN expert, but
| tinygrad looks very approachable in BQN. You might be
| interested in some other initial work along those lines:
| https://github.com/loovjo/BQN-autograd with automatic
| differentiation, and the smaller
| https://github.com/bddean/BQNprop using backprop.
| hvs wrote:
| TBF, that guy isn't doing it for the fun of it (OK,
| partly for the fun of it) but because Mr. Chickadee is a
| content creator. Sure, it's a lifestyle choice, but he
| also makes his living doing it. I love his channel, but his
| lifestyle is as much a product of our modern world as the
| Java programming language is.
| moonchild wrote:
| > I wouldn't recommend people use XLisp or run Genera in a
| VM to solve real problems. Recommending J feels like that
| to me.
|
| Genera and interlisp are great. I wouldn't deploy them
| because:
|
| 1) slow
|
| 2) no multithreading
|
| 3) incompatible with modern cls
|
| Point 3 is being worked on (for genera at least, and
| possibly also for interlisp). But none of these seems
| significant wrt j.
| jonahx wrote:
| >I get enamored with apl/k/j every time I see it and was
| looking for excuses to use it despite everything.
|
| You should do it. Nothing in my programming career has
| changed the way I thought so much as learning J to the point
| of real fluency. Though you could swap out APL, k, or BQN for
| the same effect.
| agumonkey wrote:
| How do you feel about the J/APL syntax in live coding sessions?
| Does it help you iterate a bit faster than R/Python, or was it
| a totally irrelevant aspect?
___________________________________________________________________
(page generated 2022-04-04 23:00 UTC)