[HN Gopher] HTTP: , FTP:, and Dict:?
       ___________________________________________________________________
        
       HTTP: , FTP:, and Dict:?
        
       Author : edent
       Score  : 317 points
       Date   : 2024-09-18 11:46 UTC (4 days ago)
        
 (HTM) web link (shkspr.mobi)
 (TXT) w3m dump (shkspr.mobi)
        
       | masswerk wrote:
       | Nowadays, on macOS, "dict://Internet" will open the Dictionary
       | app with the query "Internet". (Probably behind a security
       | prompt.) Not sure if there's similar functionality on other
       | operating systems.
        
         | dmd wrote:
         | What do you mean by "behind a security prompt"?
        
           | promiseofbeans wrote:
           | I tried it and firefox came up with a prompt confirming I
           | wanted to open 'internet' in 'Dictionary'
        
             | WarOnPrivacy wrote:
             | > firefox came up with a prompt confirming I wanted to open
             | 'internet' in 'Dictionary'
             | 
             | My Ffx 131.0b9 wasn't so adept. It gave me:
             | 
             |     The address wasn't understood
             | 
             |     Firefox doesn't know how to open this address, because one
             |     of the following protocols (dict) isn't associated with any
             |     program or is not allowed in this context.
             | 
             |     You might need to install other software to open this
             |     address.
        
               | wormius wrote:
               | Mine just opened the default search engine with the
               | results, maybe it's how I have the address bar
               | configured?
        
               | masswerk wrote:
               | The behavior may also vary depending on whether it's an
               | actual link in a document or direct input to the location
               | bar. (I get different prompts on Firefox, while both will
               | forward to the built-in system dictionary.)
               | 
               | And, of course, any further configurations may alter
               | this, e.g., another service may have been registered for
               | this protocol.
        
           | bobbylarrybobby wrote:
           | Browsers generally throw up a prompt before opening an
           | external app
        
       | bubblesnort wrote:
       | Never heard of the dict command?
       | 
       | The author went through the trouble of figuring out the protocol
       | but never bothered to just run dict. Okay.
        
         | jmillikin wrote:
         | What do you think the post would have contained if he had run
         | dict?
         | 
         | Here's a hint:
         | 
         |     macbook% dict
         |     zsh: command not found: dict
         | 
         |     desktop$ dict
         |     bash: dict: command not found
         | 
         | You'd have to be pretty into retro-computing before you'll find
         | an OS that ships /usr/bin/dict .
        
           | politelemon wrote:
           | Not really... FWIW, addressing the 'hint' you're giving
           | specifically, this is on Ubuntu.
           | 
           |     $ dict
           |     Command 'dict' not found, but can be installed with:
           |     sudo apt install dict
           | 
           | After installing,
           | 
           |     $ dict example
           |     6 definitions found
           |     ...
        
             | jmillikin wrote:
             | So the causal chain would be:
             | 
             | 1. Notice a URL scheme dict://
             | 
             | 2. Try to type 'dict' into a terminal, on the off chance
             | there's a command-line tool with the same name (would you
             | do this for https:// and expect the same outcome?)
             | 
             | 3. Be running a distribution that modifies the user's shell
             | environment to suggest packages related to unknown commands
             | 
             | 4. Actually install and run that command
             | 
             | 5. Be running tcpdump or wireshark at the same time to
             | notice that the `dict` command is reaching out to the
             | network, as opposed to doing some sort of local lookup in
             | /usr/share/dict
             | 
             | 6. Figure out from the network traffic that the tool is
             | using a dictionary-specific protocol as opposed to just
             | making an HTTP request to dictionary.com or whatever.
             | 
             | --
             | 
             | Nah, the only way someone would know (or even suspect!)
             | that dict:// is somehow related to an ancient Unix command-
             | line tool is prior knowledge, and it's unreasonable to
             | expect the article author to have somehow intuited such an
             | idea.
        
               | kragen wrote:
               | right, the causal chain would be that the author already
               | used the dict command and had at some point read the man
               | page, which begins
               | 
               |     DICT(1)                                        DICT(1)
               | 
               |     NAME
               |         dict - DICT Protocol Client
               | 
               |     SYNOPSIS
               |         (...)
               | 
               |     DESCRIPTION
               |         dict is a client for the Dictionary Server Protocol
               |         (DICT), (...)
               | 
               | but, yeah, not everybody has that background
               | 
               | which is fine! nobody is born knowing all the unix
               | commands
        
               | bubblesnort wrote:
               | HTTP is the weird protocol here. Lots of protocols were
               | named after the program (or vice versa).
               | 
               | finger, ftp, ssh, talk, telnet, tftp, maybe whois too?
        
               | kragen wrote:
               | also gopher. mail and dns are two other exceptions tho
        
         | forbiddenlake wrote:
         | The point of the article wasn't how to define a word, it was
         | answering why old code mentioned the dict protocol in the same
         | regex as http and ftp.
         | 
         | I for one had never heard of the dict:// protocol, so I was
         | curious about it.
        
         | ludwik wrote:
         | This is a fun post about an obscure internet protocol, not a
         | how-to.
        
         | hnlmorg wrote:
         | The "dict" string was included in a regex of protocols. So they
         | wanted to learn more about the protocol.
         | 
         | It's entirely possible they were already aware of other
         | software that supports dictionary lookups.
        
           | kragen wrote:
           | dict isn't other software, it's a client for the protocol
           | being discussed
        
             | hnlmorg wrote:
             | You can still be aware that a 'dict' client exists without
             | realising that it wasn't just another HTTP user agent.
        
               | kragen wrote:
               | true
        
       | hkt wrote:
       | I've been aware of dict for a while since I wrapped up an
       | esperanto to english dictionary for KOReader in a format KOReader
       | could understand. What I'd really have liked is a format like
       | this:
       | 
       | dict://<server>/<origin language>/<definition language>/<word>
       | 
       | Still, it is pretty cool that dict servers exist at all, so no
       | complaints here.
        
       | im3w1l wrote:
       | I admire these old protocols that are intentionally built to be
       | usable both by machines and humans. Like the combination of a
       | response code, and a human readable explanation. A help command
       | right in the protocol.
       | 
       | Makes me think it's a shame that making a json based protocol is
       | so much easier to whip up than a textual ebnf-specified protocol.
       | 
       | Like imagine if in python there was some library that you gave an
       | ebnf spec maybe with some extra features similar to regex (named
       | groups?) and you could compile it to a state machine and use it
       | to parse documents, getting a Dict out.
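A rough illustration of the idea above, using only Python's standard `re` module rather than a real EBNF compiler: a regular expression with named groups parses a DICT-style status line (a three-digit code followed by human-readable text, like the `250 ok` response quoted elsewhere in this thread) into a dict. The pattern and the function name `parse_status_line` are assumptions made for this sketch, not taken from the DICT specification.

```python
import re

# Assumed shape of a status line: three-digit code, then optional free text.
STATUS_LINE = re.compile(r"^(?P<code>\d{3})(?: (?P<text>.*))?$")


def parse_status_line(line: str) -> dict:
    """Parse one status line into {'code': int, 'text': str}."""
    match = STATUS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    groups = match.groupdict()
    # The text group is None when the line carries only a code.
    return {"code": int(groups["code"]), "text": groups["text"] or ""}


if __name__ == "__main__":
    for raw in ["250 ok\r\n", "150 1 definitions retrieved\r\n"]:
        print(parse_status_line(raw))
```

The named groups keep the machine-readable part (the code) and the human-readable part (the text) separate without giving up the plain-text wire format.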
        
         | orf wrote:
         | Unfortunately, in practice they are a nightmare. Look at the
         | WHOIS protocol for an example.
         | 
         | Humans don't look at responses very much, so you should
         | optimise for machines. If you want a human-readable view, then
         | turn the JSON response into something readable.
        
           | poincaredisk wrote:
           | Whois protocol has no grammar to speak of (the response body
           | is not defined at all, just a text blob) which makes it a
           | nightmare to parse. Having a proper response format would
           | solve this.
           | 
           | Though I agree, I prefer my responses in JSON.
        
           | astrobe_ wrote:
           | The next logical step is to use a machine-friendly format
           | instead; that is a binary protocol.
           | 
           | Even HTML and XML, which were designed for readability and
           | manual writing, eventually became "not usable enough"
           | ("became" because I think part of it is that their success
           | exposed them to less technical populations), and now we
           | have markdown everywhere, which most of the time is converted
           | to HTML.
           | 
           | So if you are going to use a tool more sophisticated than
           | Ed/Edlin to read and write (rich) text in a certain format,
           | it could be more efficient to focus on making the job of the
           | machine - and of the programmer, easier.
           | 
           | If you look at a binary protocol such as NTP, the binary
           | format leaves very little room for Postel's principle [1], so
           | it is straightforward to make a program that queries a server
           | and displays the result.
           | 
           | [1] https://en.wikipedia.org/wiki/Robustness_principle
        
         | kragen wrote:
         | maybe we could have a format that was more human-readable than
         | json (or especially xml) but still reliably emittable and
         | parseable? yaml, maybe, or toml, although i'm not that
         | enthusiastic about them. another proposal for such a thing was
         | ogdl (https://ogdl.org/), a notation for what i think are
         | called rose trees
         | 
         | > _OGDL: Ordered Graph Data Language_
         | 
         | > _A simple and readable data format, for humans and machines
         | alike._
         | 
         | > _OGDL is a structured textual format that represents
         | information in the form of graphs, where the nodes are strings
         | and the arcs or edges are spaces or indentation._
         | 
         | their example:
         | 
         |     network
         |       eth0
         |         ip   192.168.0.10
         |         mask 255.255.255.0
         |         gw   192.168.0.1
         | 
         |     hostname crispin
         | 
         | another possibility is jevko; https://jevko.org/ describes it
         | and http://canonical.org/~kragen/rose/ are some of my notes
         | about the possibilities of similar rose-tree data formats
        
           | hnlmorg wrote:
           | Formats like TOML are horrible for heavily nested data (even
           | XML does a better job here) and the last time I checked, TOML
           | didn't support arrays at the top level.
           | 
           | YAML is nicer than JSON to write, but I wouldn't say it's any
           | nicer to read.
           | 
           | If you want something that's less punctuation heavy, then I'd
           | prefer we go full Wirth and have something more akin to
           | Pascal.
        
             | kragen wrote:
             | arrays at the top level are probably a bad idea for
             | protocols that need to evolve in a backward-compatible way
             | 
             | what do you mean about heavily nested data? do the other
             | formats i linked do a better job there?
             | 
             | i'm not _sure_ it's possible to come up with a data format
             | that will work well for such a wide range of use cases, but
             | it sure would be nice to have. json is pretty great in
             | terms of being able to load it into the browser, or
             | visidata, or python, or js, or whatever
        
               | hnlmorg wrote:
               | > arrays at the top level are probably a bad idea for
               | protocols that need to evolve in a backward-compatible
               | way
               | 
               | Depends on the protocol. It might be preferable to
               | version the end point. Or if it's a specific function, e.g.
               | "list-synonyms", then having a dictionary just to reference
               | an array could be argued to be unnecessary protocol bloat.
               | Particularly given the aim of this exercise is
               | readability.
               | 
               | > what do you mean about heavily nested data? do the
               | other formats i linked do a better job there?
               | 
               | I mean a tree like structure.
               | 
               | JSON and YAML are probably the best in class here. XML,
               | for all of its warts, is good at handling nested data in
               | a readable way too.
               | 
               | TOML was more based around a flatter structure.
               | 
               | > i'm not sure it's possible to come up with a data
               | format that will work well for such a wide range of use
               | cases,
               | 
               | It's not. The moment that happens, that format then
               | becomes unwieldy and people then feel the urge to invent
               | yet another new format to simplify things. It's a vicious
               | circle that happens over and over again in the tech
               | sector.
        
               | kragen wrote:
               | it's a pretty big problem that connecting existing
               | software together so often requires writing new parsers
        
               | hnlmorg wrote:
               | That suggests things are getting worse but personally
               | I've seen the opposite trend.
               | 
               | These days developers are rallying around a subset of
               | established standards rather than inventing new protocols
               | and grammar for each new service.
               | 
               | Take a look at the old protocols out there: finger, DNS,
               | Gopher, HTTP, FTP, SMTP, Dict, etc. They all have their
               | own grammar and in many cases, even that grammar is very
               | loosely defined or subject to dozens of different
               | standards. Whereas these days it's mostly JSON or XML
               | over HTTPS. Or ProtoBuf if you need something more
               | compact.
               | 
               | There's definitely still room for improvement. For
               | example the shift towards proprietary messaging protocols
               | like Slack, Discord, etc. But that's another topic
               | entirely.
        
               | kragen wrote:
               | yeah, i appreciate the move to html, http, and json.
               | although http/2 and http/3 arguably aren't really http,
               | and scraping data out of html is ridiculously time-
               | wasting. the shift toward cloudflare and secret criteria
               | for blocking users whose sessions act "atypical" are also
               | huge problems, but that's sort of what you'd expect from
               | using software running on remote servers you can't
               | control
        
         | donatj wrote:
         | In my department's (we were formerly our own company) internal
         | framework throwing .html on the end of any JSON response
         | outputs it in nested HTML tables. I personally find it very
         | helpful.
        
           | hnlmorg wrote:
           | At that point you might as well drop JSON altogether and use
           | an XHTML subset so your rendered output is also valid XML
           | (instead of having two different and incompatible markups
           | merged together)
        
             | zeven7 wrote:
             | I'm assuming they use the .html trick for human reading of
             | the data by developers rather than it being used in
             | production
        
               | donatj wrote:
               | That's exactly what it's for
        
               | hnlmorg wrote:
               | Ahhh that makes more sense.
        
             | theamk wrote:
             | That sounds like a bad idea. Unlike JSON, XML is (1) non-
             | trivial to parse safely, (2) hard to reliably extract
             | information and (3) verbose.
             | 
             | Leave the markup languages for intended purpose: text
             | markup. Don't force them to carry data.
        
               | hnlmorg wrote:
               | I'm not generally a fan of XML either but what you posted
               | there is just factually incorrect in just about every
               | conceivable way.
               | 
               | 1. There are plenty of XML parsers already available for
               | most languages. Yeah there have been high profile
               | exploits based on XML but given the scale of XML's
               | usage, it's fair to say those exploits are atypical usage
               | where XML can be user supplied. And as long as you're not
               | allowing users to upload their own XML, then you get to
               | control the schema so there aren't any risks in using XML.
               | 
               | 2. XML's entire purpose is a data store. I'm not someone
               | who likes to blame the developers for using their tools
               | wrong but honestly, if you can't unmarshal an XML schema
               | you have control over then you're not going to succeed
               | with JSON either.
               | 
               | 3. It is. But it's also highly compressible _because_ of
               | its repetitive tags. So for HTTP endpoints, it actually
               | doesn't work out any different to JSON.
               | 
               | > Leave the markup languages for intended purpose: text
               | markup. Don't force them to carry data.
               | 
               | You do realise the entire point of XML is to carry data?
               | It might have fallen out of favour in recent years but
               | those of us old enough to remember a time before JSON
               | will talk about how JSON is just a simplified
               | reimplementation of XML. And with things like JSON
               | schemas, JSON is continuing to copy XML features.
        
         | dspillett wrote:
         | _> Makes me think it's a shame that making a json based
         | protocol is so much..._
         | 
         | Maybe I'm not the human you are thinking of, being a techie,
         | but I find a well structured JSON response, as long as it isn't
         | overly verbose and is presented in open form rather than
         | minified, to be a good compromise of human readable and easy to
         | digest programmatically.
        
           | marcosdumay wrote:
           | The legibility is probably one of the main reasons JSON got
           | adopted. XML can be made to not look too bad, but in SOAP it
           | must be unreadable, so everybody was looking into fixing
           | this.
        
             | xmlmann wrote:
             | XML has a sweet advantage though. It can be styled in the
             | browser. For example a sitemap that works for Google,
             | OpenAI, &c. and is human readable looking like a web page.
             | 
             | Example:
             | 
             | https://www.wpbeginner.com/sitemap.xml
        
               | immibis wrote:
               | This is fun, but in reality, not all that useful.
               | 
               | Except you can impress other nerds when they "view
               | source" and there's not an HTML tag in sight.
        
             | immibis wrote:
             | XML is a language designed for markup (i.e. text
             | formatting), and fitting structured documents into it
             | creates an impedance mismatch.
             | 
             | Dictionary definitions may be considered as marked-up
             | documents, so it may work. The overall structure of the
             | dictionary is not.
        
         | fouc wrote:
         | textual ebnf-specified protocol > json
        
         | somat wrote:
         | REST (REpresentational State Transfer) as a concept is very
         | human orientated. The idea was a sort of academic abstraction
         | of html. but it can be boiled down to: when you send a
         | response, also send the entire application needed to handle
         | that response. It is unfortunate that collectively we had a
         | sort of brain fart and said "ok, REST == http, got it" and lost
         | the rest of the interesting discussion about what it means to
         | send the representational state of the process.
        
         | zzo38computer wrote:
         | > I admire these old protocols ...
         | 
         | Protocols that have a response code with an explanation are
         | helpful. A help command is also helpful. So, I wrote an NNTP
         | server that does that, and the IRC and NNTP client software I
         | use can display them.
         | 
         | > Makes me think it's a shame that making a json based protocol
         | is so much easier to whip up ...
         | 
         | I personally don't; I find I can easily work with text-based
         | protocols if the format is easy enough.
         | 
         | I think there are problems with JSON. Some of the problems are:
         | it requires parsing escapes and keys/values, does not properly
         | support character sets other than Unicode, cannot work with
         | binary data unless it is encoded using base64 or hex or
         | something else (which makes it inefficient), etc. There are
         | other problems too.
         | 
         | > Like imagine if in python there was some library that you
         | gave an ebnf spec ...
         | 
         | Maybe it is possible to add such a library in Python, if there
         | is not already such a thing.
        
       | mogoh wrote:
       | hmmm
       | 
       |     $>curl dict://dict.org/d:Internet
       |     curl: (1) Protocol "dict" not supported
        
         | fallingsquirrel wrote:
         | Works for me. I bet your OS ships a crippled version of curl.
         | 
         |     $ curl --version
         |     curl 8.7.1 (x86_64-pc-linux-gnu) [...]
         | 
         |     $ curl dict://dict.org/d:Internet
         |     220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <370202891.28105.1727009645@dict.dict.org>
         |     250 ok
         |     150 1 definitions retrieved
         |     [...]
        
           | soraminazuki wrote:
           | Manual build with an explicit `--disable-dict` perhaps?
           | Because it's not Debian, Fedora, Homebrew, Nix, Alpine, Arch,
           | or Gentoo, judging by their package definitions.
        
             | mogoh wrote:
             | I am on Fedora Silverblue
             | 
             |     $>curl --version
             |     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng libidn2/2.3.7 nghttp2/1.59.0
             |     Release-Date: 2024-01-31
             |     Protocols: file ftp ftps http https ipfs ipns
             |     Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz SPNEGO SSL threadsafe UnixSockets
             | 
             | I am not sure I understand you correctly. Should it work
             | on Fedora?
        
               | SushiHippie wrote:
               | The description on this page at least lists the dict
               | protocol
               | 
               | https://src.fedoraproject.org/rpms/curl/
               | 
               | Only the minimal build disables the dict protocol, maybe
               | you have installed the curl-minimal package?
               | 
               | https://src.fedoraproject.org/rpms/curl/blob/rawhide/f/curl....
        
               | soraminazuki wrote:
               | Ah, it appears that curl-minimal became the default curl
               | for Fedora recently. curl-full has to be installed for
               | full functionality. I initially ignored it because I
               | assumed the default was curl-full.
               | 
               | https://fedoraproject.org/wiki/Changes/CurlMinimal_as_Defaul...
               | 
               | Curl devs are predictably not too happy about this
               | change.
               | 
               | https://daniel.haxx.se/blog/2022/03/16/fedora-and-curl-minim...
        
             | marxisttemp wrote:
             | You mentioned Homebrew but missed the standard macOS
             | package manager, MacPorts.
        
           | kragen wrote:
           | works for me too. but it takes about 6 seconds so curl
           | dict://localhost/d:Internet is vastly preferable
        
           | bloopernova wrote:
           | Possibly Fedora. I'm using Fedora 40 and its curl reports
           | thus:
           | 
           |     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng libidn2/2.3.7 nghttp2/1.59.0
           |     Release-Date: 2024-01-31
           |     Protocols: file ftp ftps http https ipfs ipns
           |     Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz SPNEGO SSL threadsafe UnixSockets
           | 
           | And the dict protocol is indeed unsupported by system curl.
           | 
           | EDIT: https://fedoraproject.org/wiki/Changes/CurlMinimal_as_Defaul...
           | 
           | EDIT2: To change from libcurl-minimal to libcurl, run:
           | 
           |     dnf swap libcurl-minimal libcurl
           |     dnf swap curl-minimal curl
           | 
           | The second step there may not be needed, at least my system
           | had curl paired with libcurl-minimal so your situation may
           | not match mine.
           | 
           | EDIT3: This is the output of my curl now:
           | 
           |     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng brotli/1.1.0 libidn2/2.3.7 libpsl/0.21.5 libssh/0.10.6/openssl/zlib nghttp2/1.59.0 OpenLDAP/2.6.7
           |     Release-Date: 2024-01-31
           |     Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
           |     Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets
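For scripts that need to know ahead of time whether the local curl was built with DICT support, a small sketch like the following could inspect the `Protocols:` line of `curl --version` output, as quoted in the comments above. The function name `supported_protocols` is made up for this example, and the sample strings are abbreviated from the output in this thread.

```python
def supported_protocols(version_output: str) -> set[str]:
    """Return the protocol names listed on the 'Protocols:' line, if any."""
    for line in version_output.splitlines():
        line = line.strip()
        if line.startswith("Protocols:"):
            # Everything after the first colon is a space-separated list.
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    # Abbreviated samples: a minimal build and a full build.
    minimal = "curl 8.6.0\nProtocols: file ftp ftps http https ipfs ipns\n"
    full = "curl 8.6.0\nProtocols: dict file ftp ftps gopher http https\n"
    for name, text in [("minimal", minimal), ("full", full)]:
        print(name, "dict" in supported_protocols(text))
```

In practice the input text would come from running `curl --version` (for example through `subprocess.run`), which is left out here so the sketch runs without a curl binary.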
        
       | kragen wrote:
       | dict and the relevant dictionaries are things i pretty much
       | always install on every new laptop. gcide in particular includes
       | most of the famous 1913 webster dictionary with its sparkling
       | prose:
       | 
       |     : ~; dict glisten
       |     2 definitions found
       | 
       |     From The Collaborative International Dictionary of English v.0.48 [gcide]:
       | 
       |       Glisten \Glis"ten\ (gl[i^]s"'n), v. i. [imp. & p. p.
       |          {Glistened}; p. pr. & vb. n. {Glistening}.] [OE. glistnian,
       |          akin to glisnen, glisien, AS. glisian, glisnian, akin to E.
       |          glitter. See {Glitter}, v. i., and cf. {Glister}, v. i.]
       |          To sparkle or shine; especially, to shine with a mild,
       |          subdued, and fitful luster; to emit a soft, scintillating
       |          light; to gleam; as, the glistening stars.
       | 
       |          Syn: See {Flash}.
       |          [1913 Webster]
       | 
       | it's interesting to think about how you would implement this
       | service efficiently under the constraints of mid-01990s
       | computers, where a gigabyte was still a lot of disk space and
       | multiuser unix servers commonly had about 100 mips
       | (https://netlib.org/performance/html/dhrystone.data.col0.html)
       | 
       | totally by coincidence i was looking at the dictzip man page this
       | morning; it produces gzip-compatible files that support random
       | seeks so you can keep the database for your dictd server
       | compressed. (as far as i know, rik faith's dictd is still the
       | only server implementation of the dict protocol, which is
       | incidentally not a very good protocol.) you can see that the
       | penalty for seekability is about 6% in this case:
       | 
       |     : ~; ls -l /usr/share/dictd/jargon.dict.dz
       |     -rw-r--r-- 1 root root 587377 Jan  1  2021 /usr/share/dictd/jargon.dict.dz
       |     : ~; \time gzip -dc /usr/share/dictd/jargon.dict.dz|wc -c
       |     0.01user 0.00system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 1624maxresident)k
       |     0inputs+0outputs (0major+160minor)pagefaults 0swaps
       |     1418350
       |     : ~; gzip -dc /usr/share/dictd/jargon.dict.dz|gzip -9c|wc -c
       |     556102
       |     : ~; units -t 587377/556102 %
       |     105.62397
       | 
       | nowadays computers are fast enough that it probably isn't a big
       | win to gzip in such small chunks (dictzip has a chunk limit of
       | 64k) and you might as well use a zipfile, all implementations of
       | which support random access:
       | 
       |     : ~; mkdir jargsplit
       |     : ~; cd jargsplit
       |     : jargsplit; gzip -dc /usr/share/dictd/jargon.dict.dz|split -b256K
       |     : jargsplit; zip jargon.zip xaa xab xac xad xae xaf
       |       adding: xaa (deflated 60%)
       |       adding: xab (deflated 59%)
       |       adding: xac (deflated 59%)
       |       adding: xad (deflated 61%)
       |       adding: xae (deflated 62%)
       |       adding: xaf (deflated 58%)
       |     : jargsplit; ls -l jargon.zip
       |     -rw-r--r-- 1 user user 565968 Sep 22 09:47 jargon.zip
       |     : jargsplit; time unzip -o jargon.zip xad
       |     Archive:  jargon.zip
       |       inflating: xad
       | 
       |     real    0m0.011s
       |     user    0m0.000s
       |     sys     0m0.011s
       | 
       | so you see 256-kibibyte chunks have submillisecond decompression
       | time (more like 2 milliseconds on my cellphone) and only about a
       | 1.8% size penalty for seekability:
       | 
       |     : jargsplit; units -t 565968/556102 %
       |     101.77413
       | 
       | and, unlike the dictzip format (which lists the chunks in an
       | extra backward-compatible file header), zip also supports
       | efficient appending
       | 
       | even in python (3.11.2) it's only about a millisecond:
       | 
       |     In [13]: z = zipfile.ZipFile('jargon.zip')
       | 
       |     In [14]: [f.filename for f in z.infolist()]
       |     Out[14]: ['xaa', 'xab', 'xac', 'xad', 'xae', 'xaf']
       | 
       |     In [15]: %timeit z.open('xab').read()
       |     1.13 ms +- 16.2 us per loop (mean +- std. dev. of 7 runs, 1,000 loops each)
       | 
       | this kind of performance means that any algorithm that would be
       | efficient reading data stored on a conventional spinning-rust
       | disk will be efficient reading compressed data if you put the
       | data into a zipfile in "files" of around a meg each. (writing is
       | another matter; zstd may help here, with its order-of-magnitude
       | faster compression, but info-zip zip and unzip don't support zstd
       | yet.)
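       | 
       | a minimal python sketch of that idea, assuming fixed-size chunks
       | stored as numbered zip members. the names write_chunked and
       | read_range and the 8-digit member naming are made up for
       | illustration; they are not part of dictd or info-zip. it inflates
       | only the members that overlap the requested byte range:

```python
import io
import zipfile

CHUNK = 256 * 1024  # same chunk size as the split -b256K example above


def write_chunked(data: bytes, zf: zipfile.ZipFile, chunk: int = CHUNK) -> None:
    """Store data as numbered members, one member per chunk (hypothetical layout)."""
    for start in range(0, len(data), chunk):
        zf.writestr(f"{start // chunk:08d}", data[start:start + chunk])


def read_range(zf: zipfile.ZipFile, offset: int, length: int,
               chunk: int = CHUNK) -> bytes:
    """Return bytes [offset, offset+length), inflating only overlapping members."""
    if length <= 0:
        return b""
    end = offset + length
    out = bytearray()
    for n in range(offset // chunk, (end - 1) // chunk + 1):
        part = zf.read(f"{n:08d}")          # decompresses just this member
        lo = max(offset - n * chunk, 0)     # start within this member
        hi = min(end - n * chunk, len(part))  # end within this member
        out += part[lo:hi]
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096      # 1 MiB of sample data
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        write_chunked(payload, archive)
    with zipfile.ZipFile(buf) as archive:
        # This range straddles the boundary between members 0 and 1.
        print(read_range(archive, 262000, 500) == payload[262000:262500])
```

       | reading past the last member raises KeyError in this sketch; a
       | real reader would probably store the total length as well.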
       | 
       | dictd keeps an index file in tsv format which uses what looks
       | like base64 to locate the desired chunk and offset in the chunk:
       | 
       |     : jargsplit; < /usr/share/dictd/jargon.index shuf -n 4 | LANG=C sort | cat -vte
       |     fossil^IB9xE^IL8$
       |     frednet^IB+q5^IDD$
       |     upload^IE/t5^IJ1$
       |     warez d00dz^IFLif^In0$
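       | 
       | a sketch of decoding such index lines in python, assuming the
       | two numeric columns are offset and length written as base-64
       | digits in the standard base64 alphabet, most significant digit
       | first, with no padding. that reading is an assumption based on
       | how the lines look, and b64_to_int and parse_index_line are
       | illustrative names, not dictd functions:

```python
import string

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
# Assumption: the index uses these as plain base-64 digits.
B64_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DIGIT_VALUE = {ch: i for i, ch in enumerate(B64_DIGITS)}


def b64_to_int(text: str) -> int:
    """Interpret text as a big-endian number written in base-64 digits."""
    value = 0
    for ch in text:
        value = value * 64 + DIGIT_VALUE[ch]
    return value


def parse_index_line(line: str) -> tuple[str, int, int]:
    """Split a 'headword<TAB>offset<TAB>length' line into usable fields."""
    headword, offset, length = line.rstrip("\n").split("\t")
    return headword, b64_to_int(offset), b64_to_int(length)


if __name__ == "__main__":
    # cat -vte shows tabs as ^I and line ends as $; real lines contain tabs.
    for raw in ("fossil\tB9xE\tL8\n", "warez d00dz\tFLif\tn0\n"):
        print(parse_index_line(raw))
```

       | under that assumption the "fossil" entry decodes to offset
       | 515140 and length 764 in the uncompressed dictionary text.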
       | 
       | this is very similar to the index format used by eric raymond's
       | volks-hypertext
       | https://www.ibiblio.org/pub/Linux/apps/doctools/vh-1.8.tar.g...
       | or vi ctags or emacs etags, but it supports random access into
       | the file
       | 
       | strfile from the fortune package works on a similar principle but
       | uses a binary data file and no keys, just offsets:
       | 
       |     : ~; wget -nv canonical.org/~kragen/quotes.txt
       |     2024-09-22 10:44:50 URL:http://canonical.org/~kragen/quotes.txt [49884/49884] -> "quotes.txt" [1]
       |     : ~; strfile quotes.txt
       |     "quotes.txt.dat" created
       |     There were 87 strings
       |     Longest string: 1625 bytes
       |     Shortest string: 92 bytes
       |     : ~; fortune quotes.txt
       |       Get enough beyond FUM [Fuck You Money], and it's merely Nice To Have
       |       Money.
       |         -- Dave Long, <dl@silcom.com>, on FoRK, around 2000-08-16, in
       |         Message-ID <200008162000.NAA10898@maltesecat>
       |     : ~; od -i --endian=big quotes.txt.dat
       |     0000000           2          87        1625          92
       |     0000020           0   620756992           0         933
       |     0000040        1460        2307        2546        3793
       |     0000060        3887        4149        5160        5471
       |     0000100        5661        6185        6616        7000
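       | 
       | a sketch of reading that .dat layout in python, assuming what
       | the od dump suggests: six big-endian 32-bit header words followed
       | by a table of 32-bit offsets. the first four words match the
       | counts strfile printed; treating the fifth as flags and the top
       | byte of the sixth as the delimiter character is an assumption,
       | and the function names are illustrative:

```python
import struct

HEADER_WORDS = 6  # version, count, longest, shortest, flags, delimiter word


def parse_strfile_dat(blob: bytes) -> dict:
    """Decode a .dat file laid out like the od dump above (assumed layout)."""
    version, count, longest, shortest, flags, delim_word = struct.unpack_from(
        f">{HEADER_WORDS}I", blob, 0
    )
    header_bytes = 4 * HEADER_WORDS
    n_offsets = (len(blob) - header_bytes) // 4
    offsets = list(struct.unpack_from(f">{n_offsets}I", blob, header_bytes))
    return {
        "version": version,
        "count": count,
        "longest": longest,
        "shortest": shortest,
        "flags": flags,
        "delimiter": chr(delim_word >> 24),  # 620756992 == 0x25000000, i.e. '%'
        "offsets": offsets,
    }


def nth_string(text: bytes, offsets: list[int], n: int) -> bytes:
    """Slice entry n out of the text file; the slice still ends with the
    delimiter line, which a caller would normally strip."""
    return text[offsets[n]:offsets[n + 1]]


if __name__ == "__main__":
    # The 20 words visible in the od output, repacked as bytes.
    words = [2, 87, 1625, 92, 0, 620756992, 0, 933, 1460, 2307, 2546, 3793,
             3887, 4149, 5160, 5471, 5661, 6185, 6616, 7000]
    print(parse_strfile_dat(struct.pack(">20I", *words)))
```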
       | 
       | of course if you were using a zipfile you could keep the index in
       | the zipfile itself, and then there's no point in using base64 for
       | the file offsets, or limiting them to 32 bits
        
         | heystefan wrote:
         | So, can I somehow use the 1913 Webster dictionary on MacOS?
         | It's not in the list of configurable ones.
         | 
         | (If not possible, Terminal would work too.)
        
           | kragen wrote:
           | is gcide available? debian only offers web1913 as part of
           | gcide
        
       | commandersaki wrote:
       | I love dict/dictd but I had an issue using it in hostile networks
       | that block the port/protocol.
       | 
       | I've been tempted to revamp dict/dictd to shovel the dict
       | protocol over websockets so I can use it over the web. Just one of
       | those ideas in the pipeline that I haven't revisited because I'm
       | no longer dealing with that hostile network.
        
         | gwervc wrote:
         | The dict protocol really shows its age, notably the stateful
         | connection part. Having a new protocol based on HTTP and JSON
         | similar to LSP would be nice but there is no real interest. (I
         | made and used my own nonetheless in a research project. It may
         | even be deployed but deactivated in another one.)
         | 
         | The biggest issue isn't technical, it's the fact that
         | organizations with dictionary data don't want third parties to
         | interact with it without paid licensing.
        
           | steve_taylor wrote:
           | I hate the fact that corporate IT collectively decided to
           | block every port except 80 and 443, making it necessary to
           | base new protocols on HTTP instead of TCP/IP.
        
             | jolux wrote:
             | In my experience HTTP is a better foundation for novel
             | protocols in most cases.
        
               | LambdaComplex wrote:
               | Doesn't HTTP require binary data to be converted to
               | base64 encoding, thereby increasing its size on the wire?
               | That seems suboptimal for a lot of use cases
        
               | kragen wrote:
               | no, it does not, neither in requests nor in replies.
               | possibly you are thinking of smtp
        
               | devmor wrote:
               | It does not - you are perhaps thinking of GET queries:
               | URL data often must be base64 encoded as URLs are parsed
               | as characters.
               | 
               | HTTP bodies can be made up of any data in any encoding
               | you wish.
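               | 
               | A minimal Python sketch of that point, using only the
               | standard library: a throwaway echo server on localhost
               | and a client that POSTs all 256 byte values as a raw
               | body, with no base64 step. The Echo handler and
               | round_trip helper are illustrative names, not part of
               | any protocol discussed here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Echo(BaseHTTPRequestHandler):
    """Send the request body back unchanged as application/octet-stream."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def round_trip(payload: bytes) -> bytes:
    """POST payload to a local echo server and return the reply body."""
    server = HTTPServer(("127.0.0.1", 0), Echo)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("POST", "/", body=payload,
                     headers={"Content-Type": "application/octet-stream"})
        reply = conn.getresponse().read()
        conn.close()
        return reply
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    raw = bytes(range(256))  # every possible byte value, sent as-is
    print(round_trip(raw) == raw)
```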
        
           | cyanydeez wrote:
           | Might be something LLM based for organization knowledge base
        
           | Towaway69 wrote:
           | There is always wiktionary, I would assume they have an api
           | of some sort. That would cover the http & json bit!
           | 
           | https://wiktionary.org
        
           | pushupentry1219 wrote:
           | > Having a new protocol based on HTTP and JSON
           | 
           | This is just a HTTP/REST api? These exist already.
        
       | praveen9920 wrote:
       | > in an age of low-size disk drives and expensive software,
       | looking up data over a dedicated protocol seems like a nifty
       | idea. Then disk size exploded, databases became cheap, and search
       | engines made it easy to look up words.
       | 
       | I love this particular part of history: how protocols and
       | applications got built around restrictions and then evolved as
       | those restrictions eased. Similar examples exist everywhere in
       | computer history. Projecting the same with LLMs, we will have AIs
       | running locally on mobile devices, or perhaps AIs replacing the
       | OS of mobile devices, and router protocols and servers.
       | 
       | In the future, HN people will be looking at code and feeling
       | nostalgic about writing it
        
         | 38 wrote:
         | Given that most current AI generated code is dogshit, I would
         | say we are well off from that.
        
       | hebocon wrote:
       | I recently began testing my own public `dictd` server. The main
       | goal was to make the OED (the full and proper one) available
       | outside of a university proxy. I figured I would add the
       | Webster's 1913 one too.
       | 
       | Unfortunately the vast majority of dictionary files are in
       | "stardict" format and the conversion to "dict" has yielded mixed
       | results. I was hoping to host _every_ dictionary, good and bad,
       | but will walk that back now. A free VPS could at least run the
       | OED.
        
         | kragen wrote:
         | what's the stardict format? which edition of the oed are you
         | hosting? i scanned the first edition decades ago but i don't
         | think there's a reasonable plain-text version of it yet
        
           | cormorant wrote:
           | StarDict (a program/file format) is easily googlable. A bit
           | of a rabbit hole is that it's been chased around hosting
           | providers because its site (used to) offer downloads of
           | copyrighted dictionaries, including the OED 2nd edition. I
           | don't know how these files were originally obtained or
           | produced. See:
           | https://web.archive.org/web/20230718140437/http://download.h...
           | 
           | Edit to add: Also, "i scanned the first edition decades ago"
           | sounds like quite a story. 13 volumes? What project were you
           | doing?
        
             | kragen wrote:
             | oh, i just thought it would be good for the public-domain
             | dictionary to be available to the public:
             | https://www.mail-archive.com/kragen-tol@canonical.org/msg001...
        
         | tomsmeding wrote:
         | > to make the OED (the full and proper one) available outside
         | of a university proxy.
         | 
         | Was the plan to do this in a legal fashion? If so, how?
        
       | anthk wrote:
       | echo "define * hacker " | nc dict.org 2628 | less
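       | 
       | The same lookup as a rough Python sketch, assuming the usual
       | DICT reply shape: a 151 status line before each definition, a
       | lone "." ending its text, and 250 or an error code ending the
       | reply. Host and port are taken from the command above; the
       | function names are illustrative.

```python
import io
import socket


def read_definitions(stream) -> list[tuple[str, str]]:
    """Collect (status_line, text) pairs from a DICT reply stream."""
    definitions = []
    for line in stream:
        line = line.rstrip("\r\n")
        code = line[:3]
        if code == "151":                      # one definition follows
            body = []
            for text in stream:
                text = text.rstrip("\r\n")
                if text == ".":                # lone dot ends the text
                    break
                # A leading doubled dot is an escaped single dot.
                body.append(text[1:] if text.startswith("..") else text)
            definitions.append((line, "\n".join(body)))
        elif code == "250" or code[:1] in ("4", "5"):
            break                              # command finished or failed
    return definitions


def define(word: str, database: str = "*", host: str = "dict.org",
           port: int = 2628) -> list[tuple[str, str]]:
    """Send DEFINE over TCP and parse the reply. Needs network access;
    words containing spaces would need quoting, which this sketch skips."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="")
        stream.write(f"DEFINE {database} {word}\r\nQUIT\r\n")
        stream.flush()
        return read_definitions(stream)


if __name__ == "__main__":
    # Offline demo against a canned reply instead of a live server.
    canned = io.StringIO(
        "220 example banner\r\n"
        "150 1 definitions retrieved\r\n"
        '151 "hacker" jargon "The Jargon File"\r\n'
        "hacker\r\n"
        "   n. A person who enjoys exploring.\r\n"
        ".\r\n"
        "250 ok\r\n"
    )
    for status, text in read_definitions(canned):
        print(status)
        print(text)
```

       | Calling define("hacker") would talk to the real server; the
       | parser can be exercised offline as in the demo.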
        
       | cratermoon wrote:
       | Oh yes, I remember dictionary servers. Also many other protocols.
       | 
       | What happened to all of those other protocols? Everything got
       | squished onto http(s) for various reasons. As mentioned in this
       | thread, corporate firewalls block every port except 80 and 443.
       | Around the time of the invention of http, protocols were
       | proliferating for all kinds of new ideas. Today "innovation"
       | happens on top of http, which devolves into some new kind of
       | format to push back and forth.
        
         | giantrobot wrote:
         | I wouldn't place all the blame on corporate IT for low level
         | protocols dying out. A lot of corporate IT filtering was a
         | reaction to malicious traffic originating from _inside_ their
         | networks.
         | 
         | I think filtering on university networks killed more protocols
         | than corporate filtering. Corporate networks were rarely the
         | place where someone stuck a server in the corner with a public
         | IP hosting a bunch of random services. That however was very
         | common in university networks.
         | 
         | When university networks (early 00s or so) started putting NAT
         | on ResNets and filtering faculty networks is when a lot of
         | random Internet servers started drying up. Universities had
         | huge IPv4 blocks and would hand out their addresses to every
         | machine on their networks. More than a few Web 1.0 companies
         | started life on a random Sun machine in dorm rooms or the
         | corner of a university computer lab.
         | 
         | When publicly routed IPs dried up so did random FTPs and small
         | IRC servers. At the same time residential broadband was taking
         | off but so were the sales of home routers with NAT. Hosting
         | random raw socket protocols stopped being practical for a lot
         | of people. By the time low cost VPSes became available a lot of
         | old protocols had already died out.
        
       | wormius wrote:
       | Wow, either I've forgotten this existed, or had no clue, I was
       | around for this era, and I remember Veronica, Archie, WAIS,
       | Gopher, etc, but never recall reading about a Dict protocol, nice
       | find!
        
       | nunobrito wrote:
       | Nice find, didn't know the protocol either. The site lists all
       | available dictionaries here: https://dict.org/bin/Dict?Form=Dict4
       | 
       | I'll then be writing a java server for DICT. Likely add more
       | recent types of dictionaries and acronyms to help keep it
       | alive.
        
       | fitsumbelay wrote:
       | _super_ fascinating and potentially useful for future projects
       | with or w/o AI. obviously makes me want to maintain my own dict
       | service. love this
        
       | dokyun wrote:
       | Emacs includes a browsable client for this protocol; you can use
       | it with `M-x dictionary`.
        
       | nashashmi wrote:
       | [delayed]
        
       ___________________________________________________________________
       (page generated 2024-09-22 23:00 UTC)