[HN Gopher] HTTP: , FTP:, and Dict:?
___________________________________________________________________
HTTP: , FTP:, and Dict:?
Author : edent
Score : 317 points
Date : 2024-09-18 11:46 UTC (4 days ago)
(HTM) web link (shkspr.mobi)
(TXT) w3m dump (shkspr.mobi)
| masswerk wrote:
| Nowadays, on macOS, "dict://Internet" will open the Dictionary
| app with the query "Internet". (Probably behind a security
| prompt.) Not sure if there's similar functionality on other
| operating systems.
| dmd wrote:
| What do you mean by "behind a security prompt"?
| promiseofbeans wrote:
| I tried it and firefox came up with a prompt confirming I
| wanted to open 'internet' in 'Dictionary'
| WarOnPrivacy wrote:
| > firefox came up with a prompt confirming I wanted to open
| 'internet' in 'Dictionary'
|
| My Ffx 131.0b9 wasn't so adept. It gave me:
|
|     The address wasn't understood
|
|     Firefox doesn't know how to open this address, because one of
|     the following protocols (dict) isn't associated with any
|     program or is not allowed in this context.
|
|     You might need to install other software to open this address.
| wormius wrote:
| Mine just opened the default search engine with the
| results, maybe it's how I have the address bar
| configured?
| masswerk wrote:
| The behavior may also vary depending on whether it's
| an actual link in a document or direct input to the location
| bar. (I get different prompts on Firefox, while both will
| forward to the built-in system dictionary.)
|
| And, of course, any further configurations may alter
| this, e.g., another service may have been registered for
| this protocol.
| bobbylarrybobby wrote:
| Browsers generally throw up a prompt before opening an
| external app
| bubblesnort wrote:
| Never heard of the dict command?
|
| The author went through the trouble of figuring out the protocol
| but never bothered to just run dict. Okay.
| jmillikin wrote:
| What do you think the post would have contained if he had run
| dict?
|
| Here's a hint:
|
|     macbook% dict
|     zsh: command not found: dict
|
|     desktop$ dict
|     bash: dict: command not found
|
| You'd have to be pretty into retro-computing before you'll find
| an OS that ships /usr/bin/dict .
| politelemon wrote:
| Not really... FWIW, addressing the 'hint' you're giving
| specifically, this is on Ubuntu.
|
|     $ dict
|     Command 'dict' not found, but can be installed with:
|     sudo apt install dict
|
| After installing,
|
|     $ dict example
|     6 definitions found
|     ...
| jmillikin wrote:
| So the causal chain would be:
|
| 1. Notice a URL scheme dict://
|
| 2. Try to type 'dict' into a terminal, on the off chance
| there's a command-line tool with the same name (would you
| do this for https:// and expect the same outcome?)
|
| 3. Be running a distribution that modifies the user's shell
| environment to suggest packages related to unknown commands
|
| 4. Actually install and run that command
|
| 5. Be running tcpdump or wireshark at the same time to
| notice that the `dict` command is reaching out to the
| network, as opposed to doing some sort of local lookup in
| /usr/share/dict
|
| 6. Figure out from the network traffic that the tool is
| using a dictionary-specific protocol as opposed to just
| making an HTTP request to dictionary.com or whatever.
|
| --
|
| Nah, the only way someone would know (or even suspect!)
| that dict:// is somehow related to an ancient Unix command-
| line tool is prior knowledge, and it's unreasonable to
| expect the article author to have somehow intuited such an
| idea.
| kragen wrote:
| right, the causal chain would be that the author already
| used the dict command and had at some point read the man
| page, which begins
|
|     DICT(1)                                                DICT(1)
|
|     NAME
|            dict - DICT Protocol Client
|
|     SYNOPSIS
|            (...)
|
|     DESCRIPTION
|            dict is a client for the Dictionary Server Protocol (DICT),
|            (...)
|
| but, yeah, not everybody has that background
|
| which is fine! nobody is born knowing all the unix
| commands
| bubblesnort wrote:
| HTTP is the weird protocol here. Lots of protocols were
| named after the program (or vice versa).
|
| finger, ftp, ssh, talk, telnet, tftp, maybe whois too?
| kragen wrote:
| also gopher. mail and dns are two other exceptions tho
| forbiddenlake wrote:
| The point of the article wasn't how to define a word, it was
| answering why old code mentioned the dict protocol in the same
| regex as http and ftp.
|
| I for one had never heard of the dict:// protocol, so I was
| curious about it.
| ludwik wrote:
| This is a fun post about an obscure internet protocol, not a
| how-to.
| hnlmorg wrote:
| The "dict" string was included in a regex of protocols. So they
| wanted to learn more about the protocol.
|
| It's entirely possible they were already aware of other
| software that supports dictionary lookups.
| kragen wrote:
| dict isn't other software, it's a client for the protocol
| being discussed
| hnlmorg wrote:
| You can still be aware that 'dict' client exists without
| realising that it wasn't just another HTTP user agent.
| kragen wrote:
| true
| hkt wrote:
| I've been aware of dict for a while since I wrapped up an
| esperanto to english dictionary for KOReader in a format KOReader
| could understand. What I'd really have liked is a format like
| this:
|
| dict://<server>/<origin language>/<definition language>/<word>
|
| Still, it is pretty cool that dict servers exist at all, so no
| complaints here.
| im3w1l wrote:
| I admire these old protocols that are intentionally built to be
| usable both by machines and humans. Like the combination of a
| response code, and a human readable explanation. A help command
| right in the protocol.
|
| Makes me think it's a shame that making a json based protocol is
| so much easier to whip up than a textual ebnf-specified protocol.
|
| Like imagine if in python there was some library that you gave an
| ebnf spec maybe with some extra features similar to regex (named
| groups?) and you could compile it to a state machine and use it
| to parse documents, getting a Dict out.
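|
| A small sketch of that idea, using only named groups from Python's
| `re` module rather than a real EBNF compiler. The status-line shape
| (three digits, a space, free text) is an assumption based on the
| DICT replies quoted in this thread; `STATUS_LINE` and `parse_status`
| are hypothetical names.
|
```python
import re

# Assumed grammar: a three-digit code, optionally followed by a space and text.
STATUS_LINE = re.compile(r"^(?P<code>\d{3})(?: (?P<text>.*))?$")


def parse_status(line: str) -> dict:
    """Turn one status line into a dict with a machine part and a human part."""
    match = STATUS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    return {"code": int(match["code"]), "text": match["text"] or ""}


if __name__ == "__main__":
    print(parse_status("150 1 definitions retrieved\r\n"))
```
|
| The numeric code is what a program would branch on; the text is
| what a person reading the session would look at.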
| orf wrote:
| Unfortunately, in practice they are a nightmare. Look at the
| WHOIS protocol for an example.
|
| Humans don't look at responses very much, so you should
| optimise for machines. If you want a human-readable view, then
| turn the JSON response into something readable.
| poincaredisk wrote:
| Whois protocol has no grammar to speak of (the response body
| is not defined at all, just a text blob) which makes it a
| nightmare to parse. Having a proper response format would
| solve this.
|
| Though I agree, I prefer my responses in JSON.
| astrobe_ wrote:
| The next logical step is to use a machine-friendly format
| instead; that is a binary protocol.
|
| Even HTML and XML, which were designed for readability and
| manual writing, eventually became "not usable enough"
| ("became" because I think part of it is that their success
| exposed them to less technical populations), and now we
| have markdown everywhere, which most of the time is converted
| to HTML.
|
| So if you are going to use a tool more sophisticated than
| Ed/Edlin to read and write (rich) text in a certain format,
| it could be more efficient to focus on making the job of the
| machine - and of the programmer, easier.
|
| If you look at a binary protocol such as NTP, the binary
| format leaves very little room for Postel's principle [1], so
| it is straightforward to make a program that queries a server
| and displays the result.
|
| [1] https://en.wikipedia.org/wiki/Robustness_principle
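|
| To illustrate how little interpretation a fixed binary layout
| needs, here is a sketch that packs and unpacks the 48-byte NTP
| header with Python's `struct`. It does no network I/O; the field
| layout follows my reading of the NTP packet format, and the names
| `NTP_HEADER`, `build_request` and `parse_header` are hypothetical.
|
```python
import struct

# 48-byte NTP header in network byte order: flags, stratum, poll, precision,
# root delay, root dispersion, reference id, then four 64-bit timestamps.
NTP_HEADER = struct.Struct("!BBbbIII4Q")
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def build_request(version: int = 4) -> bytes:
    """Return a client-mode (mode 3) request with every other field zero."""
    flags = (0 << 6) | (version << 3) | 3
    return NTP_HEADER.pack(flags, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)


def parse_header(packet: bytes) -> dict:
    """Decode the fixed header; every field sits at a fixed offset."""
    (flags, stratum, _poll, _precision, _root_delay, _root_dispersion,
     _ref_id, _ref_ts, _origin_ts, _receive_ts, transmit_ts) = NTP_HEADER.unpack(packet[:48])
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        # The upper 32 bits of a timestamp are whole seconds since 1900.
        "transmit_unix": (transmit_ts >> 32) - NTP_UNIX_OFFSET,
    }


if __name__ == "__main__":
    print(parse_header(build_request()))
```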
| kragen wrote:
| maybe we could have a format that was more human-readable than
| json (or especially xml) but still reliably emittable and
| parseable? yaml, maybe, or toml, although i'm not that
| enthusiastic about them. another proposal for such a thing was
| ogdl (https://ogdl.org/), a notation for what i think are
| called rose trees
|
| > _OGDL: Ordered Graph Data Language_
|
| > _A simple and readable data format, for humans and machines
| alike._
|
| > _OGDL is a structured textual format that represents
| information in the form of graphs, where the nodes are strings
| and the arcs or edges are spaces or indentation._
|
| their example:
|
|     network
|       eth0
|         ip   192.168.0.10
|         mask 255.255.255.0
|         gw   192.168.0.1
|
|     hostname crispin
|
| another possibility is jevko; https://jevko.org/ describes it
| and http://canonical.org/~kragen/rose/ are some of my notes
| about the possibilities of similar rose-tree data formats
| hnlmorg wrote:
| Formats like TOML are horrible for heavily nested data (even
| XML does a better job here) and the last time I checked, TOML
| didn't support arrays at the top level.
|
| YAML is nicer than JSON to write, but I wouldn't say it's any
| nicer to read.
|
| If you want something that's less punctuation heavy, then I'd
| prefer we go full Wirth and have something more akin to
| Pascal.
| kragen wrote:
| arrays at the top level are probably a bad idea for
| protocols that need to evolve in a backward-compatible way
|
| what do you mean about heavily nested data? do the other
| formats i linked do a better job there?
|
| i'm not _sure_ it's possible to come up with a data format
| that will work well for such a wide range of use cases, but
| it sure would be nice to have. json is pretty great in
| terms of being able to load it into the browser, or
| visidata, or python, or js, or whatever
| hnlmorg wrote:
| > arrays at the top level are probably a bad idea for
| protocols that need to evolve in a backward-compatible
| way
|
| Depends on the protocol. It might be preferable to
| version the end point. Or if it's a specific function, e.g.
| "list-synonyms", then having a dictionary just to reference
| an array could be argued as unnecessary protocol bloat.
| Particularly given the aim of this exercise is
| readability.
|
| > what do you mean about heavily nested data? do the
| other formats i linked do a better job there?
|
| I mean a tree like structure.
|
| JSON and YAML are probably the best in class here. XML,
| for all of its warts, is good at handling nested data in
| a readable way too.
|
| TOML was more based around a flatter structure.
|
| > i'm not sure it's possible to come up with a data
| format that will work well for such a wide range of use
| cases,
|
| It's not. The moment that happens, that format then
| becomes unwieldy and people then feel the urge to invent
| yet another new format to simplify things. It's a vicious
| circle that happens over and over again in the tech
| sector.
| kragen wrote:
| it's a pretty big problem that connecting existing
| software together so often requires writing new parsers
| hnlmorg wrote:
| That suggests things are getting worse but personally
| I've seen the opposite trend.
|
| These days developers are rallying around a subset of
| established standards rather than inventing new protocols
| and grammar for each new service.
|
| Take a look at the old protocols out there: finger, DNS,
| Gopher, HTTP, FTP, SMTP, Dict, etc. They all have their
| own grammar and in many cases, even that grammar is very
| loosely defined or subject to dozens of different
| standards. Whereas these days it's mostly JSON or XML
| over HTTPS. Or ProtoBuf if you need something more
| compact.
|
| There's definitely still room for improvement. For
| example the shift towards proprietary messaging protocols
| like Slack, Discord, etc. But that's another topic
| entirely.
| kragen wrote:
| yeah, i appreciate the move to html, http, and json.
| although http/2 and http/3 arguably aren't really http,
| and scraping data out of html is ridiculously time-
| wasting. the shift toward cloudflare and secret criteria
| for blocking users whose sessions act "atypical" are also
| huge problems, but that's sort of what you'd expect from
| using software running on remote servers you can't
| control
| donatj wrote:
| In my department's (we were formerly our own company) internal
| framework throwing .html on the end of any JSON response
| outputs it in nested HTML tables. I personally find it very
| helpful.
| hnlmorg wrote:
| At that point you might as well drop JSON altogether and use
| an XHTML subset so your rendered output is also valid XML
| (instead of having two different and incompatible markups
| merged together)
| zeven7 wrote:
| I'm assuming they use the .html trick for human reading of
| the data by developers rather than it being used in
| production
| donatj wrote:
| That's exactly what it's for
| hnlmorg wrote:
| Ahhh that makes more sense.
| theamk wrote:
| That sounds like a bad idea. Unlike JSON, XML is (1) non-
| trivial to parse safely, (2) hard to reliably extract
| information and (3) verbose.
|
| Leave the markup languages for intended purpose: text
| markup. Don't force them to carry data.
| hnlmorg wrote:
| I'm not generally a fan of XML either but what you posted
| there is just factually incorrect in just about every
| conceivable way.
|
| 1. There are plenty of XML parsers already available for
| most languages. Yeah, there have been high profile
| exploits based on XML, but given the scale of XML's
| usage, it's fair to say those exploits are atypical usage
| where XML can be user supplied. And as long as you're not
| allowing users to upload their own XML, then you get to
| control the schema so there aren't any risks in using XML.
|
| 2. XML's entire purpose is a data store. I'm not someone
| who likes to blame the developers for using their tools
| wrong but honestly, if you can't unmarshal an XML schema
| you have control over then you're not going to succeed
| with JSON either.
|
| 3. It is. But it's also highly compressible _because_ of
| its repetitive tags. So for HTTP endpoints, it actually
| doesn't work out any different to JSON.
|
| > Leave the markup languages for intended purpose: text
| markup. Don't force them to carry data.
|
| You do realise the entire point of XML is to carry data?
| It might have fallen out of favour in recent years but
| those of us old enough to remember a time before JSON
| will talk about how JSON is just a simplified
| reimplementation of XML. And with things like JSON
| schemas, JSON is continuing to copy XML features.
| dspillett wrote:
| _> Makes me think it's a shame that making a json based
| protocol is so much..._
|
| Maybe I'm not the human you are thinking of, being a techie,
| but I find a well structured JSON response, as long as it isn't
| overly verbose and is presented in open form rather than
| minified, to be a good compromise of human readable and easy to
| digest programmatically.
| marcosdumay wrote:
| The legibility is probably one of the main reasons JSON got
| adopted. XML can be made to not look too bad, but in SOAP it
| must be unreadable, so everybody was looking into fixing
| this.
| xmlmann wrote:
| XML has a sweet advantage though. It can be styled in the
| browser. For example a sitemap that works for Google,
| OpenAI, &c. and is human readable looking like a web page.
|
| Example:
|
| https://www.wpbeginner.com/sitemap.xml
| immibis wrote:
| This is fun, but in reality, not all that useful.
|
| Except you can impress other nerds when they "view
| source" and there's not an HTML tag in sight.
| immibis wrote:
| XML is a language designed for markup (i.e. text
| formatting), and fitting structured documents into it
| creates an impedance mismatch.
|
| Dictionary definitions may be considered as marked-up
| documents, so it may work. The overall structure of the
| dictionary is not.
| fouc wrote:
| textual ebnf-specified protocol > json
| somat wrote:
| REST (REpresentational State Transfer) as a concept is very
| human orientated. The idea was a sort of academic abstraction
| of html. but it can be boiled down to: when you send a
| response, also send the entire application needed to handle
| that response. It is unfortunate that collectively we had a
| sort of brain fart and said "ok, REST == http, got it" and lost
| the rest of the interesting discussion about what it means to
| send the representational state of the process.
| zzo38computer wrote:
| > I admire these old protocols ...
|
| Protocols that have a response code with an explanation are
| helpful. A help command is also helpful. So, I wrote an NNTP
| server that does that, and the IRC and NNTP client software I
| use can display them.
|
| > Makes me think it's a shame that making a json based protocol
| is so much easier to whip up ...
|
| I personally don't; I find I can easily work with text-based
| protocols if the format is easy enough.
|
| I think there are problems with JSON. Some of the problems are:
| it requires parsing escapes and keys/values, does not properly
| support character sets other than Unicode, cannot work with
| binary data unless it is encoded using base64 or hex or
| something else (which makes it inefficient), etc. There are
| other problems too.
|
| > Like imagine if in python there was some library that you
| gave an ebnf spec ...
|
| Maybe it is possible to add such a library in Python, if there
| is not already such things.
| mogoh wrote:
| hmmm
|
|     $>curl dict://dict.org/d:Internet
|     curl: (1) Protocol "dict" not supported
| fallingsquirrel wrote:
| Works for me. I bet your OS ships a crippled version of curl.
|
|     $ curl --version
|     curl 8.7.1 (x86_64-pc-linux-gnu) [...]
|     $ curl dict://dict.org/d:Internet
|     220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <370202891.28105.1727009645@dict.dict.org>
|     250 ok
|     150 1 definitions retrieved
|     [...]
| soraminazuki wrote:
| Manual build with an explicit `--disable-dict` perhaps?
| Because it's not Debian, Fedora, Homebrew, Nix, Alpine, Arch,
| or Gentoo, judging by their package definitions.
| mogoh wrote:
| I am on Fedora Silverblue
|
|     $>curl --version
|     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng libidn2/2.3.7 nghttp2/1.59.0
|     Release-Date: 2024-01-31
|     Protocols: file ftp ftps http https ipfs ipns
|     Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz SPNEGO SSL threadsafe UnixSockets
|
| I am not sure I understand you correctly. Should it work
| on Fedora?
| SushiHippie wrote:
| The description on this page at least lists the dict
| protocol
|
| https://src.fedoraproject.org/rpms/curl/
|
| Only the minimal build disables the dict protocol, maybe
| you have installed the curl-minimal package?
|
| https://src.fedoraproject.org/rpms/curl/blob/rawhide/f/curl....
| soraminazuki wrote:
| Ah, it appears that curl-minimal became the default curl
| for Fedora recently. curl-full has to be installed for
| full functionality. I initially ignored it because I
| assumed the default was curl-full.
|
| https://fedoraproject.org/wiki/Changes/CurlMinimal_as_Defaul...
|
| Curl devs are predictably not too happy about this
| change.
|
| https://daniel.haxx.se/blog/2022/03/16/fedora-and-curl-minim...
| marxisttemp wrote:
| You mentioned Homebrew but missed the standard macOS
| package manager, MacPorts.
| kragen wrote:
| works for me too. but it takes about 6 seconds so curl
| dict://localhost/d:Internet is vastly preferable
| bloopernova wrote:
| Possibly Fedora. I'm using Fedora 40 and its curl reports
| thus:
|
|     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng libidn2/2.3.7 nghttp2/1.59.0
|     Release-Date: 2024-01-31
|     Protocols: file ftp ftps http https ipfs ipns
|     Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz SPNEGO SSL threadsafe UnixSockets
|
| And the dict protocol is indeed unsupported by system curl.
|
| EDIT: https://fedoraproject.org/wiki/Changes/CurlMinimal_as_Defaul...
|
| EDIT2: To change from libcurl-minimal to libcurl, run:
|
|     dnf swap libcurl-minimal libcurl
|     dnf swap curl-minimal curl
|
| The second step there may not be needed, at least my system
| had curl paired with libcurl-minimal so your situation may
| not match mine.
|
| EDIT3: This is the output of my curl now:
|
|     curl 8.6.0 (x86_64-redhat-linux-gnu) libcurl/8.6.0 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng brotli/1.1.0 libidn2/2.3.7 libpsl/0.21.5 libssh/0.10.6/openssl/zlib nghttp2/1.59.0 OpenLDAP/2.6.7
|     Release-Date: 2024-01-31
|     Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
|     Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets
| kragen wrote:
| dict and the relevant dictionaries are things i pretty much
| always install on every new laptop. gcide in particular includes
| most of the famous 1913 webster dictionary with its sparkling
| prose:
|
|     : ~; dict glisten
|     2 definitions found
|
|     From The Collaborative International Dictionary of English v.0.48 [gcide]:
|
|       Glisten \Glis"ten\ (gl[i^]s"'n), v. i. [imp. & p. p.
|          {Glistened}; p. pr. & vb. n. {Glistening}.] [OE. glistnian,
|          akin to glisnen, glisien, AS. glisian, glisnian, akin to E.
|          glitter. See {Glitter}, v. i., and cf. {Glister}, v. i.]
|          To sparkle or shine; especially, to shine with a mild,
|          subdued, and fitful luster; to emit a soft, scintillating
|          light; to gleam; as, the glistening stars.
|
|          Syn: See {Flash}.
|          [1913 Webster]
|
| it's interesting to think about how you would implement this
| service efficiently under the constraints of mid-01990s
| computers, where a gigabyte was still a lot of disk space and
| multiuser unix servers commonly had about 100 mips
| (https://netlib.org/performance/html/dhrystone.data.col0.html)
|
| totally by coincidence i was looking at the dictzip man page this
| morning; it produces gzip-compatible files that support random
| seeks so you can keep the database for your dictd server
| compressed. (as far as i know, rik faith's dictd is still the
| only server implementation of the dict protocol, which is
| incidentally not a very good protocol.) you can see that the
| penalty for seekability is about 6% in this case:
|
|     : ~; ls -l /usr/share/dictd/jargon.dict.dz
|     -rw-r--r-- 1 root root 587377 Jan  1  2021 /usr/share/dictd/jargon.dict.dz
|     : ~; \time gzip -dc /usr/share/dictd/jargon.dict.dz|wc -c
|     0.01user 0.00system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 1624maxresident)k
|     0inputs+0outputs (0major+160minor)pagefaults 0swaps
|     1418350
|     : ~; gzip -dc /usr/share/dictd/jargon.dict.dz|gzip -9c|wc -c
|     556102
|     : ~; units -t 587377/556102 %
|     105.62397
|
| nowadays computers are fast enough that it probably isn't a big
| win to gzip in such small chunks (dictzip has a chunk limit of
| 64k) and you might as well use a zipfile, all implementations of
| which support random access:
|
|     : ~; mkdir jargsplit
|     : ~; cd jargsplit
|     : jargsplit; gzip -dc /usr/share/dictd/jargon.dict.dz|split -b256K
|     : jargsplit; zip jargon.zip xaa xab xac xad xae xaf
|       adding: xaa (deflated 60%)
|       adding: xab (deflated 59%)
|       adding: xac (deflated 59%)
|       adding: xad (deflated 61%)
|       adding: xae (deflated 62%)
|       adding: xaf (deflated 58%)
|     : jargsplit; ls -l jargon.zip
|     -rw-r--r-- 1 user user 565968 Sep 22 09:47 jargon.zip
|     : jargsplit; time unzip -o jargon.zip xad
|     Archive:  jargon.zip
|       inflating: xad
|
|     real    0m0.011s
|     user    0m0.000s
|     sys     0m0.011s
|
| so you see 256-kibibyte chunks have submillisecond decompression
| time (more like 2 milliseconds on my cellphone) and only about a
| 1.8% size penalty for seekability:
|
|     : jargsplit; units -t 565968/556102 %
|     101.77413
|
| and, unlike the dictzip format (which lists the chunks in an
| extra backward-compatible file header), zip also supports
| efficient appending
|
| even in python (3.11.2) it's only about a millisecond:
|
|     In [13]: z = zipfile.ZipFile('jargon.zip')
|
|     In [14]: [f.filename for f in z.infolist()]
|     Out[14]: ['xaa', 'xab', 'xac', 'xad', 'xae', 'xaf']
|
|     In [15]: %timeit z.open('xab').read()
|     1.13 ms +- 16.2 us per loop (mean +- std. dev. of 7 runs, 1,000 loops each)
|
| this kind of performance means that any algorithm that would be
| efficient reading data stored on a conventional spinning-rust
| disk will be efficient reading compressed data if you put the
| data into a zipfile in "files" of around a meg each. (writing is
| another matter; zstd may help here, with its order-of-magnitude
| faster compression, but info-zip zip and unzip don't support zstd
| yet.)
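|
| a rough sketch of the chunked-zipfile idea in python, entirely
| in memory. the six-digit member names and the helper names
| `build_chunked_zip` and `read_range` are made up for
| illustration; only `zipfile` and `io` from the standard library
| are used.
|
```python
import io
import zipfile

CHUNK_SIZE = 256 * 1024  # same chunk size as the split -b256K experiment


def build_chunked_zip(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Store data as numbered, individually deflated members of one zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for n, start in enumerate(range(0, len(data), chunk_size)):
            z.writestr(f"{n:06d}", data[start:start + chunk_size])
    return buf.getvalue()


def read_range(zip_bytes: bytes, offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset+length], inflating only the chunks it touches."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        names = set(z.namelist())
        wanted = [f"{n:06d}" for n in range(first, last + 1)]
        # Chunks past the end of the data simply do not exist; skip them.
        joined = b"".join(z.read(name) for name in wanted if name in names)
    skip = offset - first * chunk_size
    return joined[skip:skip + length]


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # 1 MiB of sample data
    archive = build_chunked_zip(blob)
    print(read_range(archive, 300_000, 16) == blob[300_000:300_016])
```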
|
| dictd keeps an index file in tsv format which uses what looks
| like base64 to locate the desired chunk and offset in the chunk:
|
|     : jargsplit; < /usr/share/dictd/jargon.index shuf -n 4 | LANG=C sort | cat -vte
|     fossil^IB9xE^IL8$
|     frednet^IB+q5^IDD$
|     upload^IE/t5^IJ1$
|     warez d00dz^IFLif^In0$
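|
| a sketch of decoding those index fields, assuming they are
| numbers written with the standard base64 alphabet as base-64
| digits, most significant digit first, and that the two numeric
| columns are a byte offset and a length in the uncompressed
| data. that reading is not confirmed here, though under it all
| four sample offsets land below the 1418350-byte uncompressed
| size shown earlier. `decode_b64_number` and `parse_index_line`
| are hypothetical names.
|
```python
import string

# Assumption: the standard base64 alphabet, used as positional base-64 digits.
B64_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B64_VALUE = {ch: i for i, ch in enumerate(B64_DIGITS)}


def decode_b64_number(text: str) -> int:
    """Interpret text as a base-64 integer, most significant digit first."""
    value = 0
    for ch in text:
        value = value * 64 + B64_VALUE[ch]
    return value


def parse_index_line(line: str) -> tuple[str, int, int]:
    """Split one tab-separated index line into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")[:3]
    return headword, decode_b64_number(offset), decode_b64_number(length)


if __name__ == "__main__":
    print(parse_index_line("fossil\tB9xE\tL8\n"))
```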
|
| this is very similar to the index format used by eric raymond's
| volks-hypertext
| https://www.ibiblio.org/pub/Linux/apps/doctools/vh-1.8.tar.g...
| or vi ctags or emacs etags, but it supports random access into
| the file
|
| strfile from the fortune package works on a similar principle but
| uses a binary data file and no keys, just offsets:
|
|     : ~; wget -nv canonical.org/~kragen/quotes.txt
|     2024-09-22 10:44:50 URL:http://canonical.org/~kragen/quotes.txt [49884/49884] -> "quotes.txt" [1]
|     : ~; strfile quotes.txt
|     "quotes.txt.dat" created
|     There were 87 strings
|     Longest string: 1625 bytes
|     Shortest string: 92 bytes
|     : ~; fortune quotes.txt
|     Get enough beyond FUM [Fuck You Money], and it's merely Nice To Have Money.
|             -- Dave Long, <dl@silcom.com>, on FoRK, around 2000-08-16, in
|             Message-ID <200008162000.NAA10898@maltesecat>
|     : ~; od -i --endian=big quotes.txt.dat
|     0000000           2          87        1625          92
|     0000020           0   620756992           0         933
|     0000040        1460        2307        2546        3793
|     0000060        3887        4149        5160        5471
|     0000100        5661        6185        6616        7000
|
| of course if you were using a zipfile you could keep the index in
| the zipfile itself, and then there's no point in using base64 for
| the file offsets, or limiting them to 32 bits
| heystefan wrote:
| So, can I somehow use the 1913 Webster dictionary on MacOS?
| It's not in the list of configurable ones.
|
| (If not possible, Terminal would work too.)
| kragen wrote:
| is gcide available? debian only offers web1913 as part of
| gcide
| commandersaki wrote:
| I love dict/dictd but I had an issue using it in hostile networks
| that block the port/protocol.
|
| I've been tempted to revamp dict/dictd to shovel the dict
| protocol over websockets so I can use it over the web. Just one of
| those ideas in the pipeline that I haven't revisited because I'm
| no longer dealing with that hostile network.
| gwervc wrote:
| The dict protocol really shows its age, notably the stateful
| connection part. Having a new protocol based on HTTP and JSON
| similar to LSP would be nice but there is no real interest. (I
| made and used my own nonetheless in a research project. It may
| even be deployed but deactivated in another one)
|
| The biggest issue isn't technical, it's the fact that organizations
| having dictionary data don't want third parties to interact with
| it without paid licensing.
| steve_taylor wrote:
| I hate the fact that corporate IT collectively decided to
| block every port except 80 and 443, making it necessary to
| base new protocols on HTTP instead of TCP/IP.
| jolux wrote:
| In my experience HTTP is a better foundation for novel
| protocols in most cases.
| LambdaComplex wrote:
| Doesn't HTTP require binary data to be converted to
| base64 encoding, thereby increasing its size on the wire?
| That seems suboptimal for a lot of use cases
| kragen wrote:
| no, it does not, neither in requests nor in replies.
| possibly you are thinking of smtp
| devmor wrote:
| It does not - you are perhaps thinking of GET queries:
| URL data often must be base64 encoded as URLs are parsed
| as characters.
|
| HTTP bodies can be made up of any data in any encoding
| you wish.
| cyanydeez wrote:
| Might be something LLM based for organization knowledge base
| Towaway69 wrote:
| There is always wiktionary, I would assume they have an api
| of some sort. That would cover the http & json bit!
|
| https://wiktionary.org
| pushupentry1219 wrote:
| > Having a new protocol based on HTTP and JSON
|
| This is just a HTTP/REST api? These exist already.
| praveen9920 wrote:
| > in an age of low-size disk drives and expensive software,
| looking up data over a dedicated protocol seems like a nifty
| idea. Then disk size exploded, databases became cheap, and search
| engines made it easy to look up words.
|
| I love this particular part of history: how protocols and
| applications got built around restrictions and then evolved
| after improvements. Similar examples exist everywhere in
| computer history. Projecting the same with LLMs, we will have AIs
| running locally on mobile devices or perhaps AIs replacing the OS
| of mobile devices and router protocols and servers.
|
| In the future, HN people will be looking at the code and feeling
| nostalgic about writing code.
| 38 wrote:
| Given that most current AI generated code is dogshit, I would
| say we are well off from that.
| hebocon wrote:
| I recently began testing my own public `dictd` server. The main
| goal was to make the OED (the full and proper one) available
| outside of a university proxy. I figured I would add the
| Webster's 1913 one too.
|
| Unfortunately the vast majority of dictionary files are in
| "stardict" format and the conversion to "dict" has yielded mixed
| results. I was hoping to host _every_ dictionary, good and bad,
| but will walk that back now. A free VPS could at least run the
| OED.
| kragen wrote:
| what's the stardict format? which edition of the oed are you
| hosting? i scanned the first edition decades ago but i don't
| think there's a reasonable plain-text version of it yet
| cormorant wrote:
| StarDict (a program/file format) is easily googlable. A bit
| of a rabbit hole is that it's been chased around hosting
| providers because its site (used to) offer downloads of
| copyrighted dictionaries, including the OED 2nd edition. I
| don't know how these files were originally obtained or
| produced. See:
| https://web.archive.org/web/20230718140437/http://download.h...
|
| Edit to add: Also, "i scanned the first edition decades ago"
| sounds like quite a story. 13 volumes? What project were you
| doing?
| kragen wrote:
| oh, i just thought it would be good for the public-domain
| dictionary to be available to the public:
| https://www.mail-archive.com/kragen-tol@canonical.org/msg001...
| tomsmeding wrote:
| > to make the OED (the full and proper one) available outside
| of a university proxy.
|
| Was the plan to do this in a legal fashion? If so, how?
| anthk wrote:
| echo "define * hacker " | nc dict.org 2628 | less
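|
| Roughly the same thing as a Python sketch. The host and port come
| from the one-liner above; the reply handling (151 introduces a
| definition body, a lone "." ends it, a leading "." on a text line
| is an escape) follows my reading of RFC 2229 and is untested
| against a live server here. `read_definitions` and `define` are
| hypothetical names.
|
```python
import socket


def read_definitions(lines):
    """Collect the text bodies that follow 151 replies in an iterable of lines."""
    definitions, current = [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is None:
            # Outside a body: only a 151 status line starts a new definition.
            if line.startswith("151 "):
                current = []
            continue
        if line == ".":
            definitions.append("\n".join(current))
            current = None
        else:
            # Text lines that begin with "." arrive with the dot doubled.
            current.append(line[1:] if line.startswith(".") else line)
    return definitions


def define(word, database="*", host="dict.org", port=2628):
    """Send DEFINE then QUIT, and read replies until the server disconnects."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="")
        stream.write(f"DEFINE {database} {word}\r\nQUIT\r\n")
        stream.flush()
        return read_definitions(stream)


if __name__ == "__main__":
    for text in define("hacker"):
        print(text)
        print("-" * 40)
```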
| cratermoon wrote:
| Oh yes, I remember dictionary servers. Also many other protocols.
|
| What happened to all of those other protocols? Everything got
| squished onto http(s) for various reasons. As mentioned in this
| thread, corporate firewalls blocking every other port except 80
| and 443. Around the time of the invention of http, protocols were
| proliferating for all kinds of new ideas. Today "innovation"
| happens on top of http, which devolves into some new kind of
| format to push back and forth.
| giantrobot wrote:
| I wouldn't place all the blame on corporate IT for low level
| protocols dying out. A lot of corporate IT filtering was a
| reaction to malicious traffic originating from _inside_ their
| networks.
|
| I think filtering on university networks killed more protocols
| than corporate filtering. Corporate networks were rarely the
| place where someone stuck a server in the corner with a public
| IP hosting a bunch of random services. That however was very
| common in university networks.
|
| When university networks (early 00s or so) started putting NAT
| on ResNets and filtering faculty networks is when a lot of
| random Internet servers started drying up. Universities had
| huge IPv4 blocks and would hand out their addresses to every
| machine on their networks. More than a few Web 1.0 companies
| started life on a random Sun machine in dorm rooms or the
| corner of a university computer lab.
|
| When publicly routed IPs dried up so did random FTPs and small
| IRC servers. At the same time residential broadband was taking
| off but so were the sales of home routers with NAT. Hosting
| random raw socket protocols stopped being practical for a lot
| of people. By the time low cost VPSes became available a lot of
| old protocols had already died out.
| wormius wrote:
| Wow, either I've forgotten this existed, or had no clue, I was
| around for this era, and I remember Veronica, Archie, WAIS,
| Gopher, etc, but never recall reading about a Dict protocol, nice
| find!
| nunobrito wrote:
| Nice find, didn't know the protocol either. The site lists all
| available dictionaries here: https://dict.org/bin/Dict?Form=Dict4
|
| I'll then be writing a java server for DICT. Likely add more
| recent types of dictionaries and acronyms to help keep it
| alive.
| fitsumbelay wrote:
| _super_ fascinating and potentially useful for future projects
| with or w/o AI. obviously makes me want to maintain my own dict
| service love this
| dokyun wrote:
| Emacs includes a browsable client for this protocol; you can use
| it with `M-x dictionary`.
| nashashmi wrote:
| [delayed]
___________________________________________________________________
(page generated 2024-09-22 23:00 UTC)