[HN Gopher] M2dir: Treating mails as files without going crazy
___________________________________________________________________
M2dir: Treating mails as files without going crazy
Author : cl3misch
Score : 96 points
Date : 2024-05-23 07:17 UTC (15 hours ago)
(HTM) web link (bitfehler.srht.site)
(TXT) w3m dump (bitfehler.srht.site)
| chriscappuccio wrote:
| As a CLI fan, I'm interested in where this could go
| mathfailure wrote:
| It would go only where you'd bring it to.
| ksherlock wrote:
| BeOS stored mail as individual files with extended attributes
| holding the subject, date, sender, etc. The email app (BeMail)
| was used to view/compose/send email but inbox management was
| handled by Tracker (BeOS version of Macintosh Finder or Windows
| Explorer). But the Tracker window was configured to display the
| extended attributes instead of the file names. The actual
| filename wasn't even displayed. Nobody went crazy.
|
| Example: https://birdhouse.org/beos/refugee/bemail.jpg via
| https://birdhouse.org/beos/refugee/ (which has some other images
| of Tracker organization with extended attributes)
| jauntywundrkind wrote:
| Pushing concerns up to the OS level is that one thing we could
| be doing in so many places, but haven't really tried in decades.
| Should we use a universal format/protocol agnostic way of
| having data attached to files? Naaaahhhh. /s
| bobthecowboy wrote:
| My gut reaction to this was "isn't that just sqlite"?
|
| I don't think this is what you were thinking of, but I do
| kind of love the idea of formalizing sqlite file formats
| where the "metadata" is standardized and the "file" is stored
| inside. Like a file format for a recipe, or a picture, or ...
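| Here is one way to picture the "sqlite file first" idea: a single SQLite
| file that holds a key/value metadata table plus the payload as a BLOB.
| This is a minimal sketch. The table layout (`meta`, `payload`) and the
| helper names are hypothetical, not an existing standard:

```python
import sqlite3


def write_container(path, metadata, payload):
    """Store string metadata and one binary payload in a single SQLite file."""
    con = sqlite3.connect(path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
            # CHECK (id = 1) restricts the table to a single payload row.
            con.execute(
                "CREATE TABLE IF NOT EXISTS payload "
                "(id INTEGER PRIMARY KEY CHECK (id = 1), data BLOB)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO meta (key, value) VALUES (?, ?)",
                metadata.items(),
            )
            con.execute("INSERT OR REPLACE INTO payload (id, data) VALUES (1, ?)", (payload,))
    finally:
        con.close()


def read_container(path):
    """Return (metadata dict, payload bytes or None)."""
    con = sqlite3.connect(path)
    try:
        metadata = dict(con.execute("SELECT key, value FROM meta"))
        row = con.execute("SELECT data FROM payload WHERE id = 1").fetchone()
        return metadata, (row[0] if row else None)
    finally:
        con.close()
```

| Any such file could then be inspected with the stock `sqlite3` shell
| (e.g. `SELECT * FROM meta;`), whatever the payload happens to be.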
| brirec wrote:
| Isn't _that_ just a container format, like what video and
| audio files have used for decades?
|
| I don't know of any existing container formats with support
| for a relational DB as one of the embedded streams, but the
| whole point of container formats is that you _can_ add
| arbitrary metadata, which of course can be a whole
| database.
|
| Of course, the way BeOS does what OP is talking about is by
| having many DB columns within the filesystem itself! (The
| filesystem _is_ a queryable database).
| bobthecowboy wrote:
| Yes, I totally get the distinction (and I was among those
| amazed by BeOS back in the day - I still show the old
| demo videos to friends who haven't seen it). I hadn't
| considered the container formats used by media, but in my
| head it would be the other way around - each file would
| be a sqlite file _first_ so that they all share some
| commonality around access and inspection (I'm assuming
| in my ignorance that the media container formats are
| different).
|
| Are there any database filesystems today? I haven't
| really looked, but the last one I heard of was the one
| that MS abandoned years ago. Actually I suppose Haiku
| probably still has one? I can't imagine how difficult it
| would be to get a DB Filesystem as a mainstream choice on
| Linux, let alone across OSen.
| ryandrake wrote:
| I'm generally a fan of taking advantage of the filesystem,
| especially when your application is just... storing and
| viewing files. It irrationally upsets me when an application
| grafts its own "Library" on top of my perfectly working
| filesystem, requiring me to import my files into an
| artificial thing that is just like a filesystem.
|
| On the other hand, extended attributes and other filesystem-
| specific features could be problematic if you want to share
| files with other operating systems. If I copy a file to a
| FAT32-formatted SD card, I need to worry about what might not
| copy over.
| tracker1 wrote:
| It's a somewhat interesting idea... I've had similar ideas in the
| past regarding a maildir replacement without resorting to a db
| file. I like the idea of having directories represent email
| folders; you generally will want some level of aggregation
| and/or search... I've thought about having a separate .eml file
| (header + body) along with a .meta.json file for additional
| tagging/details (deleted flag, tags, etc.).
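| Here is a minimal sketch of the .eml plus .meta.json layout described
| above. It assumes one pair of files per message, named `<name>.eml` and
| `<name>.meta.json`. The JSON keys (`deleted`, `tags`) and all function
| names are hypothetical:

```python
import json
import os
from pathlib import Path


def _atomic_write(path, data):
    # Write to a temp file next to the target, then rename over it,
    # so a reader never sees a half-written file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)


def save_message(folder, name, raw, meta=None):
    """Store the raw message and its sidecar metadata side by side."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    _atomic_write(folder / f"{name}.eml", raw)
    _atomic_write(folder / f"{name}.meta.json", json.dumps(meta or {}).encode("utf-8"))


def load_meta(folder, name):
    """Return the sidecar metadata, or {} if none exists yet."""
    path = Path(folder) / f"{name}.meta.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def add_tag(folder, name, tag):
    """Add a tag to the sidecar without touching the .eml file."""
    meta = load_meta(folder, name)
    tags = meta.setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)
    _atomic_write(Path(folder) / f"{name}.meta.json", json.dumps(meta).encode("utf-8"))
    return meta
```

| The temporary-file-plus-`os.replace` step means a reader never sees a
| half-written metadata file. The .eml and .meta.json pair is still updated
| in two separate steps, though.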
|
| Search is a very different story, you wouldn't want to have to do
| a full directory scan for text based search. So some level of
| indexing would be useful for a client mail service.
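| To illustrate the indexing idea, here is a toy inverted index over a
| directory of message files. Each lowercased word maps to the set of files
| that contain it, and a query intersects those sets. The sketch makes
| simplifying assumptions: it decodes raw bytes as UTF-8 and does no MIME
| decoding, so base64 bodies are not searchable. Real tools such as notmuch
| and mu build on the Xapian search engine instead:

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Map each lowercased word to the set of files containing it (one full scan)."""
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_bytes().decode("utf-8", errors="replace").lower()
            for word in set(WORD.findall(text)):
                index[word].add(path)
    return index


def search(index, query):
    """Return the files containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

| Building the index costs one full scan, but after that each query only
| touches in-memory sets. Keeping the index current as new mail arrives is
| the hard part, and this sketch ignores it.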
|
| Similarly, I've thought it would be really cool if Cloudflare
| offered a TCP worker option; you could do a simple mail service
| backed by R2. The web UI/UX could be pretty awesome and
| geo-distributed.
| arp242 wrote:
| > Search is a very different story, you wouldn't want to have
| to do a full directory scan for text based search. So some
| level of indexing would be useful for a client mail service.
|
| I don't know; my ~/code directory has tons of stuff and
| searching with ripgrep doesn't seem too slow:
|
|     % time rg HelloWorld | wc -l
|     4
|     rg HelloWorld > /dev/null  0.13s user 0.12s system 99% cpu 0.251 total
|
|     % time rg string | wc -l
|     57813
|     rg string > /dev/null  0.20s user 0.14s system 99% cpu 0.339 total
|
| Rough estimate of files that rg will search:
|
|     % scc
|     -------------------------------------------------------------------------------
|     Language    Files    Lines      Blanks    Comments    Code
|     ...
|     Total       11024    1864982    175565    208777      1480640
|     -------------------------------------------------------------------------------
|
| Finding close to 60k matches in 11k files/1.7M lines in about
| 0.3 seconds isn't too bad.
|
| It should be said I ran a few commands on that directory before
| the above results, so there's probably some filesystem caching
| going on, but I can't be bothered to reboot.
|
| For many (not all, obviously) cases I think you may be able to
| get away without an index. Most people aren't subscribed to tons
| of email lists and get maybe a few emails a day at the most.
|
| I'd consider anything below ~3 seconds to be fine for search,
| so this scales to about 100k files/emails. At 10 emails/day on
| average that's about a decade. Most people do not get 10
| emails/day on average.
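| The scaling estimate above can be checked with a back-of-the-envelope
| calculation. It assumes search time grows linearly with file count, which
| ignores caching and file sizes, and uses the 11,024 files / 0.339 s run:

```python
def search_budget(files_scanned, seconds_taken, budget_seconds, emails_per_day):
    """Linear extrapolation: files searchable within the budget, and years of mail that is."""
    files_per_second = files_scanned / seconds_taken
    max_files = files_per_second * budget_seconds
    years = max_files / emails_per_day / 365.25
    return max_files, years


# Figures from the rg/scc run above, a 3-second budget, 10 emails/day.
max_files, years = search_budget(11024, 0.339, 3.0, 10)
```

| With these inputs, the 3-second budget comes to roughly 97,500 files.
| At 10 emails/day that is about 27 years of mail, so the "about a decade"
| figure is on the conservative side.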
|
| And you can even do some "poor man's indexing" by just making a
| new directory every five or ten years. Most of the time you
| want just emails from the last year or so.
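| The "poor man's indexing" idea amounts to limiting a brute-force scan to
| the most recent directories. Below is a minimal sketch. It assumes a
| hypothetical layout with one subdirectory per year (`2023/`, `2024/`, ...)
| under the mail root. The match is a plain case-insensitive substring
| check on the raw bytes, with ASCII case folding only:

```python
from pathlib import Path


def search_recent(root, needle, years):
    """Scan only the given year directories for a case-insensitive substring."""
    needle = needle.lower().encode("utf-8")
    hits = []
    for year in years:
        year_dir = Path(root) / str(year)
        if not year_dir.is_dir():
            continue  # nothing archived for that year
        for path in sorted(year_dir.rglob("*")):
            # bytes.lower() only folds ASCII letters, which is enough for a rough grep.
            if path.is_file() and needle in path.read_bytes().lower():
                hits.append(path)
    return hits
```

| Searching older mail just means passing more years, so the cost grows
| only when the user actually asks for it.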
| Arelius wrote:
| > Most people do not get 10 emails/day on average.
|
| I'd like to see the stats. I seem to average around >40
| emails a day (most are unactionable), yet I've always considered
| my email load quite light. For people like my wife, who do
| much of their work communication over email, it appears to be
| much higher.
| tracker1 wrote:
| I'm also considering a server/service with a web UI
| component, where server resources are shared... yeah,
| running a search on a local SSD/NVMe is crazy fast... now do
| it when there are 100k other users on that filesystem.
| rakoo wrote:
| > Search is a very different story, you wouldn't want to have
| to do a full directory scan for text based search. So some
| level of indexing would be useful for a client mail service.
|
| While notmuch and mu exist, I myself use the mblaze suite
| (https://github.com/leahneukirchen/mblaze) and it's more than
| enough for me. As a totally unscientific benchmark, it takes
| 300 ms to find 7 mails out of 24k when searching in headers, 4
| seconds when searching in the body.
|
| I myself use a different approach: I convert the entire list of
| emails (all 24k of them) to one-liners with Sender, Subject, Date,
| and Folder and feed it to fzf, which gives me a preview as well.
| The search is then instant; on the given fields only, but I never
| need more than that. This is my full MUA:
| https://sr.ht/~rakoo/omail/
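| Here is a minimal sketch of the one-line-per-message idea. It emits
| tab-separated Sender, Subject, Date and Folder fields that could be piped
| into fzf. This is not omail's actual code. The function name and field
| order are assumptions, and it expects one message per file under `root`:

```python
from email.parser import BytesParser
from email.policy import default
from pathlib import Path


def one_line_summaries(root):
    """Return one tab-separated 'sender, subject, date, folder' line per message file."""
    parser = BytesParser(policy=default)
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            h = parser.parse(fh, headersonly=True)
        fields = [
            str(h.get("From", "")),
            str(h.get("Subject", "")),
            str(h.get("Date", "")),
            str(path.parent.relative_to(root)),  # folder, relative to the mail root
        ]
        # Collapse whitespace so folded headers cannot break the one-line format.
        lines.append("\t".join(" ".join(f.split()) for f in fields))
    return lines
```

| Generating the list still reads every header once. After that, fzf only
| filters a few thousand short lines in memory, which is why the search
| feels instant.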
| mxuribe wrote:
| @tracker1 If I'm not mistaken, Thunderbird and other
| email clients that support conventional maildir often include a
| local db (such as sqlite) whose purpose tends to be mostly
| indexing content to ease some aspects of search. That
| being said, as others have noted, search mostly tends to be
| fast enough at the filesystem level. ;-)
| geek_at wrote:
| Not 100% related, but I have built OpenTrashmail [1], which gives
| you the emails in 3 variants: as folders on disk (no DB used),
| as an RSS feed, or as a JSON feed. That satisfied my needs for local
| management of emails.
|
| [1] https://github.com/HaschekSolutions/opentrashmail
| jll29 wrote:
| One wonders why email hasn't been kept in a well-thought-out directory
| structure since the beginnings of UNIX, given that almost
| everything is a file in UNIX, and especially given the power of
| UNIX text processing tools.
| technofiend wrote:
| If you'd like to try it, MH uses directories and files for
| managing your email:
| https://www.gnu.org/software/emacs/manual/html_node/mh-e/ind...
| As you mentioned this is the unix way and MH (according to
| Wikipedia) dates back to 1979:
| https://en.m.wikipedia.org/wiki/MH_Message_Handling_System
| MassPikeMike wrote:
| The parent's pointer is to "MH-E", the emacs package, which
| is a great interface to MH for folks who use emacs to read
| their email.
|
| For folks who don't, I wanted to clarify that MH also works
| great outside of emacs. Its command-line tools are
| composable, so you can do things like reply to the first
| message about chess sent this week:
|
| repl `pick -subject chess -after "19 May 24 0000 PST"`
|
| Using them in scripts is especially powerful.
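| Each MH message is a plain numbered file in a folder directory, so a
| `pick`-like filter is easy to approximate in any language. Below is a
| minimal Python sketch that supports only a case-insensitive subject match
| and an "after" date. It is not how nmh implements `pick`, and `after` must
| be a timezone-aware datetime:

```python
from email.parser import BytesParser
from email.policy import default
from pathlib import Path


def pick(folder, subject=None, after=None):
    """Return sorted message numbers matching a subject substring and/or a date bound."""
    parser = BytesParser(policy=default)
    picked = []
    for path in Path(folder).iterdir():
        if not (path.is_file() and path.name.isdigit()):
            continue  # MH message files are named 1, 2, 3, ...; skip .mh_sequences etc.
        with path.open("rb") as fh:
            h = parser.parse(fh, headersonly=True)
        if subject and subject.lower() not in str(h.get("Subject", "")).lower():
            continue
        if after is not None:
            # With the default policy, a parsed Date header exposes .datetime.
            when = getattr(h.get("Date"), "datetime", None)
            if when is None or when.tzinfo is None or when <= after:
                continue
        picked.append(int(path.name))
    return sorted(picked)
```

| For the chess example, something like `pick(folder, subject="chess",
| after=datetime(2024, 5, 19, tzinfo=timezone(timedelta(hours=-8))))` would
| return the matching message numbers, ready to feed into other scripts.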
|
| The modern implementation is "nmh", "New Message Handler",
| https://www.nongnu.org/nmh/. MH was the mail system within
| MIT's Athena computing environment back in the day, so many
| MIT folks developed a fondness for it and it retains a
| following. There's even a very comprehensive O'Reilly book,
| free online: https://rand-mh.sourceforge.io/book/
| PurpleRamen wrote:
| There have been multiple formats for storing mail over the
| years, and many of them use folders. But each format has its
| own problems and was optimized for certain benefits. On
| Unix you also have to make this work with multiple programs
| accessing the mail in parallel, because in the early days there
| were no servers that had tight control over everything. So
| formats were often designed around using or avoiding file
| locks, making efficient use of storage, or allowing fast
| handling and management of mail flags.
| ck45 wrote:
| For quite a long time, a very popular format (the most popular?)
| was mbox, which is a single file. With the arrival of
| qmail, it was slowly replaced by Maildir.
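| Part of why Maildir displaced mbox is that delivery needs no locks. A
| message is written under a unique name in `tmp/` and then renamed into
| `new/`. A rename within one filesystem is atomic, so readers see either
| the whole message or nothing. Below is a minimal sketch of that delivery
| step. The unique-name scheme is simplified, and Python's standard
| `mailbox.Maildir` class already implements the real thing:

```python
import itertools
import os
import socket
import time
from pathlib import Path

# Per-process counter so two deliveries in the same second still get distinct names.
_sequence = itertools.count(1)


def deliver(maildir, raw):
    """Deliver raw message bytes into a Maildir via the tmp/ -> new/ rename."""
    root = Path(maildir)
    for sub in ("tmp", "new", "cur"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Simplified unique name in the style time.pid_counter.hostname
    name = f"{int(time.time())}.{os.getpid()}_{next(_sequence)}.{socket.gethostname()}"
    tmp_path = root / "tmp" / name
    with open(tmp_path, "wb") as fh:
        fh.write(raw)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before publishing
    new_path = root / "new" / name
    os.rename(tmp_path, new_path)  # atomic on one filesystem; no lock needed
    return new_path
```

| By contrast, appending to a single mbox file requires every writer and
| reader to agree on a locking scheme, which is exactly the coordination
| problem described above.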
| mbreese wrote:
| It's not even just about common formats: way back, mail was
| delivered by copying files from one server to another. I
| (barely) remember using UUCP before SMTP/NNTP to sync mail
| and news. So the format that you stored messages in was very
| important. It's easy to copy a single message when it is a
| complete file.
| Gys wrote:
| The mail protocol is plain text so it's not difficult to save
| emails as individual files. I had such setup some years ago for a
| company. Emails were stored in one folder per week, each email in
| its own subfolder with attachments extracted and a meta text
| file. References were in a database.
|
| I also remember working with a windows email server that saved
| all emails only as files, no db, although the directory structure
| was more complicated. But that was maybe 20 years ago...
| zokier wrote:
| Is there a reason why metadata and the message are stored so
| separately? I.e. why
|
|     INBOX/2023-09-04_13:47_builds@sr.ht,GTfrlwJfN5vyR28R
|     INBOX/.meta/GTfrlwJfN5vyR28R.flags
|
| instead of
|
|     INBOX/2023-09-04_13:47_builds@sr.ht,GTfrlwJfN5vyR28R/message
|     INBOX/2023-09-04_13:47_builds@sr.ht,GTfrlwJfN5vyR28R/.flags
|
| The latter structure would allow creating/deleting the message
| and flags atomically.
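| The atomicity argument for the per-message directory can be made
| concrete. Build the directory under a hidden staging name and rename it
| into place. To delete, rename it away first and only then remove it.
| Below is a minimal sketch of that proposed layout. It is not the M2dir
| spec, and it assumes readers skip dot-prefixed entries and that everything
| lives on one filesystem. Flags are stored as a sorted string of single
| letters, which is a hypothetical encoding:

```python
import os
import shutil
import uuid
from pathlib import Path


def create_message_dir(inbox, name, raw, flags):
    """Create <inbox>/<name>/message and .flags so both appear at the same instant."""
    inbox = Path(inbox)
    staging = inbox / f".staging-{uuid.uuid4().hex}"  # hidden from listings
    staging.mkdir(parents=True)
    (staging / "message").write_bytes(raw)
    (staging / ".flags").write_text("".join(sorted(flags)))
    final = inbox / name
    os.rename(staging, final)  # message and flags become visible together, or not at all
    return final


def delete_message_dir(inbox, name):
    """Remove a message so that it vanishes from listings in one step."""
    inbox = Path(inbox)
    doomed = inbox / f".trash-{uuid.uuid4().hex}"
    os.rename(inbox / name, doomed)  # atomic disappearance
    shutil.rmtree(doomed)  # slow cleanup happens out of sight
```

| The trade-off is the one raised in the reply: a reader now descends into
| one directory per message instead of listing a single flat directory.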
| mathfailure wrote:
| That'd require jumping between dirs when traversing multiple
| messages.
| graycat wrote:
| Been thinking about this subject:
|
| Of course, standard (usual, common) email is just text. Right: for
| pictures, to have them as just text, they are _encoded_ as
| _base64_. Right, it's MIME (Multipurpose Internet Mail Extensions).
|
| Soooo, okay, my ISP (Internet Service Provider) has an email
| service. The service is a Web site, and it does offer getting the
| "Source", that is, the text, all as just one file.
|
| Now, suppose for each email message I send/receive, I keep the
| text in its own file, with just the text, just as I got it from,
| say, my ISP. I will handle the file naming, indexing,
| summarization, etc.
|
| Help!!!! Is there an _email_ program that I can run that, for
| each of those files, can read it and _display_ it? Sure, it
| should be able to display the text, as text, that is not one of
| the MIME extensions but also be able to do _the right thing_ for
| each of the rest, still images, video clips, audio, whatever.
| Know of such a program???? Thanks!
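| For a quick command-line look at such a saved source file, Python's
| standard `email` package can parse it, decode base64 or quoted-printable
| parts, and save attachments so a regular viewer can open them. Below is a
| minimal sketch. `summarize_message` is a hypothetical helper that prints
| plain-text parts inline and only lists everything else:

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def summarize_message(path, save_dir=None):
    """Render text/plain parts, list other MIME parts, optionally save them to disk."""
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    out = [f"From: {msg['From']}", f"Subject: {msg['Subject']}"]
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only; their children are visited next
        ctype = part.get_content_type()
        if ctype == "text/plain" and not part.is_attachment():
            out.append(part.get_content())  # already decoded to str
        else:
            data = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
            name = part.get_filename() or "unnamed"
            out.append(f"[{ctype}: {name}, {len(data)} bytes]")
            if save_dir is not None:
                Path(save_dir).mkdir(parents=True, exist_ok=True)
                # Path(name).name drops any directory components from the filename.
                (Path(save_dir) / Path(name).name).write_bytes(data)
    return "\n".join(out)
```

| Saved images, audio, or video can then be opened with whatever program
| the OS already associates with those file types.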
| Aloisius wrote:
| For MacOS, extracting attachments into files is useful so that
| Spotlight can index them for search. I believe the same is true
| for Windows.
|
| Mail.app uses a directory structure that looks similar* to this
| for, say, Gmail:
|
|     {account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Messages/{msguid}.partial.emlx
|     {account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Attachments/{msguid}/{mime part #}/{mime subpart #}/filename.ext
|
| The emlx format is a bit different from eml. It contains the
| number of bytes for the message at the top and an xml plist at
| the end that has message flags, last viewed time, gmail labels,
| etc. For partial.emlx files, the base64 content is removed from
| the email itself and a content length is added.
|
| This format has its drawbacks, of course.
|
| * Not shown is the hierarchy based on message uid used to keep
| the number of files in the Messages directory down.
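| Going only by the description above (a byte-count line, the raw message,
| then an XML plist), an .emlx file could be split as shown below. This
| sketch is based on that description, not on an Apple specification. It
| does not restore the attachment data that was stripped from .partial.emlx
| files:

```python
import plistlib


def parse_emlx(data):
    """Split .emlx bytes into (raw message bytes, metadata dict from the trailing plist)."""
    first_newline = data.index(b"\n")
    length = int(data[:first_newline].strip())  # byte count on the first line
    start = first_newline + 1
    message = data[start:start + length]
    trailer = data[start + length:].strip()  # whatever follows is the plist
    meta = plistlib.loads(trailer) if trailer else {}
    return message, meta
```

| Reading a file on disk would be `parse_emlx(Path(p).read_bytes())`. The
| returned message bytes can go straight to the standard `email` parser.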
| QasimK wrote:
| I've been thinking about doing this myself, so it's fantastic to
| see a project.
|
| I find a files-centric (and more broadly filesystem-centric)
| approach easier to grapple with than one that focuses on apps
| (and hiding away the data). It makes it much easier to access my
| own data for other purposes outside of what the app provides. In
| particular when the files are in plain-text or otherwise human-
| editable. I can reuse all of the existing tools that I'm familiar
| with to search, modify or re-purpose the data.
| skydhash wrote:
| I can do away with files if the app provides scripting
| capabilities (IPC, plugins, ...). I know the average user won't
| use it, but if you've nailed down your workflow, it's
| liberating to be able to speed up parts of it.
| robertlagrant wrote:
| I was hoping the mailing list link would be to an FTP site I'd
| upload my email to.
| colinsane wrote:
| my chief concern with the spec was actually "do FTP clients
| generally support `:` in a filename?"
|
| but then i realized i'm not likely to mount a remote M2dir so
| i'm far less concerned with the answer.
| AdieuToLogic wrote:
| Whenever I see efforts to treat email as files, I fondly think of
| my time using nmh[0]. Until the pervasive use of multimedia
| email, nmh was a really nice way to communicate with email IMHO.
|
| 0 - https://www.nongnu.org/nmh/
___________________________________________________________________
(page generated 2024-05-23 23:01 UTC)