[HN Gopher] Johnny Decimal: A System to Organize Projects (2015)
___________________________________________________________________
Johnny Decimal: A System to Organize Projects (2015)
Author : trauco
Score : 114 points
Date : 2023-09-14 09:14 UTC (13 hours ago)
(HTM) web link (johnnydecimal.com)
(TXT) w3m dump (johnnydecimal.com)
| [deleted]
| syats wrote:
| For anyone interested in how the obsession for organization can
| end: https://en.wikipedia.org/wiki/Paul_Otlet
| 0x445442 wrote:
| Or this guy... https://www.flickr.com/photos/hawkexpress/albums
| dgreisen wrote:
| It's fun seeing information science applied to different domains.
| This is very similar to the system that we've converged on over
| the last 100 years for legal codes with title.chapter.section
| (see, e.g., https://law.cityofsanmateo.org/us/ca/cities/san-
| mateo/code/8...). The ability with the system to unambiguously
| and _succinctly_ cite to a particular part of the law (or, in
| this case, the filesystem) is a game changer.
| brushfoot wrote:
| A past coworker implemented a system like this. It was awful. He
| was the gatekeeper because the numbers had to be "just so" to
| meet his approval, and he was the most senior person on the team.
|
| Limiting yourself to a handful of categories at each level makes
| sense, but numbering everything is a bad idea. It's busywork, and
| other people have to learn your idiosyncratic nomenclature. Just
| give the directories good names. Search really isn't as bad as
| the article suggests, especially with something like broot [1].
|
| [1]: https://github.com/Canop/broot
| emadda wrote:
| I didn't like the numbers to begin with, but there are a few
| advantages:
|
| - After a while you associate the number with the subject so it
| becomes faster to visually scan a list.
|
| - You can type the number faster. They are shorter to
| reference.
| bafe wrote:
| To me these intricate classification systems fall in the same
| category as people developing very advanced note taking setups,
| dotfile management tools or fully customisable, reproducible
| shell/IDE/editor setups. Very nice ideas on paper, but you can
| get most of the way with far less effort if you mostly stick to
| the defaults. Trying to perfect it to death ends up looking
| like busywork and frequently causes conflicts with other team
| members less obsessed with these secondary issues
| Terretta wrote:
| This feels good, but in practice, it's friction.
|
| Instead, take advantage that ideas and work evolve through time,
| and store your work and notes chronologically. Combine that with
| good search.
|
| This allows you to find items by time proximity _and_ keyword /
| semantic / text embedding search.
|
| The only folder categorization needed is slashes around ISO 8601
| date parts (whichever groups a reasonable amount of content
| together naturally):
|
| - YYYY/MM or YYYY/WW
|
| With files named:
|
| - YYYY-MM-DD - Topic headline keywords.ext
|
| After a while, just stop filing anything, have a script sweep
| your desktop and downloads paths into this archive path if a
| piece of content hasn't been opened in x days.
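A minimal sketch of the sweep script described above, assuming Python. The function names (`archive_path`, `sweep`), the `max_idle_days` parameter, and the choice to prefix the existing file name with its modification date are all illustrative assumptions, not anything the comment specifies. Access times are unreliable on many systems, so this sketch treats the newer of access time and modification time as "last opened".

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def archive_path(archive_root: Path, src: Path, mtime: float) -> Path:
    """Build archive_root/YYYY/MM/'YYYY-MM-DD - name.ext' from a timestamp."""
    stamp = datetime.fromtimestamp(mtime)
    folder = archive_root / f"{stamp:%Y}" / f"{stamp:%m}"
    return folder / f"{stamp:%Y-%m-%d} - {src.name}"


def sweep(inbox: Path, archive_root: Path, max_idle_days: float,
          now: Optional[float] = None) -> List[Path]:
    """Move top-level files in `inbox` that have been idle too long."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    moved = []
    for src in sorted(inbox.iterdir()):
        if not src.is_file():
            continue  # folders are left alone in this sketch
        info = src.stat()
        # Many mounts use noatime/relatime, so take the newer of the two.
        last_used = max(info.st_atime, info.st_mtime)
        if last_used >= cutoff:
            continue  # still in use, stays where it is
        dest = archive_path(archive_root, src, info.st_mtime)
        if dest.exists():
            continue  # never overwrite something already archived
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Something like `sweep(Path.home() / "Desktop", Path.home() / "Archive", 30)` run from a scheduler would approximate the behaviour described; the folder-aware variant in the next paragraph would need an extra check over every file inside each folder.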
|
| (If you really really want a set of things together, make a
| folder for them on your desktop, put the bits in it, and ensure
| the script only sweeps the whole folder once no bits are being
| used.)
|
| If you're still working on a set of stuff, it'll stay on your
| desktop. When you're done, it'll go where you can find it.
|
| If you need something, search. If you can't search, remember when
| you worked on it, and browse. You'll find it, and all the things
| around it, that were in your head at the time, and can "reload"
| the whole context.
| samsquire wrote:
| I think everyone has a different thought process and style of
| thinking and work and it is painful to force others to use your
| way of thinking or working or organisation.
|
| I have written about video game interfaces, for work, person
| stacks, person based APIs. Conway's law is very real.
|
| I like the idea of interlocking lists that act like gears of work
| between individuals.
|
| I can outline my interface with you and you can react to tasks I
| give you, based on your interface.
|
| Like a unit test, I can import you and tell you what I need, you
| ask clarifying questions and I reply to them and we both track
| the mutual fallout. Synchronization and back and forth with email
| is painful and antiquated.
| delta_p_delta_x wrote:
| I really like the typography of this website.
|
| Berkeley Mono for the sidebar, and IBM Plex Serif (IMO, a
| _beautiful_ serif typeface for the web), very nice.
| dredmorbius wrote:
| Some of the elements (general layout, headings, anchor styling
| and on-hover highlighting) remind me strongly of my codepen:
| <https://codepen.io/dredmorbius/full/KpMqqB>
| rmm wrote:
| I have given up. I now mostly have generic buckets but get super
| specific file names. E.g one I just saw is "Drawing of rising
| main chairing beam with solid chairs.pdf" and then use void tools
| everything to search for it
| wmal wrote:
| It'd be so funny (at least for an outsider) to see an office where
| there's a lot of corpo speak mixed with this.
|
| - we need to raise the bar for 34.18. where are the deliverables
| for 56.17?
|
| - the 65.18 objectives changed, so they will be 14.27 on 11.13
|
| - wait, 11.13, meaning the docs or a date?
|
| - I'll check and ping you
|
| Is this an early sign that AI is taking over?
| rpastuszak wrote:
| Sounds more like going back to the analog version of
| Zettelkasten, i.e. an actual slip-box.
|
| I'm getting chills just thinking how much overthinking/over-
| analysing following this approach would cost me. Because I'd
| over-engineer the hell out of this (just forget to take the
| actual notes). But then I remind myself that our brains can be
| so beautifully different and some people would find this system
| more productive.
|
| I'm quite happy with my poorly but regularly managed evergreen
| notes.
| 2big2fail_47 wrote:
| I've been using an adapted version for about 6 months now and
| it actually works great. I really like that every file now has
| a specific place. I even transferred the concept to my offline
| documents and folders.
| kelahcim wrote:
| I have found Sublime Text and Sublime Merge to serve me well
| with a sort of `GTD`-like approach (not an orthodox one though).
| Finding in files + `regexp` works perfectly fine for most of my
| notes related to projects. Sublime Merge serves as a history
| keeper.
| soldeace wrote:
| For the best part of my life I've used a controlled set of tags[1]
| rather than hierarchical categories. This is mostly due to the
| fact that stuff can be a lot of things at the same time.
|
| That said, one of the best use cases for Johnny's system I've
| found is when you have to share an online drive with hundreds of
| people, where you can't use tags, and even if you could, there
| would be no consensus. Strangely, nowadays I can find my way
| around a huge project's online files quite easily just by the
| prefix numbers of each categorical level.
|
| [1] https://karl-voit.at/2022/01/29/How-to-Use-Tags/
| smallerdemon wrote:
| Is this the one that was speculated on as being a hoax / joke?
| theshrike79 wrote:
| Tagging + searching.
|
| I tried the organisation thing and it always falls down
| somewhere, when an item is supposed to be in multiple places at
| once.
|
| Especially now when Obsidian supports Properties on pages as
| first-class things combined with Omnisearch makes it a lot easier
| to just search for whatever I'm looking for.
|
| I still have crude categories like "projects", "recipes" and
| "RPG" with subfolders for each campaign, but that's about it.
| cygnion wrote:
| The approaches for information organization versus info retrieval
| are often different. For example, we mostly read content based on
| whatever fixed structure it's in, be it a blog post or a
| scientific article. Note-taking tends to follow that structure.
| But we retrieve and consume the information in a non-linear,
| context-driven search.
|
| Putting everything in a rigid hierarchical order has some
| benefits, namely familiarity to the author, but incurs the cost of
| organizing the material and the mental context switch from the
| task at hand.
|
| I've been researching ways to make information tagging and visual
| search easy and effective - at the source - in the narrow case of
| reading scholarly documents [1]. The goal is to avoid a
| prescribed organization format in favor of contextual tagging,
| visualization for personal analytics, and linking of concepts
| afterwards, in a way that reduces distraction from reading and
| understanding.
|
| [1] https://www.knowledgegarden.io/
| kkfx wrote:
| IMVHO Johnny Decimal is a good system for paper archives in
| classic filing cabinets / furniture with many drawers, not much
| reasonable for computer-based systems.
|
| Beside that IME in PKM/PIM terms we need taxonomies just to
| manage storage, if it's not managed alone. In that case a time-based
| taxonomy like a year-based root, a month-based second level or so,
| is enough for storage handling and eventual "time traveling"
| in noted contents. I have almost anything in org-mode/org-
| attached binaries, org-mode tangle-d configs and so on, I've
| tried various taxonomies and in the end just choosing to have
| subdirs of my org mode root dir one per year just to manage
| contents. Similarly this is how I split my Beancount-ed personal
| finance, one note per month.
|
| org-roam-node-find title-based search&narrow access + eventual
| ripgrep full-text-search are good enough to find anything. Rarely
| I've used org-ql but it's simply too long to keep consistent
| notes even with templates to use an SQL-alike querying.
| spookystats wrote:
| I've implemented this system for a while, but ultimately will
| switch to a less hierarchical one at some point in the future.
| Search, especially fuzzy search, is actually great! Using fzf in
| my terminal I can easily cd into any directory I desire as long
| as it or one of its parents has a sensible name.
| makeworld wrote:
| I tried this for a few months, just switched out of it yesterday.
| In contrast to what many people are saying here, my problem is it
| didn't have enough hierarchy. I wanted more levels of folders and
| that wasn't permitted by the system.
|
| Now I click more to find what I want, but it's cleaner and easier
| to find.
| c96aes wrote:
| I really love the crazy-people energy of this.
|
| There's no hesitation or equivocation, just THE ANSWER.
|
| Does it matter if it's a joke / hoax or not? I don't think so, I
| think it's beautiful either way.
| [deleted]
| smallerdemon wrote:
| Honestly, if you want to do this, just study and use the Library
| of Congress cataloging system:
| https://catalog.loc.gov/vwebv/ui/en_US/htdocs/help/numbers.h...
|
| The energy in creating your own catalog system to this extent
| while one already exists seems pretty ridiculous.
|
| Why not Dewey Decimal? Well... I mean... I guess you could. But
| most libraries have converted to LOC since it is fully supported
| and still receives attention and updates to categorizations.
| taink wrote:
| The article you linked to doesn't explain plainly how the
| cataloging system works. Going to Wikipedia[1] makes it a
| little more intelligible, but honestly it still sounds pretty
| complicated.
|
| If a cataloging system is difficult to explain in plain and
| simple terms, it will be tough to advocate for it with other
| people. I might put in the effort but how can I quickly
| convince others to do so?
|
| Johnny Decimal is pretty simple in comparison, it's 4 bullet
| points:
|
| - A JD number is two numbers with a dot in between: XX.YY
|
| - First digit of first number: a broad category
|
| - Second digit of first number: a sub-category
|
| - The second number is incremented each time you create a new
| folder (which you want to be careful about!)
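The four bullet points above can be sketched in a few lines of Python. This is only an illustration of the XX.YY shape as described in this comment; the names `JDNumber`, `parse_jd`, and `next_id` are made up here, and starting each category at `.01` is an assumption.

```python
import re
from typing import List, NamedTuple


class JDNumber(NamedTuple):
    broad: int     # first digit of the first number: broad category
    category: int  # both digits of the first number (broad + sub-category)
    item: int      # second number: incremented for each new folder


_JD_PATTERN = re.compile(r"([0-9])([0-9])\.([0-9]{2})")


def parse_jd(text: str) -> JDNumber:
    """Parse 'XX.YY' into its parts, rejecting anything else."""
    match = _JD_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an XX.YY number: {text!r}")
    broad, sub, item = match.groups()
    return JDNumber(int(broad), int(broad + sub), int(item))


def next_id(existing: List[str], category: int) -> str:
    """Return the next unused XX.YY within one category."""
    used = [n.item for n in map(parse_jd, existing) if n.category == category]
    nxt = max(used, default=0) + 1  # assumption: numbering starts at .01
    if nxt > 99:
        raise ValueError(f"category {category:02d} is full")
    return f"{category:02d}.{nxt:02d}"
```

For example, `next_id(["12.01", "12.02", "13.01"], 12)` gives `"12.03"`. The hard limit of 99 items per category is exactly the "running out of numbers" problem mentioned elsewhere in the thread.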
|
| Once you've explained that and you show a basic folder
| structure, people will understand how it's used, at least.
| That's where it performs better than your system.
|
| It's probably fine if it's tough to define, but it has to be
| simple to teach.
|
| [1]
| https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...
| dredmorbius wrote:
| Organisation, classification, taxonomies, hierarchies, and
| catalogues are complex concepts.
|
| I've run across the Johnny Decimal concept a few times, and
| though it's interesting, and probably better than _no_
| organisation, I'm not sold on it. That's strongly informed by a
| long deep dive into other classification and bibliographic
| schemes.
|
| One key distinction is between a _location-based classification
| system_ and an _indexing or tagging system_.
|
| A large bibliographic institution with an extensive history which
| maintains its own versions of _both_ concepts is the US Library
| of Congress.
|
| The Library of Congress _Classification_ is based on a scheme
| originally inherited from Thomas Jefferson for organising his own
| personal library whose donation to the US Congress inaugurated
| the Library of Congress. It is based on 21 top-level divisions,
| based on letters of the Roman alphabet from A-Z (excluding the
| letters I, O, W, X, and Y). These are further broken into a vast
| number of further divisions, with the ultimate aim of
| _identifying an individual work_ AND _locating that work within
| the library's physical archive_. Books, as physical objects,
| occupy a distinct space, and one space only, which cannot be
| shared with another book.
|
| The Library of Congress (LoC) _Subject Headings_ are a set of
| _descriptions_ of works. Unlike the _classification_ , a book may
| be assigned _multiple_ subject headings, corresponding to its
| major themes. These headings are not used to _define a book's
| location_ but rather _aid in its discovery_ when searching by
| subject.
|
| Both of these classifications have evolved over well over a
| century, are the subject of ongoing efforts and development, and
| have had to address the fact of change in both the underlying
| subject matter (e.g., the Austro-Hungarian Empire, Soviet Union,
| and Colonial Africa are no longer extant) and how subject matters
| are addressed and referenced (psychology, sexuality, race,
| economics, scientific discoveries, and technological advances
| have all made old terms obsolete or deprecated, and introduced
| new ones).
|
| These are not the only classification and cataloguing schemes
| used by the US government, let alone other bibliography-
| management institutions elsewhere. The Superintendent of
| Documents (SuDocs) system is used by the Government Printing
| Office to manage _its_ publications and archives, and is
| organised principally around _government offices_ (Cabinet
| divisions amongst the Executive, others in the Legislative and
| Judiciary), as well as date.
|
| Some libraries don't organise works by subject but instead assign
| a location code to works _independent of subject_ by which the
| works may be requested by patrons. (These are typically "closed-
| stack" systems.)
|
| There are other classifications: Dewey Decimal (superficially
| resembling the LoC classification), Decimal Classification, and
| Colon Classification are amongst the most widely used.
|
| _Digital storage has different requirements than physical._
| Documents needn't have a single storage location or address.
| They may be part of numerous distinct systems, often with
| idiosyncratic or incompatible management or organisation
| themselves. Documents are far more likely to be _live_ , with
| ongoing development and adaptation as opposed to the static state
| of a print archive. "Documents" may in fact be data held in
| databases, wikis, or other systems, and be associated with
| multiple projects.
|
| One generally useful set of concepts for organising documents is
| the _Dublin Core Metadata Element Set_ (DCMES) with a set of 15
| defined elements: Contributor, Coverage, Creator, Date,
| Description, Format, Identifier, Language, Publisher, Relation,
| Rights, Source, Subject, Title, and Type. A document might have
| zero or more of each element associated with it (e.g., multiple
| authors, translators, illustrators, and/or editors as
| Contributors). And it's been observed that some elements are
| themselves ambiguous (e.g., what is the difference between a
| "contributor" and a "creator"). But _in general_ this at least
| provides a useful framework for considering works.
|
| <https://en.wikipedia.org/wiki/Dublin_Core>
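As a rough sketch of the "zero or more of each element" idea, a record can be modelled as a mapping from the 15 element names to lists of values. The class name `DublinCoreRecord` and its methods are illustrative assumptions; this is not any official Dublin Core library or serialization.

```python
from collections import defaultdict
from typing import List

# The 15 element names as listed in the comment above.
DCMES_ELEMENTS = frozenset({
    "Contributor", "Coverage", "Creator", "Date", "Description",
    "Format", "Identifier", "Language", "Publisher", "Relation",
    "Rights", "Source", "Subject", "Title", "Type",
})


class DublinCoreRecord:
    """Holds zero or more values for each of the 15 elements."""

    def __init__(self) -> None:
        self._values = defaultdict(list)

    def add(self, element: str, value: str) -> None:
        """Attach one more value to an element; elements may repeat."""
        if element not in DCMES_ELEMENTS:
            raise KeyError(f"not a DCMES element: {element!r}")
        self._values[element].append(value)

    def get(self, element: str) -> List[str]:
        """Return all values for an element (empty list if none)."""
        if element not in DCMES_ELEMENTS:
            raise KeyError(f"not a DCMES element: {element!r}")
        return list(self._values.get(element, []))
```

A translator and an illustrator would both go in with `record.add("Contributor", ...)`, while an element that was never set simply comes back as an empty list. Extra fields such as the "Project" idea below would need their own namespace, since this sketch rejects anything outside the 15 names.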
|
| In looking at document management I've found a few other elements
| are useful to consider:
|
| - Project: the task being worked on.
|
| - Workflows: _what is being done_ with documents.
|
| - Document lifecycle: Often consisting of a set of stages:
| creation, edits/updates, acquisition, cataloguing, format
| conversions or transformations, transmission or publishing, and
| deacquisition or destruction.
|
| - Security and/or distribution scope. Keep in mind that _metadata
| itself_ has significant value and should be considered within
| your data management / privacy / security policies.
|
| - Teams and participants. Both within and external to your
| organisation.
|
| - Organisational evolution. Personnel, departments, divisions,
| enterprises, and governments come, go, and change. The terms used
| now might not apply in a year, or 10, or 100. (Your document
| lifecycle might not extend to 10 or 100 years. Then again, it
| might.)
|
| Most usually, I'll find I use the following in organisation:
|
| - Date. Generally the start date of a project. Organising
| documents by time makes a great deal of sense, and creates
| natural search spans: day, week, month, quarter, year, decade,
| etc.
|
| - Author/owner. The person, department, or institution which
| created or "owns" the document. Within a sufficiently large
| organisation, these should live within a departmental /
| divisional structure.
|
| - Title. How the document is generally referred to. This can be
| startlingly ambiguous, especially internally. Third-party
| documents tend to have far more fixed and established titles,
| though even these can be tricky. E.g., "The White Album" is not
| actually titled "The White Album".
|
| - Other relevant distinguishing information.
|
| Where possible, I'll try to store information under a single
| filesystem directory hierarchy, though in practice things tend to
| live in multiple places: a shared code repository (public or
| private), a wiki or other platform, databases and application
| data. If possible these are tied together in some useful fashion,
| though any such organisation tends to be approximate and lagging
| reality at best.
|
| With a sufficiently large organisation, a librarian role who
| manages overall storage and resolves disputes is helpful. Though
| you'll be exceedingly lucky to have someone actually assigned to
| such a role.
|
| Johnny Decimal is interesting, and may well work for some cases.
| I doubt it meets all needs, and certainly isn't the One True Path
| for all individuals and organisations.
| igrekel wrote:
| Oh god, every time someone starts numbering sub-directories I want
| to scream. This feels soooooo like the old days. I hate it
| because then anything like ordering in alphabetical order
| doesn't work. I'm looking for the documentation subdirectory, I
| know it starts with "D", so it's easy if I can order them in
| alphabetical order. But nooooo. Don't force me to remember
| decimal codes unless your name is Dewey and it's going to be the
| same thing every where, all the time.
|
| Also it's the same people who'll create a slew of empty sub
| directories so it's difficult to get a feel of what is the
| current state. To be fair, I think part of the problem is they try
| to apply that to everywhere at any level without logic. Number
| the directories for all customers... so now we assign the next
| number to the next customer we get, so to find a customer you
| need to know in which order they were added and it just makes no
| sense.. but people will usually support it because it "looks more
| organized".
| Linux-Fan wrote:
| I have started to use this system shortly after having first
| heard of it. I think the greatest challenge is that as I
| constantly create new things while only completing a fraction of
| them, I'd end up with a very "sparse" set of numbers being
| "active" in the end and unlike what is claimed by the system the
| relevance of directories for me is rarely linked to the time of
| their creation. Hence I needed to tweak this system a little to
| be useful to me and while it is a minor improvement over before I
| am still dissatisfied with the outcome...
|
| An alternative organization system which is more lightweight and
| could also work well -- I am not affiliated, but would like to
| try to add some of these ideas to my own structuring approach:
| https://plaintext-productivity.net/3-02-file-folder-structur...
|
| During the course of time the License of the JD system itself
| seems to have changed to a CC "NonCommercial" license. This
| really contradicts that this system is intended to be also useful
| in business contexts. I shall only look at my old copy where it
| is still under a DFSG-compliant license, but I wonder whether it
| is even possible to claim ownership on an organization system by
| means of a license? Wouldn't the only way to protect against it
| being used or described anywhere be a patent?
| korijn wrote:
| I think this doesn't solve the real problem: every person will
| categorize a document differently and look in a different place.
| That's the hard part. Creating categories and uniquely
| identifying them is easy.
| GuB-42 wrote:
| We use a similar system at work and yes, it is a problem.
|
| For example we have categories like "customer input",
| "contractual", "work documents" and "delivery". Some technical
| specifications could go in every place at once. It is an input
| from the customer, and it has contractual elements, but
| sometime, we have to update it, and make it part of the final
| delivery. These often end up all over the place.
| _1tan wrote:
| I don't think they're trying to solve this issue directly. This
| is intended to be used as PKM (personal knowledge management)
| system.
|
| You are correct in any case that this is a hard problem and I
| would love to get some perspective from the author on
| implementing such a system org-wide.
| logdahl wrote:
| Not sure I agree with your PKM point. It feels like it fits
| well for company/project structure. Reading his forum and a
| bit into the article it seems to advocate a "librarian" that
| keeps the system correctly categorized. I think he mentioned
| somewhere that this system doesn't fully fit PKM. I remember
| some example of categorizing photos after vacations, but
| running out of numbers. But I must admit I
| don't fully remember the arguments.
| SanderNL wrote:
| IMO this kind of misses the point because now you have a fancy
| system to handle _your_ documents, but the rest of the company is
| using a million variations of just about every system you can
| think of including having folders named "temp" and "data2" and
| Karen will _not_ use your system no matter what you will say. I
| mention this, because the author mentioned "corporate system"
| and not "personal system". For personal management this is fine
| of course.
|
| Given the "systems" I have seen in the wild I think literally
| _anything_ that sort of standardizes information generation and
| retrieval is a massive improvement over random word documents on
| various NASes, local files, powerpoints, emails and of course
| various sharepoint and custom CMS pages.
|
| I think many businesses can noticeably improve by dumping
| everything they have into .txt files instead of all the bs
| formats MS and friends came up with and just have a good search
| function. It looks like ass, but do you care about information or
| what.
|
| I'm sort of joking, of course, but not really..
| xwolfi wrote:
| I work in a giant place, 100k+ employees, 150+ years of
| documents, layers upon layers of processes, presence in all
| countries, heavy regulation, successive mergers.
|
| There is simply no way to make it work perfectly. We can do
| migration upon migration but that has never seemed to simplify
| anything (the joke in my team is as soon as something starts
| working, we migrate half of it away to a non working solution
| and end up with two half systems - this has proven true so many
| times we don't even take it as a joke anymore).
|
| We embrace the chaos, and focus on concrete client problems,
| and collect the fees for service well provided. No point in
| trying to make it make sense anymore or to micro optimize
| productivity. We have a rough estimate that it takes a year for
| an experienced new joiner to be truly productive, and we simply
| pay well so they don't leave before doing their first big
| contribution and basta.
|
| I work concurrently on two identical systems doing the same
| things on paper but with important difference in details,
| because the current political war made the american system win
| over the asian system, while knowing the asian system simply
| handles better our 13 market quirks than the american one that
| was made for a simple single market. We slowly complexify it to
| reach feature parity, deploy it in markets with low stakes
| while we improve the asian system continuously for the hundreds
| of high-paying clients. One day either we'll reach a point
| where a market that matters can finally be migrated, or the
| fact we make much profit in Asia will catch up with the global
| strategists and the asian system will start being proposed to
| solve (badly) the american problems.
|
| It has never been thought by our strategists to keep two
| systems working well in their respective core markets at the
| global scale. So we migrate half of Asia and keep two systems
| working badly at the local scale.
|
| I pity the intern joining in 1000 years though. We have no idea
| how the kraken will evolve after centuries...
| drcongo wrote:
| This gets posted here a lot [0], and I hate everything about it.
|
| [0] https://hn.algolia.com/?q=johnnydecimal.com
| Hbruz0 wrote:
| Care to elaborate ?
| drcongo wrote:
| I mean, there's a _lot_ to dislike, but it mostly boils down
| to this being a physical object system applied to digital
| objects, and I disagree with every single premise in the list
| of reasons they give for having created it in the first
| place. For instance, there's a whole section on how "Search
| doesn't help" which ends with "You can search for things, but
| the results are garbage." - the author hasn't considered that
| maybe search is garbage for them precisely because they're
| trying to apply a physical object system to digital objects. I
| don't know about you, but Spotlight on my Mac finds exactly
| what I'm looking for every single time, because we use simple
| naming conventions.
|
| I also hate the absolute waste of human cognition this system
| causes making people try to memorise which number is at the
| start of a folder name they're looking for. From their own
| examples, a folder called `22 Contracts` is going to be
| sorted into a different position in a list than it would if
| it were just named `Contracts` - without the number, I both
| know exactly what the name of the folder I want is, and where
| it will appear in a directory listing. With someone else's
| choice of number at the front, I know neither of these
| things.
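The sorting point is easy to check with a toy listing. In this sketch only `22 Contracts` comes from the article's examples as quoted above; the other folder names and numbers are invented for illustration, and `sorted()` here is plain lexicographic order, which real file managers only approximate.

```python
def listing_position(names, wanted):
    """Return the 0-based position of `wanted` in a lexicographic listing."""
    return sorted(names).index(wanted)


# Hypothetical folder sets: same folders, with and without number prefixes.
plain = ["Invoices", "Contracts", "Archive", "Marketing"]
numbered = ["11 Invoices", "22 Contracts", "31 Archive", "12 Marketing"]

# Without numbers, the position follows from the name alone.
print(sorted(plain))     # ['Archive', 'Contracts', 'Invoices', 'Marketing']

# With numbers, the position depends on someone else's numbering choice.
print(sorted(numbered))  # ['11 Invoices', '12 Marketing', '22 Contracts', '31 Archive']
```

Knowing only the word "Contracts" is enough to predict its slot in the first listing, but not in the second.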
|
| Basically, every single thing this claims to solve is either
| not a problem in the first place, is trivially solved without
| making people learn a stranger's personal numbering
| preferences, or is made considerably worse by this system.
| CharlesW wrote:
| I don't mind content marketing with real value, but this is
| obvious "digital products" spam designed to get readers into a
| sales funnel for the workbook.
| smallerdemon wrote:
| I also hate it. My hatred stems from the fact that it's very
| much a tech bro ideology. They get it into their head that
| THEIR system is 'the one'. It's the same way that tech bro
| weirdos running startups get it in THEIR head that THEIR
| product is some sort of life changing piece of technology that
| every human on the planet is going to ultimately adopt once
| they just "understand it". And god forbid if the product does
| catch on, because then it's a horrific confirmation bias to
| them. So instead of luck and market coercion, the product's
| success is evidence of their brilliance.
|
| I know that seems like a weird conclusion to get to from this,
| but having read a LOT of these types of articles over the years
| and seeing people's convictions and deep seated beliefs about
| their categorization systems, it really underscores a lot of
| how humans get it in their heads about their POV of the world
| is the 'right one' that the rest of the world should get on
| board with.
| 0thgen wrote:
| I too hate seeing other people excited about things they are
| interested in
| bafe wrote:
| I agree with most of your sentiment. I also came to associate
| the proponents of these systems with a set of other traits:
|
| - obsession with tools and methods over outcomes
|
| - busywork
|
| - insistence on their approach being the only right one
|
| It seems these traits could be quite adjacent to tech bro
| behaviour
| baz00 wrote:
| I am convinced that absolutely no hierarchical taxonomy actually
| works in practice because every categorisation is really slightly
| fuzzy. Even libraries have problems categorising books into
| genres because some of them overlap considerably. I spent the
| last 25 years trying to build up useless hierarchies everywhere
| and none of them ever worked properly.
|
| Tagging/labelling with attributes is the only viable solution
| that I've found.
|
| This book is fiction/science-fiction/fantasy.
|
| This document is technical-reference/specification/ratified
|
| I believe this was the notion of the now defunct WinFS built on
| top of a database engine.
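A tiny in-memory sketch of attribute tagging, for comparison with a folder tree. The `TagIndex` class and the file names are hypothetical; this says nothing about how WinFS or any real tagging tool is implemented.

```python
from collections import defaultdict


class TagIndex:
    """Maps each tag to the set of items carrying it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Attach any number of tags to one item."""
        for name in tags:
            self._items_by_tag[name].add(item)

    def find(self, *tags):
        """Return the items that carry every one of the given tags."""
        if not tags:
            return set()
        candidates = [self._items_by_tag.get(name, set()) for name in tags]
        return set.intersection(*candidates)
```

With `index.tag("novel.epub", "fiction", "science-fiction", "fantasy")`, the same file is reachable through `find("fiction")`, `find("fantasy")`, or both together, with no decision about which single folder it "belongs" in.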
| couchand wrote:
| If anything, the history of libraries shows you can get really
| really far by applying a systematic organizing principle, even
| if it's WRONG(TM). Librarians may fret that some books are hard
| to categorize, but library patrons successfully find stacks of
| books on their own every day all around the world. It's not
| perfect, but it's better than nothing!
| dredmorbius wrote:
| Book classifications ultimately _need to assign a book to a
| location_.
|
| So long as _that_ goal is achieved, the _rationality_ or
| _consistency_ of that assignment is a strongly-secondary
| consideration. If the book has a place, and can be located,
| _where_ it happens to be relative to other related works, or
| unrelated ones, really doesn't matter.
|
| _Searching for works based on similarities or relationships_
| is a problem of the _indexing_ system. I should be able to
| find works on a similar topic, which reference or are
| referenced by another, by author, by institution, perhaps by
| publisher, by publication date, etc. That information would
| be carried within a catalogue, but need not determine the
| location of the work within a physical archive.
| jjtheblunt wrote:
| I've thought a ton about what you just described, and I think
| of it this way: not every directed acyclic graph happens to be
| a tree.
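One way to make the DAG-versus-tree point concrete: given an acyclic set of parent-to-child edges, the structure is a tree (or forest) only if no node has more than one parent. The sketch below, with a hypothetical function name, reports the nodes that break that rule; the example edges borrow the category names from the "technical specification" comment earlier in the thread.

```python
from collections import defaultdict


def multi_parent_nodes(edges):
    """Return the nodes with more than one parent.

    `edges` is an iterable of (parent, child) pairs assumed to be acyclic.
    An empty result means the graph fits in a folder hierarchy as-is.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return {child for child, found in parents.items() if len(found) > 1}


# Illustrative edges: one document that belongs to three categories at once.
edges = [
    ("customer input", "tech-spec.pdf"),
    ("contractual", "tech-spec.pdf"),
    ("delivery", "tech-spec.pdf"),
    ("work documents", "draft-notes.txt"),
]
print(multi_parent_nodes(edges))  # {'tech-spec.pdf'}
```

Any node in that result has to be duplicated, linked, or arbitrarily assigned before the graph can be stored as plain nested folders.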
| oniony wrote:
| And what led me to build [TMSU](https://tmsu.org/).
| vifon wrote:
| TMSU is the only such system I found useful without being
| cumbersome. After years of trying to use Git Annex, it was
| refreshing that TMSU doesn't alter the files in any way,
| merely storing all the (meta)data out-of-band in a separate
| DB.
|
| These days I use TMSU via my own Emacs-based UI almost every
| single day, so thank you for that!
| dredmorbius wrote:
| Storing metadata out-of-band strikes me as key to any
| _usable_ content management system within a realistically
| complex space.
|
| Naming schemes and directory hierarchies have some
| _limited_ application, but ultimately there will be data
| which simply won't be shoehorned into any such system, and
| an externally-managed catalogue tying together disparate
| elements is required.
|
| (Keeping that catalogue up to date and consistent is a
| whole 'nother issue.)
|
| I _do_ like the idea of a virtual filesystem in which
| elements are effectively search dimensions, which leads to
| an interesting notion that _search is identity_.
|
| That is, a search will produce one of three possible result
| sets:
|
| - Null, that is, no matches.
|
| - Plural, that is, a list of matches.
|
| - Unity, that is, _one_ matching item.
|
| In the last case, _the search providing a single result is
| an identity of that result_. (It may not be a stable
| identity over time, but it is _at least for the present_.)
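|
| The three cases can be sketched in a few lines of Python. This is
| a minimal illustration assuming an in-memory collection and a
| predicate; `Cardinality`, `search` and `classify` are hypothetical
| names, not part of TMSU or any other tool in this thread.
|
```python
from enum import Enum
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class Cardinality(Enum):
    NULL = "null"      # no matches
    UNITY = "unity"    # exactly one match: the query identifies the item
    PLURAL = "plural"  # several matches: the query needs narrowing


def search(items: Iterable[T], predicate: Callable[[T], bool]) -> List[T]:
    """Return every item that satisfies the predicate."""
    return [item for item in items if predicate(item)]


def classify(results: List[T]) -> Cardinality:
    """Map a result list onto the three cases described above."""
    if not results:
        return Cardinality.NULL
    if len(results) == 1:
        return Cardinality.UNITY
    return Cardinality.PLURAL
```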
|
| Where a list is returned, _the size of the list determines
| how usable it is, and how it is usable._ Ten items can be
| quickly scanned to find the relevant item(s), if they
| exist. 100 or 1,000 items can often still be managed
| manually, though they'll typically take some time.
| Somewhere between 100 and a few thousand items, though,
| you're in the range where automated assessments or
| filtering becomes necessary.
|
| Large libraries themselves typically have tens of thousands
| to millions of items. The largest book collections (Library
| of Congress, British Library) have roughly 150 million
| books (or equivalents). Other records may exist in greater
| numbers: periodicals, financial records, databases.
| Facebook has reported ~5 billion items posted _daily_ for
| some years now. (I suspect most of those are trivial, but
| that still leaves a large number of potentially non-trivial
| items.) Surveillance and other large-scale data collection
| systems may be larger still.
| vifon wrote:
| > Storing metadata out-of-band strikes me as key to any
| usable content management system within a realistically
| complex space.
|
| Yes and no. I was specifically comparing it to Git Annex
| which is hard to categorize in these terms. It forces
| every file to become a symbolic link to the actual file
| living in `.git/annex/` and then every query temporarily
| mutating the hierarchy of directories storing these
| symbolic links. I found the latter disruptive enough (in
| particular for the directory mtimes) that I was actively
| avoiding doing any such queries. See:
| https://git-annex.branchable.com/tips/metadata_driven_views/
|
| On the other hand my current setup involves TMSU queries
| which result in virtual Emacs directory editor (dired)
| views that don't affect anything else. I don't even use
| the FUSE functionality of TMSU.
| dvas wrote:
| Thank you for sharing!
|
| Exactly what I needed for my own note storing system where I
| can tag and describe various pdf's like books, lecture notes,
| and papers with tags, then use `rga` for searching inside of
| the pdf's for content!
| Cyphase wrote:
| Thank you!
| ThouYS wrote:
| this, tags + dynamic views on those tags
| logdahl wrote:
| I'm just wondering when we're gonna get tagged filesystems.
| Maybe something partially hierarchical. Would love a book/
| directory that i can query for 'sci-fi' or 'specification'. I
| guess one could implement it using sym-links :^)
| dredmorbius wrote:
| The approach I've been favouring for a while has been to have
| a data store, I call it the "stacks", based off physical
| library nomenclature, and then a virtual filesystem which
| effectively uses searches along various attributes to surface
| individual items.
|
| Those searches might be based on title,
| author/creator/contributor, publication/creation dates,
| assigned identifiers, full-text search, relations between
| documents, subjects or topics, document type / format, or
| other aspects.
|
| _How_ that is specified precisely ... I haven't settled on
| entirely, though short mnemonics, two- or three-letter where
| possible (ti -> title, au -> author, date, isbn, oclc, doi,
| etc.) might work.
|
| If the filesystem lives under /docfs, then, say,
| /docfs/au:barrie/ti:peter+pan would turn up J.M. Barrie's
| _Peter Pan_. Specific document identifiers such as ISBNs,
| OCLCs, DOIs, or LCCSs could pinpoint specific documents,
| looser searches such as for _Peter Pan_ might generate a
| _list_ rather than a _unitary response_ but those individual
| documents would be further distinguished by other properties.
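|
| As a rough Python sketch of how such a path could be turned into a
| query: the `FIELDS` table reuses the mnemonics floated above, while
| `parse_docfs_path` and the choice of `+` for a space are
| assumptions made for illustration. Identifiers that themselves
| contain `/` (DOIs, for one) would need an escaping rule this sketch
| does not attempt.
|
```python
from typing import Dict
from urllib.parse import unquote_plus

# Hypothetical mnemonic table; the comment leaves the full set unsettled.
FIELDS = {"ti": "title", "au": "author", "date": "date",
          "isbn": "isbn", "oclc": "oclc", "doi": "doi"}


def parse_docfs_path(path: str, root: str = "/docfs") -> Dict[str, str]:
    """Turn '/docfs/au:barrie/ti:peter+pan' into a field -> value query."""
    if path != root and not path.startswith(root + "/"):
        raise ValueError(f"path is not under {root}: {path!r}")
    query: Dict[str, str] = {}
    for segment in path[len(root):].strip("/").split("/"):
        if not segment:
            continue  # bare root, or a doubled slash
        key, sep, value = segment.partition(":")
        if not sep or key not in FIELDS:
            raise ValueError(f"unrecognised segment: {segment!r}")
        query[FIELDS[key]] = unquote_plus(value)  # '+' stands for a space
    return query
```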
|
| The filesystem would be _virtual_ rather than static-on-disk.
| You're effectively exploring the namespace. Creating this
| _as a filesystem_ rather than, say, as a database query or
| application-specific data store means that _standard
| filesystem tools_ would be available to interact with the
| contents, though _modifications_ of works might require some
| additional work.
|
| The system could also provide access to works not directly in
| the stacks, including remote resources (e.g., accessed via
| URIs and network calls), or applications (through some API).
| The underlying data store could be in any of several formats,
| and again the filesystem-based access could abstract between
| several independent stores.
|
| Now all I need to do is build it ...
| Cyphase wrote:
| This sounds at least somewhat like TMSU, which has been
| mentioned in some other comments here.
|
| https://tmsu.org/
| baz00 wrote:
| Don't even go there with the symlinks. My mother made a
| family tree on windows with folders and shortcuts. It broke
| OneDrive and robocopy completely. I don't even know how to
| help her fix it.
| virtue3 wrote:
| It's 3 in the morning and I couldn't stop laughing out loud
| when I read this. Poor OneDrive lol.
|
| I hope she managed to keep all her files and didn't lose
| anything.
| baz00 wrote:
| It still works and she hasn't lost anything but you can't
| actually move or delete any files any more because
| windows explorer craps out too!
|
| I literally have no idea what to do with it :(
| hobs wrote:
| My assumption is she ran into the max path length
| limitation, here's how to fix it
| https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...
| baz00 wrote:
| That was one of the problems. Unfortunately after fixing
| that it's still broken.
| mceachen wrote:
| You could write a short PowerShell script that recursively
| deletes shortcuts and symlinks, but I'd certainly take a
| full backup and have it offline before I kicked it off.
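|
| For illustration, here is the same idea sketched in Python rather
| than PowerShell. It defaults to a dry run that only prints what it
| would remove; `find_links` and `delete_links` are made-up names,
| and treating every `.lnk` file as a shortcut is an assumption.
|
```python
import os
from typing import List


def find_links(root: str) -> List[str]:
    """List symlinks and .lnk shortcut files under root, without following links."""
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            is_shortcut = os.path.isfile(path) and name.lower().endswith(".lnk")
            if os.path.islink(path) or is_shortcut:
                found.append(path)
    return sorted(found)


def delete_links(paths: List[str], dry_run: bool = True) -> None:
    """Print each path; only unlink it when dry_run is False."""
    for path in paths:
        print(("would remove " if dry_run else "removing ") + path)
        if not dry_run:
            os.unlink(path)
```
|
| Run it with the default dry run first and read the list before
| passing dry_run=False.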
| alpaca128 wrote:
| On one hand this is an edge case, on the other things like
| this make me wonder how computers can even work if first-
| party software built specifically for nothing but managing
| files fails to gracefully handle a folder structure a user
| created by hand.
| baz00 wrote:
| That was exactly my take home from this. Windows has some
| very serious flaws in it when it comes to data integrity.
| I do not trust it and do not use it myself.
| theshrike79 wrote:
| There are some attempts using Fuse on Linux.
|
| At least one I tried ages ago was one you could use to browse
| your music collection, it automatically generated
| "directories" based on mp3tags.
| Brajeshwar wrote:
| Apple tried doing this with "tag" in the file system but most
| people don't use it.
| hnbad wrote:
| Microsoft allegedly wanted to get rid of the folder
| hierarchy completely. There were a lot of rumors surrounding
| "Windows 97" (what later became Windows 98) because of its
| Internet integration (which eventually just turned into a
| lot of dead-end MSN widgets and Internet Explorer being
| shipped with the OS). I don't know how well-founded this
| speculation was but allegedly Microsoft wanted to fully
| commit to things like "cloud storage" (before this was a
| thing, so of course not using this name) and ended up
| shelving most of this work because the networks just
| weren't fast enough at the time.
|
| Note: I specifically remember these speculations in the
| lead-up to the release of Windows 98. They don't seem to be
| easy to google at this point.
| baz00 wrote:
| I use it! :)
| smallerdemon wrote:
| Same.
| Brajeshwar wrote:
| Don't Do That. Don't Give Me Hope.
| timschmidt wrote:
| May 10, 1997: https://en.wikipedia.org/wiki/Be_File_System
| Cyphase wrote:
| The author of TMSU left a sibling comment to yours:
| https://news.ycombinator.com/item?id=37507343
|
| https://tmsu.org/
|
| > TMSU is a tool for tagging your files. It provides a simple
| command-line tool for applying tags and a virtual filesystem
| so that you can get a tag-based view of your files from
| within any other program.
|
| > TMSU does not alter your files in any way: they remain
| unchanged on disk, or on the network, wherever you put them.
| TMSU maintains its own database and you simply gain an
| additional view, which you can mount, based upon the tags you
| set up. The only commitment required is your time and there's
| absolutely no lock-in.
| notfed wrote:
| Got me wondering how it tracks file moves. From the FAQ:
|
| > TMSU [does] not automatically detect file moves and
| renames
|
| I dunno if we can call it a tagged filesystem with this
| limitation. (Why not store tags, or at least a GUID, in
| file metadata?)
| istjohn wrote:
| It does store file hashes. It has a repair command that
| will use file hashes to find moved files. I don't know
| how well it works.
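|
| The general technique looks roughly like the Python below. To be
| clear, this is a sketch of hash-based relocation in general, not
| TMSU's actual code; `file_digest`, `repair` and the use of SHA-256
| are assumptions. Files with identical contents are ambiguous, and
| this version simply takes the first candidate it finds.
|
```python
import hashlib
import os
from typing import Dict


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair(index: Dict[str, str], root: str) -> Dict[str, str]:
    """Given {recorded_path: digest}, return {old_path: new_path} for moved files."""
    # Only paths that no longer exist need to be located again.
    missing = {digest: path for path, digest in index.items()
               if not os.path.exists(path)}
    moves: Dict[str, str] = {}
    if not missing:
        return moves
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            digest = file_digest(candidate)
            if digest in missing and missing[digest] not in moves:
                moves[missing[digest]] = candidate
    return moves
```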
| dotancohen wrote:
| This seems a serious limitation. Bash is my file manager
| - ls and cd. I would not give that up for any tool no
| matter how convenient.
|
| But presumably there is some API or ABI to monitor, or
| some signal emitted, when files are moved, added,
| updated, or deleted. However inotifywait works, could
| that be incorporated (even theoretically) into TMSU?
| Cyphase wrote:
| https://github.com/oniony/TMSU/wiki/FAQ#why-does-tmsu-not-au...
|
| There are a couple very barebones wrappers around mv and
| rm, though they could be better (pass through arguments,
| etc.).
|
| https://github.com/oniony/TMSU/wiki/Tricks-and-Tips#filesyst...
| pedrovhb wrote:
| I just wish xattrs were more resilient and explicit.
| mihaaly wrote:
| Also if someone by some miracle is able to come up with a
| univocal hierarchy for a usecase or organization - that may
| even work throughout time and evolving situation and conditions
| - then collaborating with another field or organization will
| have collisions or discrepancies in taxonomy and a hopeless
| confusion of category number mapping.
| PurpleRamen wrote:
| > I am convinced that absolutely no hierarchical taxonomy
| actually works in practice because every categorisation is
| really slightly fuzzy.
|
| Fuzzy is ok, not everything needs to be perfect.
|
| > Tagging/labelling with attributes is the only viable solution
| that I've found.
|
| Categories and tags have different use cases. Categories are
| the best solution, when there is only one possible location for
| an object, like with physical objects, which cannot appear at
| multiple locations at the same time. Tags make sense, if you
| can offer multiple entry points to reach the goal, and have the
| option to manage them all on a reliable level.
|
| And nothing speaks against using them both, with different
| valuesets. Like with digital files, you can have categories for
| managing the filetypes (books, videos, etc.), but then use tags
| to manage the content in each category.
|
| > This book is fiction/science-fiction/fantasy.
|
| Isn't this kinda redundant? science-fiction is already fiction,
| it's in the name. Reads more like a hierarchy, which of
| course you can also do with tags. Any fantasy-tag automatically
| inherits the fiction-tag too.
| giaour wrote:
| > Isn't this kinda redundant? science-fiction is already
| fiction, it's in the name.
|
| This is nit-picking the specific example rather than engaging
| with the GP's point. There are definitely genres that span
| the fiction/non-fiction divide (like romance, horror,
| politics, judaica, and many more). Interestingly,
| hierarchical systems used by libraries tend to ignore genre.
| The Dewey Decimal system doesn't classify literature by genre
| but instead by language, though some overarching genres
| (drama vs poetry) are treated as subcategories of language
| (e.g., German poetry is 831, French poetry is 841), but only
| for major European languages.
| PurpleRamen wrote:
| > This is nit-picking the specific example rather than
| engaging with the GP's point.
|
| It's showing that tags can be fuzzy too. So the initial
| complaint is not avoided, just moved.
| [deleted]
| eek04_ wrote:
| Scientific libraries categorise into *several categories*.
|
| They effectively use hierarchical tags, with the set of tags
| very carefully curated.
| pbmonster wrote:
| > Tagging/labelling with attributes is the only viable solution
| that I've found.
|
| What's your solution for a new tag being introduced late, which
| could/should be retroactively applied to already tagged media?
|
| Re-tag the entire inventory by hand? Re-tag automatically with
| some ML/AI tool after you've tagged enough media to train a
| discriminator? Or just live with old inventory not having that
| tag?
| dotancohen wrote:
| With hierarchical tags this is less of an issue. Presumably
| the old tags are still valid for the extant documents? Can
| the new tag be a sub-tag of one of the extant tags?
| giraffe_lady wrote:
| More likely to discover a supertag than a subtag in my
| experience. I think this is why so many people get value
| out of zettelkasten: it focuses on connections rather than
| categories.
| dotancohen wrote:
| Ideally both the supertag and subtag would return the
| document tagged with the subtag.
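|
| A small Python sketch of that behaviour, assuming tags are written
| as slash-separated paths such as `fiction/science-fiction` (the
| separator and the names `matches` and `find` are illustrative
| choices, not any tool's API):
|
```python
from typing import Dict, List, Set


def matches(query: str, tag: str, sep: str = "/") -> bool:
    """True if tag equals query or sits anywhere beneath it in the hierarchy."""
    return tag == query or tag.startswith(query + sep)


def find(documents: Dict[str, Set[str]], query: str) -> List[str]:
    """Return names of documents carrying the query tag or any of its subtags."""
    return sorted(name for name, tags in documents.items()
                  if any(matches(query, tag) for tag in tags))
```
|
| Matching on `query + sep` rather than a bare prefix keeps `fic`
| from accidentally matching `fiction`.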
| dredmorbius wrote:
| In the case of TMSU, you have a shell / command-line tool, so
| that anything which generates a listing of files can be fed
| in to assign additional tags.
|
| That might be based on find, grep, other pattern-matching
| tools, tmsu itself, or manually curating a file or files of
| content you want to modify.
|
| One of the reasons I strongly prefer CLI tools in general is
| that they make precisely this sort of bulk action so viable.
| I've modified anywhere from tens to hundreds of thousands of
| items with a small set of commands, and in most cases many
| millions or more should be reasonably doable.
| PurpleRamen wrote:
| Yes, usually, you must recheck your content regularly and
| check if old content validates for new tags, or if old tags
| have shifted in meaning, split into new subtags, etc. But
| this does not necessarily mean you should check everything,
| and regularly must not mean every week or month. Just review
| small parts of your library for changes, occasionally as you
| have time, the improvements will come over time by
| themselves.
|
| A never-changing library is a dead library.
| alpaca128 wrote:
| It can mean a lot of effort, yes, but in a lot of cases you
| can drastically reduce the number of items to re-tag by
| defining some simple relations between tags. For example tag
| A may require tag B to exist but not C, etc.
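|
| Such relations can be written down as data and checked
| mechanically. The Python below is a minimal sketch under that
| assumption; the `Rule` shape and the `violations` helper are
| hypothetical, not taken from any existing tagging tool.
|
```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    tag: str                                # the rule applies when this tag is present
    requires: FrozenSet[str] = frozenset()  # tags that must accompany it
    forbids: FrozenSet[str] = frozenset()   # tags that must not accompany it


def violations(tags: Set[str], rules: List[Rule]) -> List[str]:
    """Describe every rule that the given tag set breaks."""
    problems: List[str] = []
    for rule in rules:
        if rule.tag not in tags:
            continue
        for needed in sorted(rule.requires - tags):
            problems.append(f"{rule.tag} requires {needed}")
        for banned in sorted(rule.forbids & tags):
            problems.append(f"{rule.tag} forbids {banned}")
    return problems
```
|
| Only items that report a violation need a manual look, which is
| where the reduction in re-tagging effort comes from.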
|
| In the end maintaining order always requires effort, if you
| have a hierarchical folder structure and you decide to add
| another category you have the same problem. And realistically
| folders are "good enough" for most situations, you can choose
| for each file how fine-grained the categorization should be
| and the mental overhead for finding a file is often not that
| bad.
| baz00 wrote:
| Depends very much on the problem domain. It's really easy to
| do this on macOS.
|
| So for example I have a top level `Documents/docs` directory.
| In this everything is recorded in format `yyyy-mm-dd
| Title.ext` format. I have a saved search which says "give me
| everything untagged". I will select each thing that is
| untagged and then tag it!
| plagiarist wrote:
| I think they're asking something different, what if you
| think of a brand new tag after you have already tagged
| things for years? It is potentially a lot of effort to
| retroactively apply it.
| dvas wrote:
| I ran into a similar situation a while back now.
|
| I have my version of a zettelkasten system, where I put a
| tag/ list of tags at the top of the note file (timestamp
| being the note name). Combining this with something like
| ripgrep allows me to search the note file contents by
| those tags, and keywords.
|
| A tag I used didn't seem like the right one for the
| content as it matured and grew over time.
|
| This meant creating a tiny script which would take the
| name of the tag and the note name, in which it is being
| used and look at backlinks and forward links and rename
| the tag based on usage.
| smallerdemon wrote:
| Yeah. All systems like this have to account for ad-hoc
| additions. Otherwise you will often jam a few squares
| into some circles just to accommodate your categorization
| system.
| cma wrote:
| AI/LLMs can probably make it easy.
| somat wrote:
| tagging is what happens when you allow an artifact to exist
| under more than one hierarchy at the same time.
|
| The progression usually goes.
|
| "we want to label our things."
|
| "There is a natural hierarchy that we want to be reflected in
| our labels."
|
| "Some things are not labeling well. instead use multiple labels
| per thing."
|
| "There is a natural hierarchy that we want to be reflected in
| our labels."
|
| The unix file system is a good example of the latter. it
| favors the hierarchy but allows each file to have many names if
| that is what is required. The main weakness of the unix
| filesystem is that there is no reverse map, that is, finding a
| file from its name(s) is simple. finding the names from the
| file involves searching the whole filesystem.
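|
| To make the missing reverse map concrete, here is a Python sketch
| of what "finding the names from the file" costs: a full walk that
| compares device and inode numbers. `names_of` is an illustrative
| name; the `os.stat` fields are standard.
|
```python
import os
from typing import List


def names_of(target: str, root: str) -> List[str]:
    """Find every path under root that is a hard link to the same file as target."""
    wanted = os.stat(target)
    names: List[str] = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry
            if (info.st_dev, info.st_ino) == (wanted.st_dev, wanted.st_ino):
                names.append(path)
    return sorted(names)
```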
| imrehg wrote:
| That's what I took away from reading Everything is
| Miscellaneous a long while back
| https://en.wikipedia.org/wiki/Everything_Is_Miscellaneous
|
| Definitely thinking about categorization differently (folders
| which are similar to this, tags, organic order by linking;
| bookmarks, notes, all are in the same boat as well).
| lukasgraf wrote:
| I saw David Weinberger's talk at Google [1] some 15 years
| ago, where he presented the ideas from his book (and bought
| the book afterwards). This is one of those videos I rewatch
| regularly, because the ideas hold up so well, even today.
|
| His critique of the Dewey Decimal system was the first thing
| that came to mind when I read Johnny Decimal. And indeed, it
| is flawed the same way.
|
| [1] https://www.youtube.com/watch?v=x3wOhXsjPYM
| rpastuszak wrote:
| This is precisely the reason why smaller cross-linked notes and
| tags work quite well for me, and JD or almost anything else in
| the past hasn't.
|
| The ideal for me involves hypertext + (local, private) AI-
| assisted search.
|
| None of this will replace reviewing/pruning your notes
| regularly (think: Andy Matuschak's fleeting and evergreen
| notes), but it minimises the friction.
| dotancohen wrote:
| I actually use a hierarchy of tags to organize my notes. The
| _tags_ are hierarchical, not the storage path.
| somat wrote:
| but your storage path is already a hierarchy of tags.*
|
| *If using a unix derived filesystem.
| dotancohen wrote:
| I could use symlinks to refer to files, yes, but rather I'm
| using nonstandard #foo-bar-baz subtag syntax to refer to
| sections of org mode documents. I do not use real org mode
| tags because those are restricted to headlines.
|
| In other news, if anyone knows how to restrict a search in
| an org mode buffer to search only the headings, that would
| be very helpful! Right now I'm using an external grep
| wrapper, restricting to lines that begin with an asterisk.
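|
| For reference, the external approach amounts to something like
| this Python sketch. It is not an in-buffer Emacs answer, and
| `search_headlines` is a made-up name; the only org-specific
| assumption is that a headline starts with one or more asterisks
| followed by a space.
|
```python
import re
from typing import Iterable, Iterator, Tuple

HEADLINE = re.compile(r"^\*+ ")  # one or more asterisks, then a space


def search_headlines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for headlines matching the regex pattern."""
    needle = re.compile(pattern, re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if HEADLINE.match(line) and needle.search(line):
            yield number, line.rstrip("\n")
```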
| neogodless wrote:
| My email folder system was inspired by this and loosely follows
| it. Works well enough for email (personal and professional). I
| have not tended to find it as useful for actual file folders.
|
| https://imgur.com/a/K2mN8cY
|
| If nothing else, it's because I'm often working with yet another
| new set of folders and files, and they are very different from
| the last, and I haven't developed some sort of intuitive
| hierarchy that I could apply. With email, it's pretty consistent.
| The only changes are new project/client folders popping up, and
| that's typically pretty infrequent, even with agency work.
| Hbruz0 wrote:
| I've been trying it since last time it came up here, a few weeks
| ago.
|
| I have trouble remembering the nn.mm codes, but found it useful
| for organising backups on a remote machine.
|
| For some files, it's obviously difficult to choose one category
| but it's not that big of an issue when compared to organising,
| say, git repos, or folders with a lot of files. Could go higher
| into the hierarchy but the folder is too specific for that.
|
| If anyone has faced or solved these issues, I'd love to hear
| about it
| chipsa wrote:
| Don't remember the AC.ID codes. Have a script generate a
| textfile index based on the file structure. Then you just look
| at the index. You look into a place enough, and you'll remember
| it eventually then.
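|
| A sketch of such an index script in Python, assuming item folders
| are named like `12.03 Tax returns` (two digits, a dot, two digits,
| then a title). `build_index` and the output layout are
| illustrative choices, not part of Johnny Decimal itself.
|
```python
import os
import re
from typing import List

ITEM = re.compile(r"^(\d{2}\.\d{2})\s+(.+)$")  # e.g. "12.03 Tax returns"


def build_index(root: str) -> List[str]:
    """Return one 'AC.ID  Title  (relative path)' line per item folder, sorted."""
    lines: List[str] = []
    for dirpath, dirnames, _files in os.walk(root):
        for name in dirnames:
            match = ITEM.match(name)
            if match:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                lines.append(f"{match.group(1)}  {match.group(2)}  ({rel})")
    return sorted(lines)
```
|
| Writing the result to a text file at the top of the tree, e.g.
| with "\n".join(build_index(root)), gives the index to look at.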
| lifthrasiir wrote:
| Previously: https://news.ycombinator.com/item?id=25398027
| (2020-12-13, where the site author complains about HN ;-),
| https://news.ycombinator.com/item?id=36308366 (2023-06-13)
|
| I do believe that such a system has a value, but only when _built
| by users themselves_. Sure, there would be some commonalities and
| schemes like Johnny Decimal may be a good guideline; prefixes
| would work well because they get sorted without any additional
| jobs (but doesn't scale for many subjects), tags are fine when
| you have a software support, you can also use metadata like dates
| and backlinks, and so on. But making a proper organization takes
| time and _observation_, and only long-time users with an urgent
| need have that knowledge. Especially because no organization
| system can remain unchanged over time.
| otterpro wrote:
| I use Johnny decimal because it allowed me to offload my decision
| for file organization. I had my doubts, but when I tried it as a
| test, I felt a sense of relief. Before, my file system was a
| mess, and no file search or tagging could save me from naming
| hell. Naming things was hard and took a lot of mental energy. I
| see a lot of cynicism in the comments but having an imperfect and
| flawed system is better than having no system at all.
| istjohn wrote:
| LLMs will soon make file organization something we never have to
| think about.
___________________________________________________________________
(page generated 2023-09-14 23:03 UTC)